cellannotation / cell-annotation-schema

General, open-standard schema for cell annotations

Alignment with CAP file planning spec #43

Closed dosumis closed 1 week ago

dosumis commented 8 months ago
evanbiederstedt commented 7 months ago

RE: "labelset"

This is simply a historic term used by CAP developers; it comes up in discussion still.

This should be replaced by "cell annotation set".

dosumis commented 7 months ago

cellannotation_setname - caused lots of confusion - both in discussion with devs and with BICAN. You approved changing it to labelset some time ago. Please don't approve changes if you don't agree with them.

dosumis commented 7 months ago

In the current schema - labelset = name of obs key (I preferred annotation key but that was vetoed). annotations = key under which annotations live.

dosumis commented 7 months ago

One of the big advantages of going through CAS is that name changes happen in one place and can be propagated down to other representations. - see cellannotation/cas-tools#6

dosumis commented 7 months ago

@mfutey @evanbiederstedt - table on alignment CAS and CAP_encoding_for_anndata.md. Need to discuss how we align.

| CAP Encoding for AnnData file | CAS (only noting where names differ or are missing from CAS) | Notes |
|---|---|---|
| **obs** | | |
| Cell Annotation Metadata | annotations | Change in PR approved by Evan & Mary |
| [cellannotation_setname] | labelset | This was changed in a PR approved by Evan & Mary. cellannotation_setname was universally causing confusion |
| [cellannotation_setname] | cell_label | |
| [cellannotation_setname]--cell_fullname | | |
| [cellannotation_setname]--cell_ontology_exists | | |
| [cellannotation_setname]--cell_ontology_term_id | | |
| [cellannotation_setname]--cell_ontology_term | | |
| [cellannotation_setname]--rationale | | |
| [cellannotation_setname]--rationale_dois | | |
| [cellannotation_setname]--marker_gene_evidence | | |
| [cellannotation_setname]--canonical_marker_genes | | |
| [cellannotation_setname]--synonyms | | |
| [cellannotation_setname]--category_fullname | | |
| [cellannotation_setname]--category_cell_ontology_exists | | |
| [cellannotation_setname]--category_cell_ontology_term_id | | |
| [cellannotation_setname]--category_cell_ontology_term | | |
| [cellannotation_setname]--cell_ontology_assessment | | |
| **uns** (Dataset metadata) | | |
| cellannotation_schema_version | | |
| cap_publication_timestamp | cellannotation_timestamp | Do we need CAP prefix on field name? |
| cap_publication_version | version | Do we need CAP prefix on field name? |
| dataset_title | | |
| dataset_description | | |
| dataset_url | matrix_file_id | Not quite the same thing. Need to discuss |
| cap_publication_title | | |
| cap_publication_description | | |
| cap_publication_authors_list | | Not yet in CAS? Make this generic - other authors? See #41 |
| cap_publication_url | cellannotation_url | Might not be the same thing? |
| cap_author_name | author_name | Change to primary author name? - see #41 |
| cap_author_contact | author_contact | |
| cap_author_orcid | orcid | Make this primary_author_orcid? |
| cellannotation_metadata | | |
| [cellannotation_setname]--metadata | labelsets | Change in PR approved by Evan & Mary |
| cellannotation_setdescription | description | Why the odd prefix? Is this meant to be prefixed with cellannotation_setname for flattening purposes? |
| annotation_method | | Is this meant to be prefixed with cellannotation_setname for flattening purposes? |
| algorithm_name | | Is this meant to be prefixed with cellannotation_setname for flattening purposes? |
| algorithm_version | | Is this meant to be prefixed with cellannotation_setname for flattening purposes? |
| algorithm_repo_url | | Is this meant to be prefixed with cellannotation_setname for flattening purposes? |
| reference_location | | Is this meant to be prefixed with cellannotation_setname for flattening purposes? |
| reference_description | | Not in CAS |

dosumis commented 6 months ago

@evanbiederstedt We reviewed before Xmas, but Mary wanted your input on this before making decisions.

evanbiederstedt commented 5 months ago

@dosumis

I believe @rm1113 removed all CAP_ prefixes as this caused confusion. I agree that we don't need cap_ everywhere. It doesn't help with community standards.

Comments

• RE: dataset_url This is the URL which CAP mints. It's a data portal, so we can mint these ourselves.

I doubt this needs to be associated with CAS.

RE: Is this meant to be prefixed with cellannotation_setname for flattening purposes?

No, check the AnnData uns file. https://github.com/cellannotation/cell-annotation-schema/blob/main/cap_anndata_schema.md#uns-dataset-metadata

rm1113 commented 5 months ago

@evanbiederstedt @dosumis

We still have cap_ prefixes for the publication related uns fields (see string names only):

    workspace_title = "cap_publication_title"
    workspace_description = "cap_publication_description"
    workspace_url = "cap_publication_url"
    authors_list = "cap_publication_authors_list"
    publication_timestamp = "cap_publication_timestamp"
    publication_version = "cap_publication_version"
    main_author = "cap_author_name"
    main_author_orcid = "cap_author_orcid"
    main_author_contact = "cap_author_contact"

My view is that these fields don't make much sense outside of CAP, so we don't really need to align them with CAS. Please comment if you disagree about any field in the list. I won't object to removing the prefixes if it makes sense.

rm1113 commented 5 months ago

> @mfutey @evanbiederstedt - table on alignment CAS and CAP_encoding_for_anndata.md. Need to discuss how we align.

@dosumis we just released v1.0.0 of the CAP AnnData schema with the following changes:

- dataset_title -> title
- dataset_description -> description
- cellannotation_setdescription -> description
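
For anyone migrating older files, the renames listed above could be applied with something like the sketch below. It is only an illustration: the dictionaries stand in for metadata read from a file, the helper `rename_keys` and the example values are hypothetical, and the split into a dataset-level and a set-level record is an assumption made so that the two fields that both become `description` do not collide.

```python
def rename_keys(record, renames):
    """Return a copy of record with keys renamed; refuse to overwrite a key."""
    out = {}
    for key, value in record.items():
        new_key = renames.get(key, key)
        if new_key in out:
            raise ValueError(f"rename collision on {new_key!r}")
        out[new_key] = value
    return out


# Renames taken from the v1.0.0 note in this comment.
DATASET_RENAMES = {
    "dataset_title": "title",
    "dataset_description": "description",
}
LABELSET_RENAMES = {
    "cellannotation_setdescription": "description",
}

# Hypothetical example records; real files will carry more fields.
old_dataset = {
    "dataset_title": "Example atlas",
    "dataset_description": "Toy data",
    "dataset_url": "https://example.org/d/1",
}
old_labelset = {
    "cellannotation_setdescription": "Broad cell classes",
    "annotation_method": "manual",
}

new_dataset = rename_keys(old_dataset, DATASET_RENAMES)
new_labelset = rename_keys(old_labelset, LABELSET_RENAMES)
```

Keys that are not in the rename map pass through unchanged, and a collision raises instead of silently dropping a value.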

dosumis commented 5 months ago

Hi @rm1113

My preference would be to keep only these as CAP:

    workspace_title = "cap_publication_title"
    workspace_description = "cap_publication_description"
    workspace_url = "cap_publication_url"

And make the rest generic. These are all useful outside of CAP:

    authors_list = "cap_publication_authors_list"
    publication_timestamp = "cap_publication_timestamp"
    publication_version = "cap_publication_version"
    main_author = "cap_author_name"
    main_author_orcid = "cap_author_orcid"
    main_author_contact = "cap_author_contact"
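
A rough sketch of how that split could be written down, so the rename lives in one place. The three retained names come from the list above; how the generic names are spelled is not settled in this thread, so stripping the `cap_` prefix is purely an assumption here, and `generic_name` is a hypothetical helper.

```python
# The nine CAP-prefixed uns field names quoted in this thread.
CAP_FIELDS = [
    "cap_publication_title",
    "cap_publication_description",
    "cap_publication_url",
    "cap_publication_authors_list",
    "cap_publication_timestamp",
    "cap_publication_version",
    "cap_author_name",
    "cap_author_orcid",
    "cap_author_contact",
]

# Fields proposed to stay CAP-specific.
KEEP_CAP = {
    "cap_publication_title",
    "cap_publication_description",
    "cap_publication_url",
}


def generic_name(cap_name):
    """Assumed convention: drop a leading 'cap_' to get the generic name."""
    prefix = "cap_"
    if cap_name.startswith(prefix):
        return cap_name[len(prefix):]
    return cap_name


# Old name -> proposed name; retained fields map to themselves.
FIELD_MAP = {
    name: (name if name in KEEP_CAP else generic_name(name))
    for name in CAP_FIELDS
}

for old, new in FIELD_MAP.items():
    print(f"{old} -> {new}")
```

If different generic names are agreed (for example the CAS names in the table earlier in this issue), only `generic_name` or the resulting map would need to change.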

evanbiederstedt commented 5 months ago

^ this strikes me as a good compromise: https://github.com/cellannotation/cell-annotation-schema/issues/43#issuecomment-1900467312

I would support this

CC @mfutey @rm1113 @dosumis

mfutey commented 5 months ago

The above suggestion provided by David works for me as well.

rm1113 commented 5 months ago

I agree with @dosumis

mfutey commented 5 months ago

@dosumis I will update the documentation early next week following your suggestion. See: MVP-4961