dosumis closed this issue 1 week ago.
RE: "labelset"
This is simply a historical term used by CAP developers; it still comes up in discussion.
This should be replaced by "cell annotation set".
cellannotation_setname caused a lot of confusion, both in discussions with devs and with BICAN. You approved changing it to labelset some time ago. Please don't approve changes if you don't agree with them.
In the current schema, labelset is the name of the obs key (I preferred "annotation key", but that was vetoed), and annotations is the key under which the annotations live.
One of the big advantages of going through CAS is that name changes happen in one place and can be propagated down to other representations (see cellannotation/cas-tools#6).
@mfutey @evanbiederstedt: here is a table on alignment between CAS and CAP_encoding_for_anndata.md. We need to discuss how we align.
CAP Encoding for AnnData file | CAS (only noting where names differ or are missing from CAS) | Notes |
---|---|---|
**obs** | | |
Cell Annotation Metadata | annotations | Change in PR approved by Evan & Mary |
[cellannotation_setname] | labelset | This was changed in a PR approved by Evan & Mary. cellannotation_setname was universal, causing confusion. |
[cellannotation_setname] | cell_label | |
[cellannotation_setname]--cell_fullname | | |
[cellannotation_setname]--cell_ontology_exists | | |
[cellannotation_setname]--cell_ontology_term_id | | |
[cellannotation_setname]--cell_ontology_term | | |
[cellannotation_setname]--rationale | | |
[cellannotation_setname]--rationale_dois | | |
[cellannotation_setname]--marker_gene_evidence | | |
[cellannotation_setname]--canonical_marker_genes | | |
[cellannotation_setname]--synonyms | | |
[cellannotation_setname]--category_fullname | | |
[cellannotation_setname]--category_cell_ontology_exists | | |
[cellannotation_setname]--category_cell_ontology_term_id | | |
[cellannotation_setname]--category_cell_ontology_term | | |
[cellannotation_setname]--cell_ontology_assessment | | |
**uns** (Dataset metadata) | | |
cellannotation_schema_version | | |
cap_publication_timestamp | cellannotation_timestamp | Do we need the CAP prefix on the field name? |
cap_publication_version | version | Do we need the CAP prefix on the field name? |
dataset_title | | |
dataset_description | | |
dataset_url | matrix_file_id | Not quite the same thing. Need to discuss. |
cap_publication_title | | |
cap_publication_description | | |
cap_publication_authors_list | | Not yet in CAS? Make this generic (other authors)? See #41 |
cap_publication_url | cellannotation_url | Might not be the same thing? |
cap_author_name | author_name | Change to primary author name? See #41 |
cap_author_contact | author_contact | |
cap_author_orcid | orcid | Make this primary_author_orcid? |
cellannotation_metadata | | |
[cellannotation_setname]--metadata | labelsets | Change in PR approved by Evan & Mary |
cellannotation_setdescription | description | Why the odd prefix? Is this meant to be prefixed with cellannotation_setname for flattening purposes? |
annotation_method | | Is this meant to be prefixed with cellannotation_setname for flattening purposes? |
algorithm_name | | Is this meant to be prefixed with cellannotation_setname for flattening purposes? |
algorithm_version | | Is this meant to be prefixed with cellannotation_setname for flattening purposes? |
algorithm_repo_url | | Is this meant to be prefixed with cellannotation_setname for flattening purposes? |
reference_location | | Is this meant to be prefixed with cellannotation_setname for flattening purposes? |
reference_description | | Not in CAS |
@evanbiederstedt We reviewed this before Xmas, but Mary wanted your input before making decisions.
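To illustrate the point that name changes can live in one place and be propagated, here is a minimal sketch that encodes the uns rows of the table above as a single mapping and derives the reverse direction from it. The names CAP_TO_CAS_UNS, CAS_TO_CAP_UNS and rename_keys are hypothetical, not part of cas-tools or any CAP library, and only rows the table treats as equivalent are included; the example values are placeholders.

```python
# Hypothetical sketch: CAP AnnData uns key -> CAS field name.
# Only rows from the alignment table whose notes treat the two names as
# equivalent are included. dataset_url/matrix_file_id and
# cap_publication_url/cellannotation_url are left out because the table
# flags them as possibly different things.
CAP_TO_CAS_UNS = {
    "cap_publication_timestamp": "cellannotation_timestamp",
    "cap_publication_version": "version",
    "cap_author_name": "author_name",
    "cap_author_contact": "author_contact",
    "cap_author_orcid": "orcid",
}

# The reverse mapping is derived from the same single source of truth,
# so a rename only ever has to be edited in one place.
CAS_TO_CAP_UNS = {cas: cap for cap, cas in CAP_TO_CAS_UNS.items()}


def rename_keys(metadata: dict, mapping: dict) -> dict:
    """Return a copy of metadata with keys renamed via mapping.

    Keys without an entry in mapping (e.g. fields whose names already
    match, such as cellannotation_schema_version) are kept unchanged.
    """
    return {mapping.get(key, key): value for key, value in metadata.items()}


if __name__ == "__main__":
    # Placeholder values, for illustration only.
    cap_uns = {
        "cellannotation_schema_version": "placeholder",
        "cap_author_name": "placeholder author",
        "cap_author_orcid": "placeholder orcid",
    }
    cas = rename_keys(cap_uns, CAP_TO_CAS_UNS)
    print(cas)
    # Round-tripping restores the original CAP key names.
    assert rename_keys(cas, CAS_TO_CAP_UNS) == cap_uns
```

Keeping the reverse mapping derived rather than hand-written means the two directions cannot drift apart.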
@dosumis
I believe @rm1113 removed all cap_ prefixes, as this caused confusion. I agree that we don't need cap_ everywhere; it doesn't help with community standards.
Comments
• RE: dataset_url: This is the URL that CAP mints. CAP is a data portal, so we can mint these ourselves. I doubt this needs to be associated with CAS.
• RE: Is this meant to be prefixed with cellannotation_setname for flattening purposes?
No; check the uns section of the CAP AnnData schema: https://github.com/cellannotation/cell-annotation-schema/blob/main/cap_anndata_schema.md#uns-dataset-metadata
@evanbiederstedt @dosumis
We still have cap_ prefixes for the publication-related uns fields (look only at the string values):
workspace_title = "cap_publication_title"
workspace_description = "cap_publication_description"
workspace_url = "cap_publication_url"
authors_list = "cap_publication_authors_list"
publication_timestamp = "cap_publication_timestamp"
publication_version = "cap_publication_version"
main_author = "cap_author_name"
main_author_orcid = "cap_author_orcid"
main_author_contact = "cap_author_contact"
My view here is that these fields don't make much sense outside of CAP, and we don't really need to align them with CAS. Please comment if you disagree about any field in the list. I won't object to removing the prefixes if it makes sense.
@dosumis we just released v1.0.0 of the CAP AnnData schema with the following changes:
• dataset_title -> title
• dataset_description -> description
• cellannotation_setdescription -> description
Hi @rm1113
My preference would be to keep only these as CAP:
workspace_title = "cap_publication_title"
workspace_description = "cap_publication_description"
workspace_url = "cap_publication_url"
And make the rest generic. These are all useful outside of CAP:
authors_list = "cap_publication_authors_list"
publication_timestamp = "cap_publication_timestamp"
publication_version = "cap_publication_version"
main_author = "cap_author_name"
main_author_orcid = "cap_author_orcid"
main_author_contact = "cap_author_contact"
^ this strikes me as a good compromise: https://github.com/cellannotation/cell-annotation-schema/issues/43#issuecomment-1900467312
I would support this.
CC @mfutey @rm1113 @dosumis
The above suggestion provided by David works for me as well.
I agree with @dosumis.
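The compromise agreed above (keep the cap_ prefix only on the three workspace fields and make the rest generic) can be sketched as a simple naming rule. The thread does not fix the final generic names, so this sketch ASSUMES they are the current names with "cap_" stripped; KEEP_CAP_PREFIX, CURRENT_UNS_KEYS and proposed_name are hypothetical helpers, not part of any CAP or CAS tooling.

```python
# Hypothetical sketch of the agreed compromise. The generic names are
# ASSUMED to be the current names with "cap_" stripped; the final names
# were not decided in this thread.
KEEP_CAP_PREFIX = {
    "cap_publication_title",
    "cap_publication_description",
    "cap_publication_url",
}

# Current string values of the publication-related uns fields.
CURRENT_UNS_KEYS = [
    "cap_publication_title",
    "cap_publication_description",
    "cap_publication_url",
    "cap_publication_authors_list",
    "cap_publication_timestamp",
    "cap_publication_version",
    "cap_author_name",
    "cap_author_orcid",
    "cap_author_contact",
]


def proposed_name(key: str) -> str:
    """Return the proposed uns key under the hypothetical naming rule."""
    if key in KEEP_CAP_PREFIX:
        return key  # CAP-specific workspace fields keep their prefix
    return key.removeprefix("cap_")  # str.removeprefix needs Python 3.9+


if __name__ == "__main__":
    for key in CURRENT_UNS_KEYS:
        print(f"{key} -> {proposed_name(key)}")
```

Under this assumed rule, for example, cap_author_orcid would become author_orcid, while cap_publication_url stays unchanged.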
[x] cellannotation/cell-annotation-schema#44
[ ] Catalog places where field names have diverged and discuss convergence, e.g. cellannotation_setname --> labelset
[ ] Move motivation fields/info into the JSON file, e.g. https://github.com/cellannotation/cap_file_planning/blob/main/cap_anndata_schema.md#cell-annotation-set-description has "motivation/use case: Free-text to explain why this Cell Annotation Set exists for other scientists."