cellannotation / cell-annotation-schema

General, open-standard schema for cell annotations

Review/revise schema field descriptions - annotations #122

Open dosumis opened 1 month ago

dosumis commented 1 month ago

The CAP anndata schema markdown and the JSON schema specification are out of sync with respect to field descriptions. This text needs to be readable by any end user of the schema, or of tools built with it, because it appears in integrated help.

In the CAP anndata schema, the description is stored in a custom schema description field called 'value':

column [cellannotation_set]--cell_fullname
index Cell barcode names
dtype string
value The full-length name for the biological entity listed in [cellannotation_setname] by the author.
source file or UI
required for publication on CAP yes
example 'rod bipolar'

In JSON schema we use the description field:

"cell_fullname": {
  "description": "This MUST be the full-length name for the biological entity listed in `cell_label` by the author. (If the value in `cell_label` is the full-length term, this field will contain the same value.) \nNOTE: any reserved word used in the field 'cell_label' MUST match the value of this field. \n\nEXAMPLE 1: Given the matching terms 'LC' and 'luminal cell' used to annotate the same cell(s), then users could use either terms as values in the field 'cell_label'. However, the abbreviation 'LC' CANNOT be provided in the field 'cell_fullname'. \n\nEXAMPLE 2: Either the abbreviation 'AC' or the full-length term intended by the author 'GABAergic amacrine cell' MAY be placed in the field 'cell_label', but as full-length term naming this biological entity, 'GABAergic amacrine cell' MUST be placed in the field 'cell_fullname'.",
  "type": "string"
}

For ease of reading we derive a markdown file that looks like this:

cell_fullname (string): This MUST be the full-length name for the biological entity listed in cell_label by the author. (If the value in cell_label is the full-length term, this field will contain the same value.) NOTE: any reserved word used in the field 'cell_label' MUST match the value of this field. EXAMPLE 1: Given the matching terms 'LC' and 'luminal cell' used to annotate the same cell(s), then users could use either terms as values in the field 'cell_label'. However, the abbreviation 'LC' CANNOT be provided in the field 'cell_fullname'. EXAMPLE 2: Either the abbreviation 'AC' or the full-length term intended by the author 'GABAergic amacrine cell' MAY be placed in the field 'cell_label', but as full-length term naming this biological entity, 'GABAergic amacrine cell' MUST be placed in the field 'cell_fullname'.
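The derivation from the JSON schema to the one-line markdown form can be sketched roughly as below. This is not the project's actual generator: `render_property` is a hypothetical helper, and the description used here is a shortened stand-in for the real `cell_fullname` text. It only illustrates how the embedded `\n` sequences collapse into a single readable paragraph.

```python
def render_property(name: str, spec: dict) -> str:
    """Render one JSON Schema property as a single 'name (type): description' line."""
    # Collapse newlines and runs of spaces so the description reads as one paragraph.
    description = " ".join(spec.get("description", "").split())
    field_type = spec.get("type", "unknown")
    return f"{name} ({field_type}): {description}"


# Shortened stand-in for the real description (assumption, for illustration only).
example = {
    "description": (
        "This MUST be the full-length name. \n"
        "NOTE: reserved words MUST match. \n\n"
        "EXAMPLE 1: 'LC' CANNOT be provided."
    ),
    "type": "string",
}

print(render_property("cell_fullname", example))
```

Collapsing whitespace this way means any change to the JSON description flows straight into the derived markdown, which is why the JSON text is the one worth revising.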

The CAS general schema markdown file is here: https://github.com/cellannotation/cell-annotation-schema/blob/main/build/general_schema.md

Task:

Make a spreadsheet with the columns: fieldname | Current text (CAS) | Current text (CAP) | Proposed revision

Fields to target: fields under the 'Annotation' object in CAS. These map to fields under Cell Annotation Metadata in the CAP anndata spec.

Note that fields in the CAP spec are all prefixed with cellannotation_setname, which corresponds to labelset in CAS. The one exception is the label field, which in CAP is just cellannotation_setname.
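A rough sketch of how the comparison spreadsheet could be assembled, assuming the descriptions have already been extracted into two plain dictionaries. Everything here is hypothetical scaffolding: the helper names, the `--` separator (taken from the CAP column example above), and the assumption that the bare CAP set-name column corresponds to the CAS field `cell_label`.

```python
import csv
import io

COLUMNS = ["fieldname", "Current text (CAS)", "Current text (CAP)", "Proposed revision"]
SEPARATOR = "--"            # assumption: separator seen in the CAP column example
LABEL_FIELD = "cell_label"  # assumption: CAS name for the bare CAP set-name column


def cap_column_to_cas_field(column: str) -> str:
    """Map a CAP column name to the corresponding CAS field name."""
    _prefix, sep, suffix = column.partition(SEPARATOR)
    # A bare set name (no separator) is the label field itself.
    return suffix if sep else LABEL_FIELD


def build_rows(cas_descriptions: dict, cap_descriptions: dict) -> list:
    """Pair CAS and CAP descriptions by field name, leaving the revision blank."""
    cap_by_field = {
        cap_column_to_cas_field(column): text
        for column, text in cap_descriptions.items()
    }
    rows = []
    for field in sorted(set(cas_descriptions) | set(cap_by_field)):
        rows.append({
            "fieldname": field,
            "Current text (CAS)": cas_descriptions.get(field, ""),
            "Current text (CAP)": cap_by_field.get(field, ""),
            "Proposed revision": "",
        })
    return rows


def to_csv(rows: list) -> str:
    """Serialize the rows as CSV text that any spreadsheet tool can open."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Fields present in only one of the two specs still get a row with an empty cell, so gaps between CAS and CAP show up during review.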

dosumis commented 1 month ago

Note: the capitalization of MUST and SHOULD in the existing descriptions is deliberate and follows RFC 2119: https://www.ietf.org/rfc/rfc2119.txt
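When revising descriptions, it may help to list which RFC 2119 keywords a description currently uses, so a proposed revision can be checked against it. A minimal sketch follows; `rfc2119_keywords` is a hypothetical helper, not part of any existing tooling. Note that 'CANNOT', which appears in the current `cell_fullname` text, is not an RFC 2119 keyword and is not reported.

```python
import re

# Keywords defined by RFC 2119; multi-word forms come first so they match whole.
RFC2119_PATTERN = re.compile(
    r"\b(MUST NOT|SHALL NOT|SHOULD NOT|MUST|SHALL|SHOULD|"
    r"REQUIRED|RECOMMENDED|MAY|OPTIONAL)\b"
)


def rfc2119_keywords(description: str) -> list:
    """Return the capitalized RFC 2119 keywords in order of appearance."""
    return RFC2119_PATTERN.findall(description)


sample = (
    "This MUST be the full-length name. 'LC' CANNOT be provided. "
    "'AC' MAY be placed in the field; it MUST NOT be empty."
)
print(rfc2119_keywords(sample))
```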

dosumis commented 6 days ago

@JABelfiore - let's review this ASAP and work out what is still needed.

dosumis commented 6 days ago

See also https://github.com/cellannotation/cell-annotation-schema/issues/43