dosumis closed this issue 1 month ago.
CC @evanbiederstedt
In discussion with @evanbiederstedt, it became clear that he is happy with the simplifications to the descriptions/guidance in https://github.com/cellannotation/cell-annotation-schema/blob/main/docs/cap_anndata_schema.md . All that remains of relevance are the values and examples.
However, we have decided to use this as an opportunity to review the existing field definitions in CAS and try to make them more READABLE. This work is described in #122.
Related work to add missing fields to the CAS CAP extension is in PR #121.
Closing this ticket as superseded by the ticket/PR in the last comment.
Status: DRAFT
Evan wrote:
Note: this repo has now been retired: https://github.com/cellannotation/cap_file_planning/
CAS field names and definitions can be reviewed here: https://github.com/cellannotation/cell-annotation-schema/blob/main/build/CAP_schema.md (Note: this may lag, as it is only updated on release; it also includes CAP-specific fields.)
Mapping between CAP schema and CAS is here: https://github.com/cellannotation/cell-annotation-schema/issues/43#issuecomment-1836508993
From eyeballing the two, the definitions look in sync, but the CAP schema file defines various new schema specification keys and splits content between them. Splitting content this way is possible in JSON Schema: additional, unspecified keywords are allowed, but standard JSON Schema libraries do not read them. However, splitting out content will have consequences for other projects that use the schema; e.g., the BICAN taxonomy editor uses the JSON Schema description field to populate its help fields. To do this strictly, we could specify these fields as JSON Schema extensions so that we can validate them. This would also let us specify the intent of each field for other users of the schema.
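One way to keep the description field whole while still recording extra CAP content can be sketched in Python. This is a minimal sketch, assuming hypothetical extension keywords named x-cap-guidance and x-cap-value (the actual CAP key names are not listed here), and it puts example values in the standard JSON Schema examples keyword. Because standard validators ignore unknown keywords, the hand-rolled unknown_keywords check stands in for a proper meta-schema that would declare the extensions; it only inspects the keys of a single property definition.

```python
# Hypothetical extension keywords; these names are NOT defined by CAS or CAP.
EXTENSION_KEYWORDS = {"x-cap-guidance", "x-cap-value"}

# A small subset of standard JSON Schema keywords, enough for this sketch.
STANDARD_KEYWORDS = {"type", "title", "description", "examples", "enum", "pattern"}

# Illustrative property definition (description text abbreviated). The main
# text stays in `description`, so tools that read only that keyword still
# show usable help text; examples use the standard `examples` keyword.
cell_fullname_schema = {
    "type": "string",
    "description": (
        "This MUST be the full-length name for the biological entity "
        "listed in cell_label by the author."
    ),
    "examples": ["GABAergic amacrine cell", "Luminal cell", "Schwann cell"],
    "x-cap-guidance": (
        "NOTE: any reserved word used in cell_label MUST match the value of this field."
    ),
}


def unknown_keywords(prop_schema, allowed):
    """Return keys that are neither standard keywords nor declared extensions."""
    return sorted(set(prop_schema) - allowed)


def help_text(prop_schema):
    """Return what a tool that reads only `description` would display."""
    return prop_schema.get("description", "")


allowed = STANDARD_KEYWORDS | EXTENSION_KEYWORDS
print(unknown_keywords(cell_fullname_schema, allowed))  # []
print(unknown_keywords({**cell_fullname_schema, "x-typo": 1}, allowed))  # ['x-typo']
```

Declaring the extension keywords in one place is what makes strict checking possible: a misspelled key such as x-typo is reported instead of being silently ignored, as it would be by a standard validator.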
Example:
CAS:
cell_fullname (string): This MUST be the full-length name for the biological entity listed in cell_label by the author. (If the value in cell_label is the full-length term, this field will contain the same value.) NOTE: any reserved word used in the field 'cell_label' MUST match the value of this field. EXAMPLE 1: Given the matching terms 'LC' and 'luminal cell' used to annotate the same cell(s), then users could use either terms as values in the field 'cell_label'. However, the abbreviation 'LC' CANNOT be provided in the field 'cell_fullname'. EXAMPLE 2: Either the abbreviation 'AC' or the full-length term intended by the author 'GABAergic amacrine cell' MAY be placed in the field 'cell_label', but as full-length term naming this biological entity, 'GABAergic amacrine cell' MUST be placed in the field 'cell_fullname'.
CAP (with added comment column):
NOTE: any reserved word used in the field [cellannotation_setname] MUST match the value of this field.
EXAMPLE 1: Given the matching terms 'LC' and 'luminal cell' used to annotate the same cell(s), then users could use either terms as values in the field [cellannotation_setname]. However, the abbreviation 'LC' CANNOT be provided in this field [cellannotation_setname]--cell_fullname.
EXAMPLE 2: Either the abbreviation 'AC' or the full-length term intended by the author 'GABAergic amacrine cell' MAY be placed in the field [cellannotation_setname], but as full-length term naming this biological entity, 'GABAergic amacrine cell' MUST be placed in this field [cellannotation_setname]--cell_fullname.
EXAMPLE 1: cell_label: 'AC' (abbreviation); cell_fullname: 'GABAergic amacrine cell'
EXAMPLE 2: cell_label: 'LC' (abbreviation); cell_fullname: 'Luminal cell'
EXAMPLE 3: cell_label: 'Schwann cell'; cell_fullname: 'Schwann cell' (same entry)
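The cell_fullname rule can be checked mechanically once abbreviations are known. The Python sketch below runs the three CAP examples through a hypothetical fullname_problems check. KNOWN_ABBREVIATIONS is an assumed lookup table, since neither CAS nor CAP (as quoted here) defines one, so the check cannot catch abbreviations missing from that table, and it does not cover the reserved-word note.

```python
# Hypothetical lookup table for this sketch; not part of CAS or CAP.
KNOWN_ABBREVIATIONS = {"AC": "GABAergic amacrine cell", "LC": "Luminal cell"}

# The three CAP examples as cell_label / cell_fullname records.
records = [
    {"cell_label": "AC", "cell_fullname": "GABAergic amacrine cell"},
    {"cell_label": "LC", "cell_fullname": "Luminal cell"},
    {"cell_label": "Schwann cell", "cell_fullname": "Schwann cell"},
]


def fullname_problems(record, abbreviations=KNOWN_ABBREVIATIONS):
    """Return a list of rule violations for one record; empty means it passes."""
    label = record.get("cell_label", "")
    fullname = record.get("cell_fullname", "")
    problems = []
    if not fullname:
        problems.append("cell_fullname is empty")
    # An abbreviation CANNOT be provided in cell_fullname.
    if fullname in abbreviations:
        problems.append(f"cell_fullname {fullname!r} is an abbreviation")
    # If cell_label is a known abbreviation, cell_fullname should be its full-length term.
    expected = abbreviations.get(label)
    if expected is not None and fullname != expected:
        problems.append(f"expected cell_fullname {expected!r} for cell_label {label!r}")
    return problems


for record in records:
    print(record["cell_label"], fullname_problems(record))

# A record that puts the abbreviation in both fields fails both checks.
print(fullname_problems({"cell_label": "LC", "cell_fullname": "LC"}))
```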
How should we proceed?