chanzuckerberg / single-cell-curation

Code and documentation for the curation of cellxgene datasets
MIT License

Update self_reported_ethnicity_term_id #874

Open brianraymor opened 5 months ago

brianraymor commented 5 months ago

Status

See Add parent classes for ethnicity and ancestry terms for background. This update to the ontology model simplifies the schema requirements and validator.

The latest HANCESTRO release includes simplified modeling which demonstrates progress, but there are still missing terms in the ethnicity category under review.

Design

self_reported_ethnicity_ontology_term_id

Key: self_reported_ethnicity_ontology_term_id
Annotator: Curator MUST annotate.
Value: categorical with str categories.

If organism_ontology_term_id is NOT "NCBITaxon:9606" for Homo sapiens, then the value MUST be "na".

If organism_ontology_term_id is "NCBITaxon:9606" for Homo sapiens, the value MUST be "unknown" if unavailable; otherwise, the value MUST meet the following requirements:

  • The value MUST be formatted as one or more comma-separated (with no leading or trailing spaces) HANCESTRO terms in ascending lexical order with no duplication of terms.
  • Each HANCESTRO term MUST be a descendant of "HANCESTRO:0601" for ethnicity category.

  • For example, if the terms are "HANCESTRO:0590" and "HANCESTRO:0580", then the value of self_reported_ethnicity_ontology_term_id MUST be "HANCESTRO:0580,HANCESTRO:0590".
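
To make the rules above easier to review, here is a minimal Python sketch of how a check could look. It is not the validator's actual code. The function name `validate_self_reported_ethnicity`, the `ethnicity_descendants` parameter (a set of term IDs that the caller has already resolved as descendants of "HANCESTRO:0601"), and the four-digit ID pattern are assumptions made for illustration only.

```python
import re

# Assumption: HANCESTRO term IDs look like "HANCESTRO:" plus four digits.
HANCESTRO_TERM = re.compile(r"HANCESTRO:\d{4}")


def validate_self_reported_ethnicity(value, organism_term_id, ethnicity_descendants):
    """Return a list of error strings; an empty list means the value is valid.

    ethnicity_descendants is a hypothetical, caller-supplied set of term IDs
    that descend from "HANCESTRO:0601". How it is built is out of scope here.
    """
    # Non-human organisms: the only allowed value is "na".
    if organism_term_id != "NCBITaxon:9606":
        return [] if value == "na" else ['value MUST be "na" for non-human organisms']

    # Human, but ethnicity unavailable.
    if value == "unknown":
        return []

    # Split on bare commas; any surrounding space stays attached to a term
    # and is rejected by the full-match check below.
    terms = value.split(",")
    errors = []
    for term in terms:
        if not HANCESTRO_TERM.fullmatch(term):
            errors.append(f"{term!r} is not a well-formed HANCESTRO term")
        elif term not in ethnicity_descendants:
            errors.append(f'{term!r} is not a descendant of "HANCESTRO:0601"')

    if len(set(terms)) != len(terms):
        errors.append("terms MUST NOT be duplicated")
    if terms != sorted(terms):
        errors.append("terms MUST be in ascending lexical order")
    return errors
```

Called with the two terms from the example and a descendant set that contains both, the sketch returns no errors for "HANCESTRO:0580,HANCESTRO:0590" and reports an ordering error for "HANCESTRO:0590,HANCESTRO:0580".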


jahilton commented 5 days ago

URL is broken. This one works: https://www.ebi.ac.uk/ols4/ontologies/hancestro/classes/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FHANCESTRO_0601

Not sure how else to review in the absence of a plan for the existing data in the corpus that would violate the update.

brianraymor commented 5 days ago

I must have a github page open somewhere with an uncommitted change, because I know that I fixed that broken URL. Doh. Updated. Thanks.