chanzuckerberg / single-cell-curation

Code and documentation for the curation of cellxgene datasets
MIT License

disease_ontology_term_id MUST support a list of values #687

Open jahilton opened 7 months ago

jahilton commented 7 months ago

We are currently limited to one disease_ontology_term_id value, but some studies need to list multiple. For example, de Vrij et al. (which we have just started curating) studies HIV-leishmaniasis coinfection: some donors are healthy/normal, some have HIV, and some have both HIV and leishmaniasis.

categorical with str categories. The value MUST be formatted as one or more comma-separated (with no leading or trailing spaces) MONDO terms in ascending lexical order or "PATO:0000461" for normal or healthy.

For example, if the terms are "MONDO:0005109" and "MONDO:0011989" then the value of disease_ontology_term_id MUST be "MONDO:0005109,MONDO:0011989".
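A minimal sketch of how the proposed format could be built and checked, for illustration only. The helper names `format_disease_terms` and `is_valid_disease_value` are hypothetical and are not part of the cellxgene validator. The sketch assumes that MONDO IDs are "MONDO:" followed by seven digits and that duplicate terms are not allowed; the proposal above states neither.

```python
import re

NORMAL = "PATO:0000461"  # normal or healthy, per the proposed schema text

# Assumption: a MONDO term is "MONDO:" followed by exactly seven digits.
MONDO_TERM = re.compile(r"MONDO:\d{7}")


def format_disease_terms(terms):
    """Build the proposed value from an iterable of term IDs (hypothetical helper)."""
    unique = sorted(set(terms))  # ascending lexical order, duplicates dropped
    if not unique:
        raise ValueError("at least one term is required")
    if unique == [NORMAL]:
        return NORMAL
    bad = [t for t in unique if not MONDO_TERM.fullmatch(t)]
    if bad:
        # Also rejects PATO:0000461 mixed with MONDO terms.
        raise ValueError(f"not MONDO terms: {bad}")
    return ",".join(unique)  # no spaces around the commas


def is_valid_disease_value(value):
    """Check a single disease_ontology_term_id value against the proposed rule."""
    if value == NORMAL:
        return True
    terms = value.split(",")
    if not all(MONDO_TERM.fullmatch(t) for t in terms):
        return False  # catches spaces, empty items, and non-MONDO terms
    # Strictly ascending: sorted and, by assumption, free of duplicates.
    return all(a < b for a, b in zip(terms, terms[1:]))


print(format_disease_terms(["MONDO:0011989", "MONDO:0005109"]))
# MONDO:0005109,MONDO:0011989
print(is_valid_disease_value("MONDO:0011989,MONDO:0005109"))
# False
```

Comparing adjacent terms with a strict `<` covers the ordering rule and the duplicate check in one pass; if duplicates turn out to be acceptable, `<=` would be the change to make.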

brianraymor commented 7 months ago

Similar to self_reported_ethnicity, there would need to be discussions on how this change would be surfaced in CELLxGENE experiences. CC: @pablo-gar @niknak33

@jahilton - When do you expect to publish the related collection under curation?

jahilton commented 7 months ago

We are only waiting on the contributor to check a few things with his colleagues before publishing.

brianraymor commented 7 months ago

How are you planning to model the value in the interim?

jahilton commented 7 months ago

The individuals with multiple diseases will have a disease value of HIV. The contributor decided that the general user would be best served by immediately recognizing HIV over the others, and the dataset will have additional columns to specify the author conditions.

brianraymor commented 2 months ago

May 1, 2024 - Since this issue was opened, no additional datasets have been blocked on this feature.

jychien commented 1 month ago

We currently have another dataset looking at Parkinson's and Alzheimer's diseases in brain tissue samples from patients. For donors with both diseases, the authors will select a single disease to conform to the schema and temporarily use an additional author column to specify multiple diseases. However, both of these diseases are significant for the brain and would be much more accurately represented as a list in disease_ontology_term_id.