chanzuckerberg / single-cell-curation

Code and documentation for the curation of cellxgene datasets
MIT License

Improve validation for cases where there is only one donor_id value in `obs` #820

Open brianraymor opened 5 months ago

brianraymor commented 5 months ago

must include the following requirement:

If there is one obs['donor_id'], all observations MUST be the same value.

Guidance from @jahilton:

That is a check in our curation QA process, and it is possible to automate, so certainly something to look into implementing. But I would push it off of 5.1.0 simply because we would first want to audit the corpus to ensure we capture any edge cases.

jahilton commented 5 months ago

There's no reason this logic is restricted to only Datasets w/ 1 donor_id value. So...

Improve validation for ~cases where there is only one donor_id value in obs~ donor metadata

Check that each donor_id only has 1 value for organism_ontology_term_id, sex_ontology_term_id, self_reported_ethnicity_ontology_term_id

Can't include disease (common to have healthy vs disease samples from the same individual) or development_stage (longitudinal studies may have samples from the same individual at different ages)
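
A rough sketch of what such a check could look like, assuming `obs` is the pandas DataFrame from an AnnData object. The function name `find_inconsistent_donor_metadata` and the constant `DONOR_LEVEL_FIELDS` are hypothetical and are not part of the existing validator; disease and development_stage are left out for the reasons given above.

```python
import pandas as pd

# Fields expected to hold a single value per donor_id (from the comment above).
# disease and development_stage are intentionally excluded.
DONOR_LEVEL_FIELDS = [
    "organism_ontology_term_id",
    "sex_ontology_term_id",
    "self_reported_ethnicity_ontology_term_id",
]


def find_inconsistent_donor_metadata(obs: pd.DataFrame, fields=DONOR_LEVEL_FIELDS) -> dict:
    """Hypothetical helper: return {field: {donor_id: [values]}} for every
    donor_id that has more than one distinct value in a donor-level field."""
    problems = {}
    for field in fields:
        # Count distinct values of this field within each donor_id group.
        n_unique = obs.groupby("donor_id", observed=True)[field].nunique(dropna=False)
        bad_donors = n_unique[n_unique > 1].index
        if len(bad_donors) > 0:
            problems[field] = {
                donor: sorted(obs.loc[obs["donor_id"] == donor, field].astype(str).unique())
                for donor in bad_donors
            }
    return problems
```

An empty dict means every donor_id maps to exactly one value for each field. Because the function only needs a DataFrame, the same sketch could in principle be run on the concatenated `obs` of several Datasets to approximate a Collection-level check.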

*Ideally this check occurs within a Collection