chanzuckerberg / single-cell-curation

Code and documentation for the curation of cellxgene datasets
MIT License

cellxgene-schema CLI must update validation for obs['is_primary_data'] #834

Closed: brianraymor closed this issue 2 weeks ago

brianraymor commented 1 month ago

Design

See is_primary_data.

The new requirement is:

This MUST be False if uns['spatial']['is_single'] is False.

is_primary_data

Key: is_primary_data
Annotator: Curator MUST annotate.
Value: bool. This MUST be False if uns['spatial']['is_single'] is False. This MUST be True if this is the canonical instance of this cellular observation and False if not. This is commonly False for meta-analyses reusing data or for secondary views of data.
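To make the new requirement concrete, here is a minimal sketch of the check in Python. It is not the cellxgene-schema CLI's actual implementation. The function name validate_is_primary_data and its inputs (a plain sequence of obs['is_primary_data'] values and the uns mapping) are hypothetical, chosen so the example runs without AnnData or pandas. It applies the rule to every observation, and only when uns['spatial']['is_single'] is present and False.

```python
from collections.abc import Mapping, Sequence
from typing import Any


def validate_is_primary_data(
    is_primary_data: Sequence[bool], uns: Mapping[str, Any]
) -> list[str]:
    """Hypothetical check for the is_primary_data spatial rule.

    Returns a list of error messages; an empty list means the rule is satisfied.
    """
    spatial = uns.get("spatial")
    # The rule only applies when uns['spatial']['is_single'] exists and is False.
    if not isinstance(spatial, Mapping) or "is_single" not in spatial:
        return []
    if spatial["is_single"]:
        return []
    # Count observations that are (incorrectly) marked as primary data.
    n_true = sum(1 for value in is_primary_data if value)
    if n_true == 0:
        return []
    return [
        f"obs['is_primary_data'] is True for {n_true} observation(s), "
        "but it MUST be False when uns['spatial']['is_single'] is False."
    ]
```

For example, validate_is_primary_data([True, False], {"spatial": {"is_single": False}}) returns one error, while the same call with "is_single": True returns an empty list.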


nayib-jose-gloria commented 2 weeks ago

@brianraymor @brian-mott For validation of obs columns that depend on spatial metadata like this, do we need to account for datasets that have both spatial and non-spatial assay rows? Would that scenario ever happen?

If so, should the rule "This MUST be False if uns['spatial']['is_single'] is False." apply only to rows with Visium Spatial Gene Expression or Slideseqv2 assays? Or to all rows, as long as uns['spatial']['is_single'] exists and is False?

brianraymor commented 2 weeks ago

do we need to account for datasets having both spatial and non-spatial assay rows? Would that scenario ever happen?

No. The use of uns for spatial implicitly indicates that we're "allowing" only Visium Spatial Gene Expression or Slideseqv2 in the dataset (not a mixture of assays across observations). I think an earlier draft may have had a hard requirement in assay_ontology_term_id that all observations have the same term ID when the assay is Visium Spatial Gene Expression or Slideseqv2. We could clarify. CC: @jahilton for thoughts

jahilton commented 2 weeks ago

Yes, I can see the gap in the current documentation and think we can close it by enforcing this in assay_ontology_term_id: if any observation is assay:Visium, then all must be assay:Visium; if any observation is assay:Slide-seqV2, then all must be assay:Slide-seqV2. This blocks the possibility of a Visium/Slide-seq integration, but I think that would complicate the X table. Better to deal with that when we expand to more spatial assays (when that type of integration is more likely).
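A minimal sketch of the proposed assay_ontology_term_id constraint, again in plain Python and not the CLI's real code. The function name is hypothetical, and the term IDs EFO:0010961 (Visium Spatial Gene Expression) and EFO:0030062 (Slide-seqV2) are assumptions to confirm against the published schema. A dataset that contains either spatial assay must use exactly one assay term across all observations, which also rules out a Visium/Slide-seqV2 mix.

```python
from collections.abc import Sequence

# Assumed ontology term IDs; confirm against the published schema.
VISIUM = "EFO:0010961"        # Visium Spatial Gene Expression
SLIDE_SEQ_V2 = "EFO:0030062"  # Slide-seqV2
SPATIAL_ASSAYS = {VISIUM, SLIDE_SEQ_V2}


def validate_spatial_assay_uniformity(
    assay_ontology_term_ids: Sequence[str],
) -> list[str]:
    """Hypothetical check: if any observation uses a spatial assay,
    every observation must use that same assay term."""
    distinct = set(assay_ontology_term_ids)
    spatial_found = distinct & SPATIAL_ASSAYS
    # Only datasets that contain a spatial assay are constrained here.
    if spatial_found and len(distinct) > 1:
        return [
            f"obs['assay_ontology_term_id'] contains {sorted(distinct)}; if any "
            f"observation uses {sorted(spatial_found)}, all observations must "
            "use that same term."
        ]
    return []
```

Comparing the set of distinct term IDs, rather than checking row by row, keeps the check to a single pass and reports one error per dataset; mixtures of non-spatial assays are left to the existing validation.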

brianraymor commented 2 weeks ago

@nayib-jose-gloria - tracking in https://github.com/chanzuckerberg/single-cell-curation/issues/871 - will create a matching CLI issue when the schema is updated.