chanzuckerberg / single-cell-curation

Code and documentation for the curation of cellxgene datasets
MIT License

cellxgene-schema CLI must update validation for uns['spatial'] #1107

Open brianraymor opened 2 weeks ago

brianraymor commented 2 weeks ago

Changelog

Design

spatial

Key spatial
Annotator Curator MUST annotate if assay_ontology_term_id is a descendant of "EFO:0010961" for Visium Spatial Gene Expression or "EFO:0030062" for Slide-seqV2; otherwise, this key MUST NOT be present.
Value dict. The requirements for the key-value pairs are documented in the following sections:
  • spatial['is_single']
  • spatial[library_id]
  • spatial[library_id]['images']
  • spatial[library_id]['images']['fullres']
  • spatial[library_id]['images']['hires']
  • spatial[library_id]['scalefactors']
  • spatial[library_id]['scalefactors']['spot_diameter_fullres']
  • spatial[library_id]['scalefactors']['tissue_hires_scalef']

Additional key-value pairs MUST NOT be present.
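
As a rough illustration of the "no additional key-value pairs" rule, the sketch below walks a `uns['spatial']` dict and reports keys that are not in the list above. The function name `find_unexpected_keys` and the constants are hypothetical and are not part of cellxgene-schema; the sketch also assumes that every top-level key other than `is_single` is a library_id.

```python
# Hypothetical names for illustration; not cellxgene-schema internals.
ALLOWED_LIBRARY_KEYS = {"images", "scalefactors"}
ALLOWED_IMAGE_KEYS = {"fullres", "hires"}
ALLOWED_SCALEFACTOR_KEYS = {"spot_diameter_fullres", "tissue_hires_scalef"}


def find_unexpected_keys(spatial: dict) -> list:
    """Return one message per key that the documented layout does not allow."""
    errors = []
    # Assumption: every top-level key other than 'is_single' is a library_id.
    library_ids = sorted(k for k in spatial if k != "is_single")
    for library_id in library_ids:
        library = spatial[library_id]
        if not isinstance(library, dict):
            errors.append(f"spatial[{library_id!r}] must be a dict")
            continue
        for key in sorted(library.keys() - ALLOWED_LIBRARY_KEYS):
            errors.append(f"unexpected key spatial[{library_id!r}][{key!r}]")
        nested = (
            ("images", ALLOWED_IMAGE_KEYS),
            ("scalefactors", ALLOWED_SCALEFACTOR_KEYS),
        )
        for section, allowed in nested:
            value = library.get(section)
            if isinstance(value, dict):
                for key in sorted(value.keys() - allowed):
                    errors.append(
                        f"unexpected key spatial[{library_id!r}][{section!r}][{key!r}]"
                    )
    return errors
```

Returning a list of messages rather than raising on the first problem lets a caller report every offending key in one pass.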


is_single

Key is_single
Annotator Curator MUST annotate if assay_ontology_term_id is a descendant of "EFO:0010961" for Visium Spatial Gene Expression or "EFO:0030062" for Slide-seqV2; otherwise, this key MUST NOT be present.
Value bool. This MUST be True:
  • if assay_ontology_term_id is a descendant of "EFO:0010961" for Visium Spatial Gene Expression and the dataset represents one Space Ranger output for a single tissue section
  • if assay_ontology_term_id is "EFO:0030062" for Slide-seqV2 and the dataset represents the output for a single array on a puck
Otherwise, this MUST be False.
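
A minimal sketch of the type requirement on `is_single`, assuming the value may be either a Python `bool` or a NumPy boolean scalar once the file has been read back. The function name `check_is_single` is hypothetical. Whether the value is correctly True or False for a given dataset cannot be derived from the dict alone, so the sketch only covers presence and type.

```python
import numpy as np


def check_is_single(spatial: dict) -> list:
    """Check that 'is_single' exists and is a boolean (hypothetical helper)."""
    errors = []
    if "is_single" not in spatial:
        errors.append("uns['spatial'] must contain the key 'is_single'")
    elif not isinstance(spatial["is_single"], (bool, np.bool_)):
        # Integers such as 0 or 1 are rejected: isinstance(1, bool) is False.
        found = type(spatial["is_single"]).__name__
        errors.append(f"uns['spatial']['is_single'] must be bool, found {found}")
    return errors
```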


spatial[library_id]

Key Identifier for the Visium library
Annotation Curator MUST annotate if assay_ontology_term_id is a descendant of "EFO:0010961" for Visium Spatial Gene Expression and uns['spatial']['is_single'] is True; otherwise, this key MUST NOT be present.
Value dict. There MUST be only one library_id.
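
The presence rule for library_id can be sketched as follows. The name `check_library_ids` is hypothetical, and the `is_visium` flag is assumed to be computed elsewhere (for example by an ontology lookup that decides whether assay_ontology_term_id is a descendant of "EFO:0010961"); that lookup is deliberately left out here.

```python
def check_library_ids(spatial: dict, is_visium: bool) -> list:
    """Apply the library_id presence rule (hypothetical helper).

    is_visium: whether the assay is a descendant of "EFO:0010961",
    decided by the caller.
    """
    errors = []
    # Assumption: every top-level key other than 'is_single' is a library_id.
    library_ids = sorted(k for k in spatial if k != "is_single")
    expects_library = is_visium and bool(spatial.get("is_single", False))
    if expects_library and len(library_ids) != 1:
        errors.append(f"expected exactly one library_id, found {len(library_ids)}")
    if not expects_library and library_ids:
        errors.append(f"library_id MUST NOT be present, found {library_ids}")
    return errors
```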


spatial[library_id]['images']

Key images
Annotation Curator MUST annotate if assay_ontology_term_id is a descendant of "EFO:0010961" for Visium Spatial Gene Expression and uns['spatial']['is_single'] is True; otherwise, this key MUST NOT be present.
Value dict


spatial[library_id]['images']['fullres']

Key fullres
Annotation Curator MAY annotate if assay_ontology_term_id is a descendant of "EFO:0010961" for Visium Spatial Gene Expression and uns['spatial']['is_single'] is True; otherwise, this key MUST NOT be present.
Value The full resolution image MUST be converted to a numpy.ndarray with the following requirements:

  • The length of numpy.ndarray.shape MUST be 3
  • The numpy.ndarray.dtype MUST be numpy.uint8
  • The numpy.ndarray.shape[2] MUST be either 3 (RGB color model for example) or 4 (RGBA color model for example)
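
The three array requirements above could be checked along these lines. This is only a sketch; `check_image_array` is a hypothetical name, not a function from cellxgene-schema.

```python
import numpy as np


def check_image_array(image, name: str = "fullres") -> list:
    """Check type, dtype, rank and channel count of an image array."""
    if not isinstance(image, np.ndarray):
        return [f"{name} must be a numpy.ndarray"]
    errors = []
    if image.dtype != np.uint8:
        errors.append(f"{name} dtype must be numpy.uint8, found {image.dtype}")
    if len(image.shape) != 3:
        errors.append(f"{name} must have 3 dimensions, found {len(image.shape)}")
    elif image.shape[2] not in (3, 4):
        # The third axis holds the colour channels, e.g. RGB (3) or RGBA (4).
        errors.append(f"{name} shape[2] must be 3 or 4, found {image.shape[2]}")
    return errors
```

The channel check sits in an `elif` so that a two-dimensional array produces a single rank error instead of failing on `shape[2]`.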


spatial[library_id]['images']['hires']

Key hires
Annotation Curator MUST annotate if assay_ontology_term_id is a descendant of "EFO:0010961" for Visium Spatial Gene Expression and uns['spatial']['is_single'] is True; otherwise, this key MUST NOT be present.
Value tissue_hires_image.png MUST be converted to a numpy.ndarray with the following requirements:

  • The length of numpy.ndarray.shape MUST be 3
  • The numpy.ndarray.dtype MUST be numpy.uint8
  • If assay_ontology_term_id is "EFO:0022860" for Visium CytAssist Spatial Gene Expression, 11mm, the largest dimension in numpy.ndarray.shape[:2] MUST be 4000 pixels; otherwise, the largest dimension in numpy.ndarray.shape[:2] MUST be 2000 pixels. See Space Ranger Spatial Outputs.
  • The numpy.ndarray.shape[2] MUST be either 3 (RGB color model for example) or 4 (RGBA color model for example)
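
The size rule for hires depends on the assay term, which a sketch like the following makes explicit. The names `expected_hires_max_dimension` and `check_hires_size` are hypothetical. The sketch takes the array's shape tuple directly and assumes the rank has already been validated.

```python
CYTASSIST_11MM = "EFO:0022860"  # Visium CytAssist Spatial Gene Expression, 11mm


def expected_hires_max_dimension(assay_term: str) -> int:
    """Largest allowed value in shape[:2] for the given assay term."""
    return 4000 if assay_term == CYTASSIST_11MM else 2000


def check_hires_size(shape: tuple, assay_term: str) -> list:
    """Compare the largest of (height, width) with the expected pixel size."""
    largest = max(shape[:2])
    expected = expected_hires_max_dimension(assay_term)
    if largest != expected:
        return [
            f"hires largest dimension must be {expected} pixels, found {largest}"
        ]
    return []
```

For example, `check_hires_size((4000, 3500, 3), "EFO:0022860")` returns an empty list, while the same term with a shape of `(2000, 1800, 3)` returns one error.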


spatial[library_id]['scalefactors']

Key scalefactors
Annotation Curator MUST annotate if assay_ontology_term_id is a descendant of "EFO:0010961" for Visium Spatial Gene Expression and uns['spatial']['is_single'] is True; otherwise, this key MUST NOT be present.
Value dict


spatial[library_id]['scalefactors']['spot_diameter_fullres']

Key spot_diameter_fullres
Annotation Curator MUST annotate if assay_ontology_term_id is a descendant of "EFO:0010961" for Visium Spatial Gene Expression and uns['spatial']['is_single'] is True; otherwise, this key MUST NOT be present.
Value float. This MUST be the value of the spot_diameter_fullres field from scalefactors_json.json. See Space Ranger Spatial Outputs.


spatial[library_id]['scalefactors']['tissue_hires_scalef']

Key tissue_hires_scalef
Annotation Curator MUST annotate if assay_ontology_term_id is a descendant of "EFO:0010961" for Visium Spatial Gene Expression and uns['spatial']['is_single'] is True; otherwise, this key MUST NOT be present.
Value float. This MUST be the value of the tissue_hires_scalef field from scalefactors_json.json. See Space Ranger Spatial Outputs.
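
A sketch of how the two scale factors might be copied out of the text of scalefactors_json.json and then checked. The helper names are hypothetical and the numbers in the example are placeholders, not values from a real dataset. Other fields in the JSON are dropped, since additional keys are not allowed under scalefactors.

```python
import json

REQUIRED_SCALEFACTORS = ("spot_diameter_fullres", "tissue_hires_scalef")


def scalefactors_from_json(text: str) -> dict:
    """Keep only the two required fields, converted to float."""
    raw = json.loads(text)
    return {key: float(raw[key]) for key in REQUIRED_SCALEFACTORS}


def check_scalefactors(scalefactors: dict) -> list:
    """Check that both required keys are present and hold floats."""
    errors = []
    for key in REQUIRED_SCALEFACTORS:
        if key not in scalefactors:
            errors.append(f"scalefactors must contain the key {key!r}")
        elif not isinstance(scalefactors[key], float):
            found = type(scalefactors[key]).__name__
            errors.append(f"scalefactors[{key!r}] must be float, found {found}")
    return errors


# Placeholder values for illustration only.
example_json = (
    '{"spot_diameter_fullres": 89.5, '
    '"tissue_hires_scalef": 0.17, '
    '"tissue_lowres_scalef": 0.05}'
)
example = scalefactors_from_json(example_json)
```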


ejmolinelli commented 3 days ago

Most of these changes only require testing that _check_spatial_uns allows descendants of Visium. Testing multiple relationships (descendant, sibling, parent, self) is redundant since these are all handled inside the _check_spatial_uns function. There are some changes that require additional tests:

  1. Pixel dimension limit check for Visium 11mm

If assay_ontology_term_id is "EFO:0022860" for Visium CytAssist Spatial Gene Expression, 11mm, the largest dimension in numpy.ndarray.shape[:2] MUST be 4000pixel