chanzuckerberg / single-cell-curation

Code and documentation for the curation of cellxgene datasets
MIT License

cellxgene_schema must validate suspension_type #521

Closed: brianraymor closed this issue 1 year ago

brianraymor commented 1 year ago

The validation and automation must be updated. Placeholder until schema 3.1 is published.

Current requirements are in the top-level summary comment of "Update requirements for suspension_type".

nayib-jose-gloria commented 1 year ago

The suspension_type validation updates for 3.1 are ready for validation testing. Run the following command to install a version of cellxgene-schema with the relevant changes. After installing, you can run cellxgene-schema validate against any example h5ads with pertinent matching and mismatching assay_ontology_term_id and suspension_type combinations:

pip install git+https://github.com/chanzuckerberg/single-cell-curation/@suspension-type-update-test#subdirectory=cellxgene_schema_cli
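To check many matching and mismatching combinations in one pass, a small driver can call the CLI once per h5ad and record which files fail. This is a minimal sketch, assuming that cellxgene-schema validate <path> exits with a non-zero status when a file is invalid; the run_validation helper and the file names in the usage note are hypothetical, not part of cellxgene-schema.

```python
import subprocess
import sys
from typing import Dict, Iterable, Sequence

# Assumption: the CLI is invoked as `cellxgene-schema validate <path>` and
# returns a non-zero exit code when validation fails.
DEFAULT_CMD: Sequence[str] = ("cellxgene-schema", "validate")


def run_validation(paths: Iterable[str], cmd: Sequence[str] = DEFAULT_CMD) -> Dict[str, bool]:
    """Hypothetical helper: map each h5ad path to True if validation passed."""
    results: Dict[str, bool] = {}
    for path in paths:
        # Run the validator on a single file and capture its output.
        proc = subprocess.run([*cmd, path], capture_output=True, text=True)
        results[path] = proc.returncode == 0
        if proc.returncode != 0:
            # Surface the validator's messages for failing files.
            print(f"{path}: FAILED\n{proc.stdout}{proc.stderr}", file=sys.stderr)
    return results
```

For example, run_validation(["mars_seq_cell.h5ad", "mars_seq_nucleus.h5ad"]) would be expected to report the first file as passing and the second as failing. The cmd parameter is only there so the driver can be pointed at a different executable, for instance a wrapper script.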
tihuan commented 1 year ago

@nayib-jose-gloria I see that this is in the single-cell-curation repo, for which there are no related staging deploy steps in the oncall doc. Should I still move this to ready for prod, or leave it in ready for staging?

Thank you!

nayib-jose-gloria commented 1 year ago

@tihuan leave it in ready for staging! We have a separate validation/deployment process for single-cell-curation.

tihuan commented 1 year ago

Got it thanks, Nayib!

jychien commented 1 year ago

I've run cellxgene-schema 3.0.2 on sample datasets for the following assays, and the results look good to me:

| Assay: expected suspension_type | nucleus | cell | na |
|---|---|---|---|
| MARS-seq: cell | Values must be one of ['cell'] when 'assay_ontology_term_id' is EFO:0008796 | passed | Values must be one of ['cell'] when 'assay_ontology_term_id' is EFO:0008796 |
| BD Rhapsody Whole Transcriptome Analysis: cell | Values must be one of ['cell'] when 'assay_ontology_term_id' is EFO:0700003 | passed | Values must be one of ['cell'] when 'assay_ontology_term_id' is EFO:0700003 |
| BD Rhapsody Targeted mRNA: cell | Values must be one of ['cell'] when 'assay_ontology_term_id' is EFO:0700004 | passed | Values must be one of ['cell'] when 'assay_ontology_term_id' is EFO:0700004 |
| inDrop: cell or nuclei | passed | passed | Values must be one of ['cell', 'nucleus'] when 'assay_ontology_term_id' is EFO:0008780 |
| STRT-seq: cell | Values must be one of ['cell'] when 'assay_ontology_term_id' is EFO:0008953 | passed | Values must be one of ['cell'] when 'assay_ontology_term_id' is EFO:0008953 |
| Seq-Well S3: cell (so the rule can be expanded to children of Seq-Well, as Jason suggested) | Values must be one of ['cell'] when 'assay_ontology_term_id' is EFO:0008919 or its children | passed | Values must be one of ['cell'] when 'assay_ontology_term_id' is EFO:0008919 or its children |
| TruDrop: cell or nuclei | passed | passed | Values must be one of ['cell', 'nucleus'] when 'assay_ontology_term_id' is EFO:0700010 |
| GEXSCOPE technology: cell or nuclei | passed | passed | Values must be one of ['cell', 'nucleus'] when 'assay_ontology_term_id' is EFO:0700011 |
| SPLiT-seq: cell or nuclei | passed | passed | Values must be one of ['cell', 'nucleus'] when 'assay_ontology_term_id' is EFO:0009919 |
| spatial transcriptomics [EFO:0008994] and children | Values must be one of ['na'] when 'assay_ontology_term_id' is EFO:0008994 or its children | Values must be one of ['na'] when 'assay_ontology_term_id' is EFO:0008994 or its children | passed |
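The results follow a simple pattern: each assay term maps to a set of permitted suspension_type values, and some rules also cover the term's ontology descendants. Here is a minimal sketch of that check. It is not the actual cellxgene-schema implementation, which is not shown in this thread. The rule entries and message format are transcribed from the results above. PARENTS is a hypothetical stand-in for a real EFO ontology lookup, and its keys are placeholder IDs, not real EFO terms. Treating assays with no rule as unconstrained is also an assumption of the sketch.

```python
from typing import Dict, List, Optional, Tuple

# Rules transcribed from the test results:
# assay term id -> (allowed suspension_type values, rule also covers descendants?)
RULES: Dict[str, Tuple[List[str], bool]] = {
    "EFO:0008796": (["cell"], False),             # MARS-seq
    "EFO:0700003": (["cell"], False),             # BD Rhapsody Whole Transcriptome Analysis
    "EFO:0700004": (["cell"], False),             # BD Rhapsody Targeted mRNA
    "EFO:0008780": (["cell", "nucleus"], False),  # inDrop
    "EFO:0008953": (["cell"], False),             # STRT-seq
    "EFO:0008919": (["cell"], True),              # Seq-Well and its children
    "EFO:0700010": (["cell", "nucleus"], False),  # TruDrop
    "EFO:0700011": (["cell", "nucleus"], False),  # GEXSCOPE technology
    "EFO:0009919": (["cell", "nucleus"], False),  # SPLiT-seq
    "EFO:0008994": (["na"], True),                # spatial transcriptomics and its children
}

# Hypothetical stand-in for an ontology lookup: child term -> parent term.
# These keys are placeholders, not real EFO identifiers.
PARENTS: Dict[str, str] = {
    "EXAMPLE:seq-well-child": "EFO:0008919",
    "EXAMPLE:spatial-child": "EFO:0008994",
}


def ancestors(term: str) -> List[str]:
    """Return the term followed by its ancestors, nearest first."""
    chain = [term]
    while chain[-1] in PARENTS:
        chain.append(PARENTS[chain[-1]])
    return chain


def check_suspension_type(assay: str, suspension_type: str) -> Optional[str]:
    """Return an error message, or None if the combination is allowed or unconstrained."""
    for depth, term in enumerate(ancestors(assay)):
        if term not in RULES:
            continue
        allowed, includes_children = RULES[term]
        if depth > 0 and not includes_children:
            continue  # this rule applies to the exact term only
        if suspension_type in allowed:
            return None
        suffix = " or its children" if includes_children else ""
        return (f"Values must be one of {allowed} when "
                f"'assay_ontology_term_id' is {term}{suffix}")
    return None
```

Formatting the Python list directly reproduces the ['cell', 'nucleus'] style seen in the validator messages. For example, check_suspension_type("EFO:0008780", "na") yields the inDrop error from the na column.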