Open BAevermann opened 1 year ago
~Would this only be required when there was complete support for 10X multiome? Or is this being proposed to allow census to consume this assay now and be "future proofed" when/if ATAC is supported?~
~Reviewing the draft for the assay tier proposal:~
~1. 10X multiome (RNA) is experimental. And there are 60 datasets?~
~2. 10X multiome (ATAC) is unsupported.~
~How is 10X multiome currently modeled?~
~Will also flag mCT-seq, which measures both RNA & methylation (we have 1 Collection, which holds the expression data)~
~In Lattice, we use `biological_macromolecule` with an enum of `RNA`, `DNA`, or `protein`~
~As an alternative, I could also imagine a field that lists the type of measurement being represented (expression, accessibility, etc.)~
~Assortment of proposals were discussed on 11/30~
~Current proposal~

~Field name: Modality~

~Values: (Controlled vocab)~
- ~Transcriptomics~
- ~Epigenomics~
- ~Proteomics~
- ~Spatial Transcriptomics~
- ~Spatial Proteomics~
- ~in-situ hybridization assay~
~Will meet in early Q1 to discuss further.~
notes the "Modality" proposal:
Based on the proposal, I took a stab at mapping each current assay in the corpus to the Modality values - this sheet - for others to review. Biggest Q is that I'm not sure how to characterize the morphology & electrophysiology measurements that are a part of Patch-seq (in addition to the transcriptomics).
Agreed with Jason, we would be overloading this field with the addition of "spatial". This axis of variation is likely to be already captured by assay. The main goal as I read it is to distinguish between molecules for downstream applications, and with the upcoming support for Spatial, I don't see a need to overlap this variable.
I'd prefer to stick to the name of the molecule or the omics term.
Thanks for the mapping @jahilton!
`modality` field & instead capture the readout elsewhere, if needed.

April 15 2024 (@BAevermann, @brianraymor, @jahilton, @jychien, @pablo-gar)
obs['modality'] transcriptomics epigenomics proteomics
@BAevermann, @jahilton, @jychien, @pablo-gar
Would you please review the draft in the top-level summary comment under Design. (I cannot submit a PR for this field because its schema version is unknown at this time.)
Comments, LGTM, or emojis all accepted. Also feel free to edit in place.
@brianraymor I think it should be "proteomics" for spatial proteomics [EFO:0700000] and its descendants. @jahilton can confirm
otherwise LGTM
> I think it should be "proteomics"
Doh. Cut-n-paste error. Corrected.
One too many 0
- mCT-seq [EFO:~0~0030060]
Need to add "...and its descendants" to sci-RNA-seq [EFO:0010550]. Otherwise, sci-RNA-seq3 is not covered.
Could add "...and its descendants" to scATAC-seq [EFO:0010891] and ditch the "10x scATAC-seq [EFO:0030007]" row
Potential risks with relying on descendants for this one. Some hypotheticals:
- If EFO does the appropriate thing and moves `10x multiome` under both `10x transcription profiling` ? Fairly confident that won't happen unless we ask for it.
- If a term is created that is a descendant of both `spatial proteomics` and `spatial transcriptomics`

I think we're focused on transcriptomics enough to be resistant to those rare occurrences, but just wanted to raise them.
> One too many 0
> - mCT-seq [EFO:~0~0030060]
Good catch. I owe you a $1. Corrected.
> Need to add "...and its descendants" to sci-RNA-seq [EFO:0010550]. Otherwise, sci-RNA-seq3 is not covered.
Added.
> Could add "...and its descendants" to scATAC-seq [EFO:0010891] and ditch the "10x scATAC-seq [EFO:0030007]" row
The problem is that 10x multiome is a descendant of scATAC-seq.
> Potential risks with relying on descendants for this one. Some hypotheticals:
> - If EFO does the appropriate thing and moves `10x multiome` under both `10x transcription profiling` ? Fairly confident that won't happen unless we ask for it.
> - If a term is created that is a descendant of both `spatial proteomics` and `spatial transcriptomics`
>
> I think we're focused on transcriptomics enough to be resistant to those rare occurrences, but just wanted to raise them.
This could be part of the review when the schema updates EFO in a version?
All the spatial assays are represented as descendants of either spatial transcriptomics or spatial proteomics. Is finer granularity required?
Not that we're accepting this assay, but FYI, NanoString digital spatial profiling is a child of spatial transcriptomics and it has in its definition 'spatial analysis of RNA and protein'. Other than that, the descendants of spatial transcriptomics or spatial proteomics look good to me.
I can decompose the spatial cases into individual supported assays if that's preferable.
> The problem is that 10x multiome is a descendant of scATAC-seq.
👍
I don't think further decomposing is needed. I think adding a review step whenever EFO is updated should be sufficient. And establishing a 'supported assay' list will also help as it will narrow the scope of that review
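A rough sketch of what such a review step could check, assuming the table rows and the relevant slice of the EFO hierarchy are available as plain dictionaries. The names `PARENTS`, `ROWS`, `ancestors`, and `conflicting_terms` are hypothetical, the hierarchy is a toy that only encodes the descendant relations mentioned in this thread (direct parentage is assumed), and the scATAC-seq "and its descendants" row is the rejected suggestion, not what the table does.

```python
# Toy hierarchy: child label -> list of parent labels (assumed to be direct parents).
PARENTS = {
    "10x multiome": ["scATAC-seq"],
    "10x scATAC-seq": ["scATAC-seq"],
}

# Table rows: term -> (allowed values, whether the row also covers descendants).
ROWS = {
    "10x multiome": ({"epigenomics", "transcriptomics"}, False),
    # Hypothetical: what would happen if scATAC-seq covered its descendants.
    "scATAC-seq": ({"epigenomics"}, True),
}


def ancestors(term, parents):
    """Collect every ancestor of `term`, following all parents of each term."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def conflicting_terms(rows, parents):
    """Map each term to the descendant-covering ancestor rows that disagree with its own row."""
    conflicts = {}
    for term, (own_values, _) in rows.items():
        for anc in ancestors(term, parents):
            if anc in rows:
                anc_values, covers_descendants = rows[anc]
                if covers_descendants and anc_values != own_values:
                    conflicts.setdefault(term, []).append(anc)
    return conflicts


print(conflicting_terms(ROWS, PARENTS))
```

Running a check like this after each EFO update, restricted to the supported assay list, would flag the 10x multiome case raised above before it reaches the table.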
With the updated submission policy, our only smFISH datasets (which are private) will be removed, and no more will be accepted.
So the "smFISH and its descendants" row can be simplified to MERFISH EFO:0008992
Updated. Added a note to Update requirements for suspension_type.
Per conversation with @jahilton and @BAevermann - it does not currently make sense to allow "epigenomics" as a value for mCT-seq. It is unsupported by the updated submission policy. There are no published datasets with this assay+modality combination. It's ~struck~ above.
Based on the renewed discovery for 10X multiome, I'm reverting this issue from schema 5.2.0 and re-opening.
Design (@brianraymor)
obs
...
modality

`str` categories. This MUST be "epigenomics" or "transcriptomics". This MUST be the correct type for the corresponding assay:
| Assay | Value |
|---|---|
| [EFO:0030059] | "epigenomics" or "transcriptomics" |
| 10x scATAC-seq [EFO:0030007] | "epigenomics" |
| [EFO:0030080] and its descendants | "transcriptomics" |
| [EFO:0700004] | "transcriptomics" |
| [EFO:0700003] | "transcriptomics" |
| [EFO:0010010] and its descendants | "transcriptomics" |
| [EFO:0008720] | "transcriptomics" |
| [EFO:0008722] | "transcriptomics" |
| [EFO:0700011] | "transcriptomics" |
| [EFO:0008780] | "transcriptomics" |
| [EFO:0008796] | "transcriptomics" |
| mCT-seq [EFO:0030060] | "epigenomics" or "transcriptomics" |
| MERFISH [EFO:0008992] | "transcriptomics" |
| [EFO:0002761] and its descendants | "epigenomics" |
| [EFO:0030002] | "transcriptomics" |
| [EFO:0008853] | "transcriptomics" |
| [EFO:0022490] | "transcriptomics" |
| scATAC-seq [EFO:0010891] | "epigenomics" |
| sci-Plex [EFO:0030026] | "transcriptomics" |
| sci-RNA-seq [EFO:0010550] and its descendants | "transcriptomics" |
| [EFO:0008919] and its descendants | "transcriptomics" |
| [EFO:0010184] and its descendants | "transcriptomics" |
| [EFO:0008994] and its descendants | "transcriptomics" |
| [EFO:0009919] | "transcriptomics" |
| [EFO:0008953] | "transcriptomics" |
| [EFO:0700010] | "transcriptomics" |
If the assay does not appear in this table, the most appropriate value MUST be selected and the curation team informed during submission so that the assay can be added to the table.
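As an illustration only, a validator for this rule might look like the following sketch. It assumes a small, hand-copied subset of the exact-match rows; `ALLOWED_MODALITY` and `check_modality` are hypothetical names, not part of any existing CELLxGENE tool, and the "and its descendants" rows are left out because they would need an ontology lookup.

```python
# Subset of the exact-match rows from the table, keyed by assay term ID.
ALLOWED_MODALITY = {
    "EFO:0030059": {"epigenomics", "transcriptomics"},
    "EFO:0030007": {"epigenomics"},      # 10x scATAC-seq
    "EFO:0008992": {"transcriptomics"},  # MERFISH
    "EFO:0010891": {"epigenomics"},      # scATAC-seq
    "EFO:0030026": {"transcriptomics"},  # sci-Plex
}
ALL_VALUES = {"epigenomics", "transcriptomics"}


def check_modality(assay_term_id: str, modality: str) -> list[str]:
    """Return a list of messages; an empty list means the pair is acceptable."""
    if modality not in ALL_VALUES:
        return [f"modality must be one of {sorted(ALL_VALUES)}, got {modality!r}"]
    allowed = ALLOWED_MODALITY.get(assay_term_id)
    if allowed is None:
        # Not an error per the rule above: the submitter picks the most
        # appropriate value and the curation team is informed.
        return [f"{assay_term_id} is not in the table; inform the curation team"]
    if modality not in allowed:
        return [f"{assay_term_id} requires one of {sorted(allowed)}, got {modality!r}"]
    return []


print(check_modality("EFO:0030007", "transcriptomics"))
```

Returning messages rather than raising keeps the "assay not in the table" case as a notice instead of a hard failure, which matches the wording of the rule.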
Context
Currently, the CELLxGENE schema does not capture the concept of a detected analyte.
This concept is usually implied in the assay name, for example the mRNA analyte as detected by 10x 3' transcriptional profiling. However, for assays such as "10x multiome" the analyte detected is ambiguous as it measures both mRNA and open chromatin.
This distinction is required by downstream tools such as Census or Expression to filter supported vs unsupported data.
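A minimal sketch of the kind of filter this would enable, assuming each cell's `obs` metadata is available as a dictionary with a `modality` key. The `supported_cells` helper, the sample rows, and the choice of "transcriptomics" as the only supported value are assumptions for illustration, not the behavior of Census or Expression.

```python
def supported_cells(obs_rows, supported=("transcriptomics",)):
    """Keep only the rows whose modality is in the supported set."""
    return [row for row in obs_rows if row.get("modality") in supported]


# Made-up rows: the same assay can appear with either modality.
obs_rows = [
    {"cell": "c1", "assay": "10x multiome", "modality": "transcriptomics"},
    {"cell": "c2", "assay": "10x multiome", "modality": "epigenomics"},
    {"cell": "c3", "assay": "MERFISH", "modality": "transcriptomics"},
]

for row in supported_cells(obs_rows):
    print(row["cell"], row["assay"])
```

The point is that the assay name alone cannot separate `c1` from `c2`; the `modality` value can.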