ambrosejcarr closed this issue 3 years ago
Addressing @mckinsel's questions:
"Detected molecules of RNA per gene" (typical 10x 3' processing), "detected molecules of RNA per transcript" (transcript-aware RNA-seq processing, more commonly associated with SS2), "detected molecules of protein" (CITE-seq, CyTOF, MIBI), and "sequencing reads from promoter regions adjacent to genes" (sc-ATAC-seq) are separate data modalities and we should be aware of that in some way.
They can all be reduced to "observations of gene", and we may want to enable that conversion, but we should be careful and deliberate about it, with a separate set of rules for each modality. When we get to CITE-seq data, those naturally correspond better to transcript-level data. The PTPRC gene is a good example of where we'll get tripped up, and in the future I expect we'll start to see phospho (active) and non-phospho (inactive) forms of proteins detected with CITE-seq, introducing additional complexity beyond what's captured at the transcript level.
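To make the "separate set of rules for each modality" idea concrete, here is a minimal sketch of what an explicit, per-modality reduction to gene-level observations could look like. Everything in it is an assumption for illustration: the modality names, the `FEATURE_TO_GENE` table, the transcript labels, and the `reduce_to_gene` helper are hypothetical and not part of any schema. The PTPRC entries only reflect that CD45RA and CD45RO are isoforms of the protein encoded by PTPRC.

```python
from collections import defaultdict

# Hypothetical per-modality lookup tables (illustrative only, not a real schema).
# Each modality gets its own explicit feature -> gene rule set.
FEATURE_TO_GENE = {
    "rna_transcript": {
        "PTPRC_isoform_A": "PTPRC",  # placeholder transcript labels
        "PTPRC_isoform_B": "PTPRC",
        "CD4_isoform_A": "CD4",
    },
    "protein": {
        "CD45RA": "PTPRC",  # two isoforms of the protein encoded by PTPRC
        "CD45RO": "PTPRC",
        "CD4": "CD4",
    },
}


def reduce_to_gene(modality, feature_counts):
    """Collapse per-feature counts to per-gene counts using the modality's rules.

    Raises instead of guessing when a modality or feature has no explicit rule,
    so the conversion stays deliberate.
    """
    if modality not in FEATURE_TO_GENE:
        raise KeyError(f"no reduction rules defined for modality {modality!r}")
    rules = FEATURE_TO_GENE[modality]
    per_gene = defaultdict(int)
    for feature, count in feature_counts.items():
        if feature not in rules:
            raise ValueError(f"feature {feature!r} has no gene mapping for {modality!r}")
        per_gene[rules[feature]] += count
    return dict(per_gene)


protein_by_gene = reduce_to_gene("protein", {"CD45RA": 5, "CD45RO": 3, "CD4": 2})
```

Note that summing CD45RA and CD45RO into a single PTPRC value discards exactly the isoform-level distinction discussed above, which is why the conversion probably should be opt-in rather than automatic.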
Created chanzuckerberg/single-cell#56 to track support for other data modalities.
@ambrosejcarr to follow up on "Do we want unfiltered barcodes from 10x?" and open a new issue as needed. We actually got some feedback from one person when shopping around the schema that the answer is yes, though it was a nice-to-have. The problem is that this would not be a proper layer in any format, as its dimensions are different. The current position is that the answer is "no".
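A small sketch of the dimension problem, assuming a "layer" means an alternative matrix with exactly the same (cells x genes) shape as the main matrix. The `shape` and `can_be_layer` helpers and the toy matrices are hypothetical, written in plain Python rather than against any particular file format.

```python
def shape(matrix):
    """Return (n_rows, n_cols) for a list-of-lists matrix."""
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if matrix else 0
    return (n_rows, n_cols)


def can_be_layer(main, candidate):
    """A candidate can only be a layer if its shape matches the main matrix."""
    return shape(main) == shape(candidate)


# Filtered matrix: 2 cell-containing barcodes x 3 genes (toy values).
filtered = [
    [3, 0, 1],
    [0, 2, 5],
]

# Same shape as `filtered`, e.g. a binarized version: a valid layer.
binarized = [
    [1, 0, 1],
    [0, 1, 1],
]

# Unfiltered matrix: 5 barcodes x 3 genes, including empty droplets.
unfiltered = [
    [3, 0, 1],
    [0, 0, 0],
    [0, 2, 5],
    [1, 0, 0],
    [0, 0, 0],
]

binarized_ok = can_be_layer(filtered, binarized)
unfiltered_ok = can_be_layer(filtered, unfiltered)
```

Here `binarized_ok` is true and `unfiltered_ok` is false: the unfiltered matrix has extra barcode rows, so it would have to be stored as a separate object rather than as a layer.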
Issue moved to chanzuckerberg/single-cell-curation #9 via ZenHub
Appetite: ?
This question is limited to 10x scRNA/snRNA and Smart-Seq2-like assays.
Should the schema include a `counts` field? If so, how is it modeled per framework/assay? UMI counts from 10x, for example.