brianraymor closed this issue 2 years ago
@ambrosejcarr wrote:
I followed up with Malte Luecken on the question of "what is raw data". Malte agreed that we should encourage scientists to filter non-cell barcodes, given our use cases. He suggested that further QC (e.g. the selection of highly variable genes) should not be done on the raw matrices, which I think is aligned with our recommendation and makes sense. He also made a good point about capturing spliced/unspliced counts when they're available:
Malte Luecken [12:31 PM]
rather than separate spliced/unspliced... i would suggest just adding this info to the normal counts you collect ... as adata.layers['spliced']
and adata.layers['unspliced']
and adata.X
or adata.layers['counts']
with the count data then I guess ...
@ambrosejcarr commented on Thu Aug 13 2020
Appetite: ?
This question is limited to 10x scRNA/snRNA and Smart-Seq2-like assays.
Should the schema include a counts field? If so, how is it modeled per framework/assay? UMI counts from 10x, for example.
@ambrosejcarr commented on Fri Sep 11 2020
Addressing @mckinsel's questions:
"Detected molecules of RNA per gene" (typical 10x 3' processing), "detected molecules of RNA per transcript" (transcript-aware RNA-seq processing, more commonly associated with SS2), "detected molecules of protein" (CITE-seq, CyTOF, MIBI), and "sequencing reads from promoter regions adjacent to genes" (sc-ATAC-seq) are separate data modalities and we should be aware of that in some way.
They can all be reduced to "observations of gene", and we may want to enable that conversion, but we should be careful and deliberate, and have a separate set of rules for each modality. CITE-seq data, when we get to it, naturally corresponds better to transcript-level data. The PTPRC gene is a good example of where we'll get tripped up, and in the future I expect we'll start to see phospho (active) and non-phospho (inactive) forms of proteins detected with CITE-seq, introducing additional complexity beyond what's captured at the transcript level.
@ambrosejcarr commented on Fri Sep 11 2020
Created chanzuckerberg/single-cell#56 to track support for other data modalities.
@brianraymor commented on Tue Oct 20 2020
@ambrosejcarr to follow up on the question below and open a new issue as needed:
"Do we want unfiltered barcodes from 10x? We actually got some feedback from one person when shopping around the schema that the answer is yes, though it was a nice-to-have. The problem is that this would not be a proper layer in any format, as its dimensions are different."
The current position is that the answer is "no".
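To illustrate the dimension problem, here is a rough sketch using numpy only. The matrix values, the MIN_COUNTS threshold, and the can_be_layer helper are all hypothetical; real cell calling is more involved than a total-count cutoff:

```python
import numpy as np

# Toy unfiltered matrix: 6 barcodes x 3 genes. Values are invented.
unfiltered = np.array([
    [5, 0, 3],
    [0, 0, 0],
    [2, 7, 1],
    [0, 1, 0],
    [4, 4, 9],
    [0, 0, 0],
])

# Hypothetical cell-calling rule: keep barcodes with enough total counts.
MIN_COUNTS = 5
is_cell = unfiltered.sum(axis=1) >= MIN_COUNTS
filtered = unfiltered[is_cell]


def can_be_layer(candidate, x):
    """A layer must have exactly the same (barcodes, genes) shape as x."""
    return candidate.shape == x.shape


print(unfiltered.shape, filtered.shape)
print(can_be_layer(unfiltered, filtered))
```

Because filtering drops rows, the unfiltered matrix has more barcodes than the filtered one and cannot be stored as a layer next to it; it would need its own container or file.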