chanzuckerberg / single-cell

A collection of documents that reflect various design decisions that have been made for the cellxgene project.
MIT License

Scientists want cellxgene to support data modalities that help define cell types and states #56

Closed ambrosejcarr closed 2 years ago

ambrosejcarr commented 3 years ago

Background

The central dogma of molecular biology describes the flow of information from DNA to RNA to protein. Each stage and process along that flow can be assayed to generate count data. In brief:

Input    Output                   Process                                          Multiplicity
DNA      poised polymerase (pp.)  transcription initiation                         1:N
pp.      preRNA                   transcription                                    1:1
preRNA   mRNA                     splicing                                         1:N
mRNA     protein                  translation                                      1:1
protein  active protein           phosphorylation, ubiquitination, cleavage, other 1:N

There are common assays that generate biological data on molecules (not exhaustive):

Molecule  Assays
DNA       ATAC-seq, DNase-seq
pp.       Polymerase ChIP
preRNA    End-localized RNA-seq: 10x 3', 10x 5', Drop-seq, inDrops; full-length RNA-seq: Smart-seq2 (SS2)
mRNA      End-localized RNA-seq: 10x 3', 10x 5', Drop-seq, inDrops; full-length RNA-seq: Smart-seq2 (SS2)
protein   CITE-seq, MIBI, CyTOF
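
One consequence of the table above is that a single assay can report on more than one molecule, so the assay alone does not identify the molecule. A minimal sketch of that lookup, with entries taken from the table; the names `ASSAYS_BY_MOLECULE` and `molecules_for` are hypothetical and not part of cellxgene:

```python
# Molecule -> assays, taken from the table above (not exhaustive).
# Names here are illustrative only; nothing in cellxgene defines them.
ASSAYS_BY_MOLECULE = {
    "DNA": ["ATAC-seq", "DNase-seq"],
    "pp.": ["Polymerase ChIP"],
    "preRNA": ["10x 3'", "10x 5'", "Drop-seq", "inDrops", "SS2"],
    "mRNA": ["10x 3'", "10x 5'", "Drop-seq", "inDrops", "SS2"],
    "protein": ["CITE-seq", "MIBI", "CyTOF"],
}


def molecules_for(assay):
    """Return every molecule the given assay is listed under, in table order."""
    return [
        molecule
        for molecule, assays in ASSAYS_BY_MOLECULE.items()
        if assay in assays
    ]


print(molecules_for("Drop-seq"))  # listed under both preRNA and mRNA
print(molecules_for("CITE-seq"))  # listed under protein only
```

Because the RNA-seq assays appear in both the preRNA and mRNA rows, any schema built on this table would likely need to record the molecule separately from the assay.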

There are also assays or algorithms that measure or infer processes (not exhaustive):

Process      Assay
splicing     RNA-velocity
translation  ribosome profiling

Important notes:

The types of assays that users have requested are listed as child tickets of this epic.

ambrosejcarr commented 3 years ago

chanzuckerberg/cellxgene#1673 contains a potential implementation for multiple symbol types, which could be relevant to datasets with multiple modalities that map to the same feature types, e.g. PTPRC (gene) vs. CD45RO (messenger, protein).
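
To make the idea of multiple symbol types concrete, here is a rough sketch of a feature that carries one symbol per symbol type. This is not the implementation in the linked PR; the `Feature` class, the `label` and `find_by_symbol` helpers, the symbol-type keys, and the fallback behavior are all assumptions made for illustration:

```python
from dataclasses import dataclass


@dataclass
class Feature:
    """A measured feature with one symbol per symbol type (hypothetical model)."""
    feature_id: str
    symbols: dict  # symbol type (e.g. "gene", "protein") -> symbol string


def label(feature, symbol_type, fallback="gene"):
    """Return the symbol of the requested type, or the fallback type's symbol."""
    return feature.symbols.get(symbol_type, feature.symbols[fallback])


def find_by_symbol(features, query):
    """Return features where any symbol matches the query, ignoring case."""
    wanted = query.casefold()
    return [
        f for f in features
        if any(s.casefold() == wanted for s in f.symbols.values())
    ]


# Example from the comment above: one feature, two names.
ptprc = Feature("feature-0", {"gene": "PTPRC", "protein": "CD45RO"})

print(label(ptprc, "protein"))    # symbol of the protein type
print(label(ptprc, "messenger"))  # no such type stored, so the gene symbol is used
print(find_by_symbol([ptprc], "cd45ro") == [ptprc])
```

Keeping every symbol on the same feature record would let a user search by either name and let the UI choose which symbol type to display per modality.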