chanzuckerberg / single-cell-curation

Code and documentation for the curation of cellxgene datasets
MIT License
37 stars 23 forks

Improve cellxgene’s value proposition for data generators and consumers by capturing RNA+ATAC experiments. #195

Closed brianraymor closed 1 year ago

brianraymor commented 2 years ago

Goal

Improve cellxgene's value proposition for researchers generating and publishing experiments that take both RNA-seq and ATAC-seq measurements from the same cells by enabling users to access both RNA and ATAC data modalities for the same set of cells in the cellxgene explorer, when they are available.

Context

cellxgene currently serves data from RNA-seq experiments, measuring per-gene expression, and ATAC-seq experiments, measuring open chromatin regions along the genome. As ATAC-seq data are not confined to gene bodies, cellxgene serves only gene activity matrices, where the features have been translated from genomic coordinates to genes, resulting in gene features akin to RNA-seq data. We are anticipating an increasing amount of data from assays that provide both RNA-seq and ATAC-seq readouts from the same cell/nucleus. Users want to be able to explore both RNA-seq and ATAC-seq data, and leverage the dual measurements, using cellxgene.

If trends follow other assays, then we can expect a majority of data from these joint profiling experiments to come from 10x products, which in this case would be the multiome kit. Other protocols that achieve similar measurements include SNARE-seq and SHARE-seq.

jahilton commented 2 years ago

landscape doc

brianraymor commented 2 years ago

Document reviewed with curators.

jahilton commented 1 year ago

Update: We currently have 2 Collections that contain data from "10x multiome", an assay that measures both expression (RNA) and chromatin accessibility (ATAC).

We have data incoming from Seed Network researchers that will be mCT-seq, which measures expression and methylation, so it will present a similar challenge. Summarizing the issue at hand: there is currently no way for a user to know whether the data represent the expression or the accessibility measurements (both of those Collections currently have only expression data). I imagine this will be a blocker for including these datasets in the Census.

Additional concern: the validator needs to know whether the data are RNA (and should have raw counts) or are not (and thus aren't required to have raw counts).
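
To make the validator concern concrete, here is a minimal sketch in Python. It assumes each dataset carries some modality label. The `DatasetInfo` type, the `modality` values, and `validate_raw_counts` are all hypothetical names for illustration, not the actual cellxgene validator API or schema fields.

```python
from dataclasses import dataclass
from typing import List

# Assumption: modality labels are illustrative, not real schema values.
RNA = "rna"


@dataclass
class DatasetInfo:
    """Hypothetical summary of a dataset as seen by a validator."""
    modality: str         # e.g. "rna", "atac", "methylation"
    has_raw_counts: bool  # whether a raw counts matrix is present


def validate_raw_counts(info: DatasetInfo) -> List[str]:
    """Return error messages; an empty list means the check passed."""
    errors = []
    # Only RNA measurements are required to ship raw counts.
    if info.modality == RNA and not info.has_raw_counts:
        errors.append("RNA data must include raw counts")
    return errors
```

With something like this, an RNA dataset without raw counts would fail, while an ATAC or methylation dataset without raw counts would pass. The open question remains where the modality label would come from.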

brianraymor commented 1 year ago

Closing again until this experiment is prioritized by Data Generation.