ding-lab / PanCan_snATAC_publication


ccRCC snRNA-seq data used in this paper? #6

Open YushaLiu opened 5 months ago

YushaLiu commented 5 months ago

I have a quick question about the snRNA-seq data from the ccRCC samples used in this study. I downloaded the data from https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE240822, as described in the data availability section of the paper, but noticed that for each patient sample (e.g., C3L-00004-T1), the matrix containing the UMI counts (matrix.mtx.gz) often has about 1 million columns (barcodes). Does each barcode in this matrix represent a single nucleus? If so, why is the number of barcodes so much larger than the number of nuclei in the annotation file GSE240822_GBM_ccRCC_RNA_metadata_CPTAC_samples.tsv?

nvterekhanova commented 4 months ago

Hi @YushaLiu,

The matrix.mtx.gz files are the raw feature-barcode matrices output by Cell Ranger (https://support.10xgenomics.com/single-cell-atac/software/pipelines/latest/output/matrices), so they contain every barcode detected before any filtering, most of which do not correspond to nuclei. The annotation files list only the cell barcodes that remained after filtering, which is why the number of barcodes differs so much between the two.

Nadezhda
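For anyone who wants per-nucleus counts, here is a minimal Python sketch (stdlib only) of the subsetting step implied above: keep only the columns of the raw matrix whose barcodes appear in the annotation file. It assumes the standard Cell Ranger layout (matrix.mtx.gz in Matrix Market coordinate format with features as rows and barcodes as columns, and a barcodes.tsv.gz with one barcode per line, in column order). It also assumes the metadata TSV has one column holding the barcode and another holding the sample ID. The column names you pass in, and the helper names `open_text`, `keep_set_from_metadata` and `filter_mtx`, are hypothetical and not part of the paper's code.

```python
import csv
import gzip
from typing import Dict, Iterable, List, Set, Tuple


def open_text(path: str):
    """Open a plain or gzip-compressed text file for reading."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def keep_set_from_metadata(lines: Iterable[str], barcode_col: str,
                           sample_col: str, sample: str) -> Set[str]:
    """Collect the barcodes annotated for one sample.

    barcode_col and sample_col are HYPOTHETICAL column names: check the
    header of the metadata TSV and pass the real ones.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    return {row[barcode_col] for row in reader if row[sample_col] == sample}


def filter_mtx(mtx_lines: Iterable[str], barcodes: List[str],
               keep: Set[str]) -> Tuple[int, List[str], List[Tuple[int, int, int]]]:
    """Drop every column of a Matrix Market matrix whose barcode is not in keep.

    Returns (n_features, kept_barcodes, entries); entries are 0-based
    (feature, nucleus, count) triples indexed against kept_barcodes.
    """
    # Map each kept 1-based column index in the raw file to a new 0-based index.
    new_col: Dict[int, int] = {}
    kept: List[str] = []
    for old_col, bc in enumerate(barcodes, start=1):
        if bc in keep:
            new_col[old_col] = len(kept)
            kept.append(bc)

    n_features = 0
    size_line_seen = False
    entries: List[Tuple[int, int, int]] = []
    for line in mtx_lines:
        if line.startswith("%"):
            continue  # "%%MatrixMarket ..." banner and comment lines
        fields = line.split()
        if not fields:
            continue  # tolerate blank lines
        if not size_line_seen:
            # First non-comment line holds: n_rows n_cols n_nonzero.
            n_features = int(fields[0])
            size_line_seen = True
            continue
        row, col, count = int(fields[0]), int(fields[1]), int(fields[2])
        if col in new_col:
            entries.append((row - 1, new_col[col], count))
    return n_features, kept, entries
```

To use it, read the barcode list with `[line.strip() for line in open_text(path)]`, then stream the matrix with `filter_mtx(open_text(mtx_path), barcodes, keep)`; `len(kept)` should then match the number of rows for that sample in the metadata file. The matrix is read line by line and only the retained entries are stored, since a raw matrix with about 1 million barcodes can be large. If the barcodes in the metadata carry a sample prefix or suffix that the raw barcodes lack (an assumption worth checking), normalize them before building the keep set.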