ding-lab / PanCan_snATAC_publication


data request #3

Status: Closed (closed by tkisss 10 months ago)

tkisss commented 10 months ago

Hi,

Great job! I am very interested in the multi-omics data presented in your published paper, but accessing the data has not been straightforward. Following the instructions in the paper, I downloaded all the level 3 scRNA-seq and scATAC-seq data from the HTAN WUSTL atlas, 1070 files in total. How can I tell which files come from your study, and how can I match the scRNA-seq and scATAC-seq data from the same samples?

[Screenshot: WX20240105-195107]

Thanks!

nvterekhanova commented 10 months ago

Hello,

We have a lookup table with id mappings here: https://github.com/ding-lab/PanCan_snATAC_publication/blob/main/Sample_ID_Lookup_table_in_repositories.xlsx

To obtain the samples used in the study from HTAN DCC, please use the IDs in the "HTAN DCC Biospecimen ID" column of that table. To align RNA and ATAC data, please use either the "Piece_ID_ATAC" or the "HTAN DCC Biospecimen ID" column.
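As a rough illustration of that filtering step, here is a minimal Python sketch. It assumes the lookup table has been exported from .xlsx to CSV, and that the downloaded files are described by a manifest with `Filename`, `Biospecimen` and `Assay` columns. Those manifest column names, the function names and all sample values below are hypothetical; only the "HTAN DCC Biospecimen ID" and "Piece_ID_ATAC" column names come from the reply above.

```python
import csv
import io
from collections import defaultdict

ID_COL = "HTAN DCC Biospecimen ID"  # column name quoted in the reply above


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def group_study_files(lookup_rows, manifest_rows):
    """Keep manifest files whose biospecimen ID is in the lookup table,
    grouped as {biospecimen_id: {assay: [filenames]}}."""
    wanted = {(r.get(ID_COL) or "").strip() for r in lookup_rows}
    wanted.discard("")  # ignore rows with no biospecimen ID
    grouped = defaultdict(lambda: defaultdict(list))
    for r in manifest_rows:
        bio = (r.get("Biospecimen") or "").strip()  # hypothetical column
        if bio in wanted:
            grouped[bio][r["Assay"].strip()].append(r["Filename"].strip())
    return {bio: dict(assays) for bio, assays in grouped.items()}


# Toy data: IDs, assays and filenames are made up for illustration.
lookup_csv = """HTAN DCC Biospecimen ID,Piece_ID_ATAC
EXAMPLE_1,piece_A
EXAMPLE_2,piece_B
"""

manifest_csv = """Filename,Biospecimen,Assay
a_atac.tsv,EXAMPLE_1,scATAC-seq
a_rna.tsv,EXAMPLE_1,scRNA-seq
b_atac.tsv,EXAMPLE_2,scATAC-seq
c_rna.tsv,EXAMPLE_3,scRNA-seq
"""

result = group_study_files(read_rows(lookup_csv), read_rows(manifest_csv))
for bio, assays in sorted(result.items()):
    print(bio, assays)
```

Files whose biospecimen ID is absent from the lookup table (here `EXAMPLE_3`) are dropped, and files that share an ID end up in the same group, which is what pairs RNA with ATAC for a sample.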

Nadezhda

tkisss commented 10 months ago

Hi, thank you for your response. I noticed that, for the multiome samples, HTAN provides only the scATAC-seq data, while the scRNA-seq data is not publicly available. Is the scRNA-seq data available on dbGaP? If so, I will submit a download request through dbGaP.

nvterekhanova commented 10 months ago

Level 3 data should be available for both RNA and ATAC via Synapse (open access); for ATAC/RNA data at Levels 1-2, dbGaP access is needed. The access type for each file is listed in the "Data Access" column on the HTAN DCC portal.

Nadezhda

tkisss commented 10 months ago

Hi, thanks for your response. Could you give me some more detailed guidance? I still can't find the RNA data. For example, I found the ATAC data for the sample with HTAN DCC Biospecimen ID HTA12_164_1, and the lookup table lists the same HTAN DCC Biospecimen ID (HTA12_164_1) for the corresponding RNA data. However, the RNA data does not appear to be accessible in the HTAN database.

[Screenshot: WX20240118-164413]

nvterekhanova commented 10 months ago

Hello,

Those four files are outputs of cellranger-arc, and the files barcodes.tsv, features.tsv and matrix.mtx contain data for both the RNA and ATAC assays. Here is the detailed description of the cellranger-arc outputs from 10x Genomics: https://support.10xgenomics.com/single-cell-multiome-atac-gex/software/pipelines/latest/output/matrices. For every multiome sample listed in the lookup table, there is one set of matrix files on HTAN DCC, containing both ATAC and RNA data (outputs of cellranger-arc).
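To make the combined layout concrete, here is a minimal sketch of splitting such a matrix back into its RNA and ATAC parts using only the Python standard library. It assumes the usual 10x layout: the third column of features.tsv holds the feature type ("Gene Expression" or "Peaks"), and matrix.mtx is a MatrixMarket coordinate file with features as rows and barcodes as columns. The function names and toy values are hypothetical, and the toy data is inlined as strings; real outputs are typically gzip-compressed and could be opened with `gzip.open(path, "rt")`.

```python
import csv
import io


def read_feature_types(handle):
    """Return the feature type (third column) for each row of features.tsv."""
    return [row[2] for row in csv.reader(handle, delimiter="\t") if row]


def split_matrix(mtx_handle, feature_types):
    """Split MatrixMarket triplets by the feature type of their row.

    Returns (shape, entries), where shape is (n_features, n_barcodes,
    n_nonzero) and entries maps feature type to a list of
    (row, col, count) triplets with the original 1-based indices.
    """
    shape = None
    entries = {}
    for line in mtx_handle:
        if line.startswith("%") or not line.strip():
            continue  # skip the MatrixMarket banner, comments and blank lines
        parts = line.split()
        if shape is None:
            shape = tuple(int(x) for x in parts)  # first data line is the size line
            continue
        row, col, count = int(parts[0]), int(parts[1]), int(parts[2])
        ftype = feature_types[row - 1]  # MatrixMarket rows are 1-based
        entries.setdefault(ftype, []).append((row, col, count))
    return shape, entries


# Toy inputs: two genes and one peak across two barcodes (all values made up).
features_tsv = (
    "GENE_ID_1\tGENE1\tGene Expression\tchr1\t100\t200\n"
    "GENE_ID_2\tGENE2\tGene Expression\tchr1\t300\t400\n"
    "chr1:500-900\tchr1:500-900\tPeaks\tchr1\t500\t900\n"
)
matrix_mtx = (
    "%%MatrixMarket matrix coordinate integer general\n"
    "%toy example\n"
    "3 2 4\n"
    "1 1 5\n"
    "2 1 1\n"
    "3 2 7\n"
    "1 2 2\n"
)

feature_types = read_feature_types(io.StringIO(features_tsv))
shape, entries = split_matrix(io.StringIO(matrix_mtx), feature_types)
for ftype, triplets in entries.items():
    print(ftype, triplets)
```

The barcodes are shared between the two assays, so the column indices refer to the same cells in both the "Gene Expression" and the "Peaks" entries.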

Nadezhda

tkisss commented 10 months ago

Thank you very much for your assistance. This is my first time working with cellranger-arc output, and I really appreciate your guidance.