biolxy / TCGA_HLA_benchmark

GNU General Public License v3.0

TCGA exome data #2

Open davetang opened 3 years ago

davetang commented 3 years ago

Hello!

Thank you for the benchmark! From your paper, it looks like you used the data described at phs000178.v11.p8, is that right? From that page it seems that all exome data were generated using either the SOLiD or 454 platforms. Could you please provide a bit more information about the raw data you used for this benchmark, or point me to where I can get this information?

Thanks again, Dave

haoecust commented 3 years ago

Hi Dave,

You can browse all of the WXS BAM files on the GDC TCGA portal (see the URL below). In practice, we only downloaded BAM slices covering the HLA region (hg38 chr6:28032222-34032223). The documentation for BAM slicing is also listed below. Hope it helps.

Best

Hao

TCGA WXS bam files on GDC website https://portal.gdc.cancer.gov/repository?filters=%7B%22op%22%3A%22and%22%2C%22content%22%3A%5B%7B%22op%22%3A%22in%22%2C%22content%22%3A%7B%22field%22%3A%22files.analysis.workflow_type%22%2C%22value%22%3A%5B%22BWA%20with%20Mark%20Duplicates%20and%20Cocleaning%22%5D%7D%7D%2C%7B%22op%22%3A%22in%22%2C%22content%22%3A%7B%22field%22%3A%22files.data_format%22%2C%22value%22%3A%5B%22bam%22%5D%7D%7D%2C%7B%22op%22%3A%22in%22%2C%22content%22%3A%7B%22field%22%3A%22files.experimental_strategy%22%2C%22value%22%3A%5B%22WXS%22%5D%7D%7D%5D%7D

Manuals on TCGA bam slicing: https://docs.gdc.cancer.gov/API/Users_Guide/BAM_Slicing/
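
For anyone who wants to script this, here is a minimal sketch of what a slice request for the HLA region could look like, based on the BAM slicing endpoint described in the manual above. It is not the exact script used for the benchmark. The file UUID and token path are placeholders (hypothetical), and a valid GDC token for controlled-access data is assumed.

```python
import shutil
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# BAM slicing endpoint as described in the GDC API user guide.
GDC_SLICING_ENDPOINT = "https://api.gdc.cancer.gov/slicing/view"
# HLA region (hg38) mentioned above.
HLA_REGION = "chr6:28032222-34032223"


def build_slice_url(file_uuid, region=HLA_REGION):
    """Build the slicing URL for one BAM file and one genomic region."""
    query = urlencode({"region": region}, safe=":")
    return f"{GDC_SLICING_ENDPOINT}/{file_uuid}?{query}"


def download_slice(file_uuid, token, out_path, region=HLA_REGION):
    """Stream the sliced BAM to disk; needs a valid GDC access token."""
    request = Request(
        build_slice_url(file_uuid, region),
        headers={"X-Auth-Token": token},
    )
    with urlopen(request) as response, open(out_path, "wb") as out:
        shutil.copyfileobj(response, out)


if __name__ == "__main__":
    # Placeholder UUID (hypothetical): use a real file UUID from the portal.
    print(build_slice_url("your-bam-file-uuid"))
    # To actually download, read the token from your own token file, e.g.:
    # token = open("gdc-user-token.txt").read().strip()
    # download_slice("your-bam-file-uuid", token, "hla_slice.bam")
```

The URL builder is kept separate from the download so the request can be checked without credentials.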

davetang commented 3 years ago

Hi Hao,

Thanks for the reply. From your link, it seems that at least some samples were sequenced on an Illumina sequencer (I only scrolled through the first couple of pages). My question stems from the dbGaP page for phs000178.v11.p8, which has a table showing that the exomes were sequenced using SOLiD or 454. I'm referencing that page because your benchmark paper states:

Whole exome sequencing data were retrieved from TCGA database through dbGap with accession number phs000178.v11.p8.

In addition, I'm asking because I wish to apply for access to the TCGA dataset via dbGaP. If I apply for access to phs000178.v11.p8 and only get SOLiD or 454 exomes, that wouldn't be useful. But since you could access other exomes via the GDC, perhaps phs000178.v11.p8 permission grants access to all TCGA data?

Cheers, Dave