bioinform / somaticseq

An ensemble approach to accurately detect somatic mutations using SomaticSeq
http://bioinform.github.io/somaticseq/
BSD 2-Clause "Simplified" License

Question about SEQC-II datasets #96

LeiHaoa closed this issue 3 years ago

LeiHaoa commented 3 years ago

Hi, I am very interested in the SEQC-II datasets. I found that the SRA study (https://trace.ncbi.nlm.nih.gov/Traces/sra/?study=SRP162370) contains many subreads.bam files, such as m54027_171215_191235.subreads.bam, but I do not know how to use these data. For example, if I want to run a caller on a tumor-normal pair from this dataset, I don't know which files correspond to which sample.

Also, does the VCF file at (ftp://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/seqc/Somatic_Mutation_WG/release/latest/) represent the truth set for the above data?

Many thanks!

litaifang commented 3 years ago

I'm not really sure what those *.subreads.bam files are. The fastq files we used to construct the truth set (i.e., the one on the FTP site) are on SRA (they may be in other places that I don't know of, but they're definitely on SRA): https://sites.google.com/view/seqc2/home/sequencing.

LeiHaoa commented 3 years ago

There are many BAM files listed under the SRA study SRP162370 (https://trace.ncbi.nlm.nih.gov/Traces/sra/?study=SRP162370); the link is right on the site you pointed to. I just do not know what these subreads.bam files are, or whether I can use them for variant calling.

The fastq files can be downloaded successfully.
Let me check that I understand correctly: I can download FD_T_1 and FD_N_1, which contain the raw fastq files, and the truth VCF file is on the FTP site?

Thanks!

litaifang commented 3 years ago

SEQC-II produced a massive amount of data, and I'm not familiar with all of it. The subreads.bam files could be single-cell sequencing data, but I'm not sure.

Yes, FD_T_1 and FD_N_1 are one of the 21+ pairs of WGS data sets we used to construct the truth VCF file (on the FTP site).
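
For anyone landing here with the same goal, here is a rough sketch of how calls from one tumor-normal pair might be scored against a truth VCF. It is only an illustration, not part of SomaticSeq: the function names (`parse_vcf_lines`, `read_vcf`, `compare`) are hypothetical, and it assumes both VCFs use the same reference build and that variants are already normalized, since it matches on exact CHROM, POS, REF and ALT.

```python
import gzip
from typing import Dict, Iterable, Set, Tuple

# A variant is identified by (chromosome, position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]


def parse_vcf_lines(lines: Iterable[str], pass_only: bool = False) -> Set[Variant]:
    """Collect variant keys from VCF text lines (hypothetical helper).

    Header lines are skipped and multi-allelic ALT fields are split.
    With pass_only=True, records whose FILTER is not PASS or '.' are dropped.
    """
    variants: Set[Variant] = set()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 5:
            continue  # not a complete VCF record
        chrom, pos, _id, ref, alts = fields[:5]
        filt = fields[6] if len(fields) > 6 else "."
        if pass_only and filt not in ("PASS", "."):
            continue
        for alt in alts.split(","):
            if alt not in (".", "*"):
                variants.add((chrom, int(pos), ref.upper(), alt.upper()))
    return variants


def read_vcf(path: str, pass_only: bool = False) -> Set[Variant]:
    """Read a plain or gzip-compressed VCF file into a set of variant keys."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return parse_vcf_lines(handle, pass_only)


def compare(calls: Set[Variant], truth: Set[Variant]) -> Dict[str, float]:
    """Count exact matches and derive precision and recall."""
    tp = len(calls & truth)
    fp = len(calls - truth)
    fn = len(truth - calls)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "precision": precision, "recall": recall}
```

Usage would look something like `compare(read_vcf("my_calls.vcf.gz", pass_only=True), read_vcf("truth.vcf.gz"))`, where both file names are placeholders. Exact matching will undercount true positives when the same indel is represented differently in the two files, so treat the numbers as approximate.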

LeiHaoa commented 3 years ago

Thanks for your patient answer!!