bioinform / somaticseq

An ensemble approach to accurately detect somatic mutations using SomaticSeq
http://bioinform.github.io/somaticseq/
BSD 2-Clause "Simplified" License

Recommendations for using your thoroughly vetted High Confidence regions. #102

Closed: SSunkara closed this issue 2 years ago

SSunkara commented 2 years ago

Hello,

This is more of a comment than an issue.

First, thanks for the thorough work on providing these reference datasets; they are incredibly useful. Would you have any reservations about recommending the HC regions for other cancer types, for example colorectal and lung?

I don't think we'd have sufficient resources to replicate the kind of analysis you've done for other cancer types, but I am curious whether you have any experience or insights there.

Thank you, Sirisha

litaifang commented 2 years ago

Thanks. The short answer, in my opinion, is that I have no reservations about using it as a reference for other cancers for technical/sequencing purposes, but not as a biological or clinical reference.

The SEQC2 reference sequencing data should be representative of cancer data in its technical aspects, i.e., the sequence context of mutations, tumor heterogeneity, a wide range of copy number aberrations across the genome, a wide range of variant allele frequencies (VAF) among the somatic mutations, etc. We also sequenced the DNA at different sequencing centers, on different generations of Illumina sequencers (plus PacBio), and at different sequencing depths, so the sequencing data also cover a fairly wide range of technical variation.

However, it was not designed to represent a cancer in a biological context. For that, you need to sequence a whole population of cancers, and TCGA exists for that purpose. The purpose of this resource is to benchmark and develop somatic mutation calling pipelines for tumor tissue sequencing, and to stratify a pipeline's performance across different mutation contexts (e.g., VAF). It's reliable down to about 5% VAF, probably a bit below that, but not much lower.
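To make the benchmarking use case concrete, here is a minimal sketch of stratifying a pipeline's recall by VAF while restricting evaluation to HC regions. This is not SomaticSeq or SEQC2 tooling: the function names, the tuple-based inputs, and the VAF bin edges are hypothetical, and variants are matched on exact (chrom, pos, ref, alt), whereas a real comparison would normalize variant representation first. It assumes HC regions in BED convention (0-based, half-open) and variant positions in VCF convention (1-based).

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical VAF bin edges; the 0.05 edge mirrors the ~5% VAF limit discussed above.
VAF_EDGES = [0.0, 0.05, 0.10, 0.20, 1.0]


def vaf_bin(vaf):
    """Return a label such as '0.05-0.10' for the bin containing vaf (1.0 goes in the last bin)."""
    i = min(bisect_right(VAF_EDGES, vaf) - 1, len(VAF_EDGES) - 2)
    return f"{VAF_EDGES[i]:.2f}-{VAF_EDGES[i + 1]:.2f}"


def in_hc(chrom, pos, hc_regions):
    """True if 1-based position pos falls in a BED-style (0-based, half-open) HC interval."""
    return any(start <= pos - 1 < end for start, end in hc_regions.get(chrom, []))


def stratified_metrics(truth, calls, hc_regions):
    """Per-VAF-bin recall (binned by truth VAF) and overall precision, inside HC regions only.

    truth and calls are iterables of (chrom, pos, ref, alt, vaf) tuples (hypothetical format).
    """
    truth_hc = {(c, p, r, a): v for c, p, r, a, v in truth if in_hc(c, p, hc_regions)}
    calls_hc = {(c, p, r, a) for c, p, r, a, _ in calls if in_hc(c, p, hc_regions)}

    # Count true positives and false negatives per truth-VAF bin.
    counts = defaultdict(lambda: {"tp": 0, "fn": 0})
    for key, vaf in truth_hc.items():
        counts[vaf_bin(vaf)]["tp" if key in calls_hc else "fn"] += 1
    recall = {b: c["tp"] / (c["tp"] + c["fn"]) for b, c in sorted(counts.items())}

    # Calls in HC regions that are absent from the truth set count as false positives.
    tp = len(calls_hc & set(truth_hc))
    precision = tp / len(calls_hc) if calls_hc else float("nan")
    return recall, precision


if __name__ == "__main__":
    hc = {"chr1": [(100, 200)]}
    truth = [("chr1", 150, "A", "T", 0.03), ("chr1", 160, "C", "G", 0.25), ("chr1", 500, "G", "A", 0.40)]
    calls = [("chr1", 160, "C", "G", 0.22), ("chr1", 170, "T", "C", 0.08)]
    print(stratified_metrics(truth, calls, hc))
```

Recall is binned by the truth VAF because a missed mutation has no observed VAF; precision is left unbinned here for the same reason, since false positives have no truth VAF to bin on.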

SSunkara commented 2 years ago

Makes sense. A pan-cancer approach with the kind of technical rigor you demonstrate is probably best suited to establishing overarching HC regions.

Thanks again for making all your methods available. This is still pretty valuable information.