Closed ahwanpandey closed 5 years ago
Hi, theoretically you can compare against any cohort. The underlying test is a simple Fisher's test, which checks for differences in ratios based on read counts. tcgaCompare only focuses on non-synonymous variants, so you should be fine. You should be worried only if you're looking for non-exonic variants. I think the read depth would make a difference for some variant calls. What about analysing the cohorts separately and reporting them individually? You have lots of samples in both. It would be interesting to see how many variants you find in the cohort with larger mean depth and not in the one with lower mean depth.
The issue here is that we are worried that the higher mutation burden we are seeing in the high-coverage group is due to the higher coverage and not the biology. Since the two cohorts we are comparing are so homogeneous in their coverage distribution, it doesn't seem right to compare them directly with a metric like mutations_per_mb. The higher-coverage data would have an advantage at detecting low-frequency variants, or even just more power to detect variants in general (we are using Mutect2, VarScan2, VarDict and Strelka2 with their defaults).
We have thought of things like downsampling the data, applying a higher VAF cut-off, or even defining a metric like (Mutation Burden = mutations_per_mb / coverage).
Do you have any suggestions or experience regarding this situation, where the two cohorts are so homogeneous in their coverage distribution?
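To make two of the ideas above concrete (a VAF cut-off and a coverage-scaled burden), here is a minimal Python sketch. The `Variant` class, the function names, and the numbers are all hypothetical and only for illustration; they are not part of maftools or any of the callers. Whether dividing by coverage is a sound correction is exactly the open question here, since it assumes that detected burden scales linearly with depth.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    alt_reads: int  # reads supporting the alternate allele
    depth: int      # total reads covering the site


def vaf(v):
    """Variant allele fraction; 0.0 when the site has no coverage."""
    return v.alt_reads / v.depth if v.depth else 0.0


def mutations_per_mb(variants, callable_mb, min_vaf=0.0):
    """Count variants at or above the VAF cut-off, per callable megabase."""
    kept = [v for v in variants if vaf(v) >= min_vaf]
    return len(kept) / callable_mb


def coverage_scaled_burden(variants, callable_mb, mean_coverage, min_vaf=0.0):
    """Hypothetical metric from the thread: mutations_per_mb / coverage.

    Assumes burden scales linearly with depth, which may not hold.
    """
    return mutations_per_mb(variants, callable_mb, min_vaf) / mean_coverage


# Made-up calls for one sample: VAFs of 0.03, ~0.13, ~0.36 and ~0.02.
calls = [Variant(3, 100), Variant(12, 90), Variant(40, 110), Variant(2, 95)]

raw = mutations_per_mb(calls, callable_mb=2.0)
filtered = mutations_per_mb(calls, callable_mb=2.0, min_vaf=0.10)
scaled = coverage_scaled_burden(calls, callable_mb=2.0, mean_coverage=100.0)
print(raw, filtered, scaled)
```

Applying the same `min_vaf` to both cohorts removes the low-frequency calls that only the deeper cohort has the power to detect, at the cost of discarding real subclonal variants.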
Hello, these issues are inherently associated with sequencing. A simple solution I could suggest is to genotype all variants detected across all samples. That is, build a consensus set of variants that includes all unique variants detected across all samples. Some callers offer a Genotype Given Allele (GGA) mode, which takes a VCF file and only genotypes the variants listed in it. I am pretty sure Strelka2 also has this feature. This is also what we did in one of our projects involving multi-region sequencing of a tumor, where we had to force-call a consensus set of variants due to coverage differences.
I hope my explanation was clear.
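As a rough illustration of the consensus step described above, the sketch below takes per-sample call sets, forms the union of unique sites, and writes them out as minimal VCF records that could then be handed to a caller's force-genotyping mode. The function names and the sample data are hypothetical, and the exact input a given caller expects for force-calling is not covered here.

```python
def consensus_sites(calls_by_sample):
    """Union of unique (chrom, pos, ref, alt) sites across all samples."""
    sites = set()
    for variants in calls_by_sample.values():
        sites.update(variants)
    # Plain tuple sort: chromosome names are ordered lexically here,
    # whereas a real VCF should follow the reference contig order.
    return sorted(sites)


def to_vcf_lines(sites):
    """Render sites as minimal, sites-only VCF lines (no sample columns)."""
    header = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    ]
    body = [
        f"{chrom}\t{pos}\t.\t{ref}\t{alt}\t.\t.\t."
        for chrom, pos, ref, alt in sites
    ]
    return header + body


# Made-up call sets: one site is shared, two are private to a sample.
calls_by_sample = {
    "sample_A": {("chr1", 12345, "C", "T"), ("chr2", 500, "G", "A")},
    "sample_B": {("chr1", 12345, "C", "T"), ("chr7", 42, "A", "C")},
}

sites = consensus_sites(calls_by_sample)
vcf_lines = to_vcf_lines(sites)
print("\n".join(vcf_lines))
```

Every sample is then genotyped at every site in the consensus list, so a variant found only in a deeper sample is also checked in the shallower ones rather than assumed to be absent.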
Hi @PoisonAlien, I think our cohort is too heterogeneous to apply that strategy. I can see how it would work for germline data (like GATK joint calling) or data from a single patient or cell line, where the mutation spectrum would be similar. Each sample in our cohort is from a different patient, so they would not necessarily share mutation sites.
But thanks for sharing your experience!
Hello,
I have a question regarding mutation burden that hopefully you can give me some insight for. I have WGS data for two cohorts. They have been sequenced at different depths:
Now would I be able to directly compare the mutation burden of these two cohorts using tcgaCompare against the TCGA samples? Two things that seem very different here are that
Thanks.