Closed kokyriakidis closed 3 years ago
bcbio has some heuristics to exclude somatic calls from germline in the absence of a non-tumor sample, basing on known population frequencies. You can look at the EPR
INFO field that's added to the final VCF and filter everything that does not contain pass
(contain, not equal as sometimes there are entries like pass;clinvar
when the variant is in some useful database).
MuTect 2 does support not using a panel (not recommended) but bcbio is not allowing it. I have some half-finished code to submit a PR to do so, but I haven't been able to do much testing.
As for the rest:
SOMATIC
so you can distinguish likely germline calls. VarDict also has INFO tags specifying the "strength" of a call ,from StrongSomatic
to LikelyGermline
.min_allele_fraction
to 0 in your configuration, so that no allelic fraction filtering takes place. The unfiltered calls are in the final VCFs and also in the per-sample outputs in the final
folder after a bcbio run (these have the -germline
suffix to indicate likely germline calls, but manual review is suggested)I would suggest MuTect2 and VarDict as callers, those have performed reasonably well in our testing.
These are the preliminary results using MAF 0.01 (1%) and no filters:
HaplotypeCaller: 75,727 variants detected Mutect2: 62,190 variants detected Freebayes: 76,378 variants detected Vardict: 436,506 variants detected ~200.000 where PASS calls
Ensemble calls (each variant in at least 1 caller): 447,550 variants detected [gatk-haplotypecaller 919 unique variants freebayes 3969 unique variants mutect2 141 unique variants]
Ensemble calls (each variant in at least 1 caller) using PASS filter: ~2,200,000 variants detected
Do you have any idea why Vardict called so many variants?
I've noticed in our own analyses too that VarDict calls a lot of variants. However it is not much of a problem for us because we follow up with fairly strict filtering. You might also want to consider the ensemble options of bcbio (see the documentation).
The whole point of Vardict was to make a very sensitive caller: https://github.com/AstraZeneca-NGS/VarDictJava. The idea was to 'call everything', filter later, for example with https://www.solvebio.com/
Hi @kokyriakidis,
How did you use Mutect2 with Ensemble? Did you use a panel of normal?
Thanks
@halimaach
He used a PON:
background: /media/kokyriakidis/PRODUCTION/Mutect2-exome-panel.vcf
Thanks everyone for the great discussion!
SN
Hi everyone,
I have some Tumor only Illumina data in my possession and I want to call both Somatic and Germline variants. My usecase is to detect ALL Somatic and Germline variants of a Tumor only exome sample.
I have some questions:
Does Mutect2 detect BOTH Somatic and Germline and just tag Somatic variants with "SOMATIC" in the end? Or do I have to run also HaplotypeCaller to detect all Germline Variants from my Tumor-only exome sample?
I want to achieve the highest RECALL. This is my goal and not Precision. Are there any options I have to use to disable any filtering done to the final set of variants?
If I follow the Somatic variant calling user story what should I specify in the yaml file to call both Germline and Somatic variants?
or
Does the first configuration (specifying both somatic and germline callers) work fow tumor only? In the user story it says "For tumor/normal somatic samples, bcbio can call both somatic (tumor-specific) and germline (pre-existing) variants." but I want to clarify it.
EDIT: I get the message "ValueError: MuTect, MuTect2 and Strelka2 somatic calling requires normal sample or panel" when I try to run the following config yaml with tumor only bam. So I guess Mutect does not support tumor only without panel.
Thank you for your time, KK