Closed by harish0201 1 year ago
Generally speaking, in human cancer, even the cancers with the highest mutation burden have about 50K somatic mutations, compared with about 5 million germline SNPs. So when you do first-pass somatic mutation calling, it's quite common to get more false positives from germline variants than actual somatic mutations. To train a classifier for somatic mutations, the truth VCF therefore needs to contain somatic calls only: germline variants must be filtered out of it, and any germline variant that appears among the candidate calls is treated as a false positive.
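To make the filtering step concrete, here is a minimal sketch in plain Python of removing germline variants from a candidate truth set. It is not part of the tool: the function names `variant_keys` and `drop_germline` and the example records are hypothetical, and it assumes both VCFs are uncompressed, use the same contig names, and represent variants identically (same normalization and left-alignment).

```python
from typing import Iterable, Iterator, List, Set, Tuple

# (CHROM, POS, REF, ALT); one key per ALT allele
VariantKey = Tuple[str, int, str, str]


def variant_keys(vcf_lines: Iterable[str]) -> Iterator[VariantKey]:
    """Yield one key per ALT allele for every data line of a VCF."""
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alts = fields[0], int(fields[1]), fields[3], fields[4]
        for alt in alts.split(","):  # split multi-allelic records
            yield (chrom, pos, ref, alt)


def drop_germline(candidate_lines: Iterable[str],
                  germline: Set[VariantKey]) -> List[str]:
    """Keep header lines and records with no allele in the germline set.

    Assumption: a record is dropped if ANY of its ALT alleles is germline.
    """
    kept = []
    for line in candidate_lines:
        if line.startswith("#"):
            kept.append(line)
        elif line.strip() and set(variant_keys([line])).isdisjoint(germline):
            kept.append(line)
    return kept


# Hypothetical example records (columns: CHROM POS ID REF ALT QUAL FILTER INFO)
header = ["##fileformat=VCFv4.2",
          "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO"]
germline_vcf = header + [
    "chr1\t100\t.\tA\tG\t.\t.\t.",
    "chr2\t500\t.\tC\tT,G\t.\t.\t.",
]
candidate_vcf = header + [
    "chr1\t100\t.\tA\tG\t.\t.\t.",   # germline: removed
    "chr1\t200\t.\tT\tC\t.\t.\t.",   # not in germline set: kept
    "chr2\t500\t.\tC\tG\t.\t.\t.",   # matches one germline allele: removed
]

germline_keys = set(variant_keys(germline_vcf))
somatic_truth = drop_germline(candidate_vcf, germline_keys)
print("\n".join(somatic_truth))
```

Exact matching on position and alleles is a deliberate simplification. If the two files were produced by different callers, the variants would likely need to be normalized first so that equivalent indels compare equal.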
Hi!
Thank you for the wonderful documentation and the tool! Please excuse me if the question seems stupid.
We are trying to train the model on our own datasets (tumor-normal pairs, in canine) and were wondering whether the truth VCFs need to be somatic calls.
We do have a germline VCF, which we used to recalibrate the alignments. It is fairly extensive, with samples (>500) from across the globe.
Regards, Harish