fpbarthel / GLASS

GLASS consortium
MIT License

Running SomaticSeq on GLASS-WG VCFs #53

Closed sbamin closed 5 years ago

sbamin commented 6 years ago

I am implementing SomaticSeq for consensus variant calling on filtered mutation calls from Mutect2, VarScan2 and LoFreq. SomaticSeq provides a default model (70-100 sequence features) based on DREAM3 challenge synthetic data. That model worked well, with F1 scores reaching as high as 0.9 for predicting true positive variants in the DREAM3 challenge. While their paper shows SomaticSeq performing well on validation sets, I wonder if we can test on a human glioma set for ...

The SomaticSeq code is self-contained and requires normal and tumor BAMs along with VCFs from M2 and VS2 (more callers are optional).
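For anyone who wants a quick baseline to compare against, here is a minimal sketch of how an F1 score like the one quoted above could be computed for a merged call set. It uses a plain majority vote across callers, which is not how SomaticSeq works (SomaticSeq applies a trained model over sequence features); the vote is only a stand-in. The variant records, the `majority_vote` and `precision_recall_f1` helpers, and the `min_callers` parameter are all hypothetical and made up for illustration.

```python
from collections import Counter

# A variant is keyed by (chrom, pos, ref, alt). All records below are invented.
Variant = tuple[str, int, str, str]


def majority_vote(call_sets: dict[str, set[Variant]], min_callers: int = 2) -> set[Variant]:
    """Keep variants reported by at least `min_callers` callers.

    This is a naive baseline, NOT SomaticSeq's classifier.
    """
    counts = Counter(v for calls in call_sets.values() for v in calls)
    return {v for v, n in counts.items() if n >= min_callers}


def precision_recall_f1(called: set[Variant], truth: set[Variant]) -> tuple[float, float, float]:
    """Compute precision, recall and F1 of a call set against a truth set."""
    tp = len(called & truth)  # true positives: called and in the truth set
    precision = tp / len(called) if called else 0.0
    recall = tp / len(truth) if truth else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy call sets standing in for filtered calls from each caller.
a = ("chr1", 100, "A", "T")
b = ("chr1", 250, "G", "C")
c = ("chr2", 300, "C", "T")
d = ("chr2", 410, "T", "G")
e = ("chr3", 520, "G", "A")

calls = {
    "mutect2": {a, b, c},
    "varscan2": {a, b, d},
    "lofreq": {a, e},
}
truth = {a, b, c}  # e.g. spiked-in or validated variants

consensus = majority_vote(calls, min_callers=2)
precision, recall, f1 = precision_recall_f1(consensus, truth)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

The same `precision_recall_f1` helper could score SomaticSeq output against a truth set, which would make the default model, a glioma-trained model, and the naive vote directly comparable.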

Both of these steps can give us a better rationale for using the SomaticSeq default (or a glioma-trained model) on canine data.

Thoughts?

Here is an issue: https://github.com/bioinform/somaticseq/issues/63

fpbarthel commented 6 years ago

For some reason I'm just seeing this today. This is a nice idea, and probably relatively easy to factor in.

@Kcjohnson also tagging you here. It sounds like this could be a good way to merge calls, aka step (2) in the proposed workflow

@sbamin Do you have experience using it? Have you tested it with GATK4 Mutect2? Their GitHub only describes using it with MuTect2.

sbamin commented 6 years ago

Haven't run it yet, but see https://github.com/bioinform/somaticseq/issues/63. The author has shared good tips to get started. I can test on canine VCFs, but that would be a barebones run without any training data. The author suggests using bamsurgeon to prepare synthetic mutation data for training. I guess that could be less preferable unless we get limited success running training on human glioma data.

fpbarthel commented 6 years ago

OK, happy to take this on. It will take several weeks before the first results because I'm currently occupied with BAM realignment.

sbamin commented 6 years ago

That works, and thanks! I can chime in with issues that arise from running SomaticSeq on canine data.

sbamin commented 6 years ago

There is a Snakemake implementation! https://github.com/bioinform/somaticseq/blob/master/utilities/snakemake/Snakefile

fpbarthel commented 5 years ago

Closing this issue because we moved to a single-caller approach due to time constraints.