broadinstitute / gatk

Official code repository for GATK versions 4 and up
https://software.broadinstitute.org/gatk

Mutect multi-sample calling #4887

Closed davidbenjamin closed 5 years ago

davidbenjamin commented 6 years ago

That is, calling across multiple samples of the same tumor in space or time: a tumor and a metastasis, a tumor and its cfDNA, etc.

First we have to ask collaborators and the community what features they would like and what problems they want to solve.

samuelklee commented 6 years ago

Keep me updated. I think a multi-sample version of ModelSegments would be pretty easy to implement and would hopefully share a similar command-line scheme for specifying allelic-count and denoised-copy-ratio files for the normal/tumors.

Something to think about is that the modeling step probably can be done on a per-sample basis after multi-sample segmentation, and it would be nice to scatter per sample for WGS. There are probably a few ways we can implement this, but let me know if you're planning something similar for M2.

fpbarthel commented 5 years ago

I am very interested in this feature. In my current workflow I call mutations with GATK4 M2 and then use freebayes (in an older version, simply pileup) to query the mutation sites at the cohort level. I would be very interested in a GATK-native approach.

P.S. Personally, I am more interested in a multi-step mutation calling method than in multi-sample calling: first call mutations in each sample, then join those calls into a consensus callset, and finally query every consensus site in each sample across the entire cohort of patients, yielding a genotype and allele frequency for each variant in every sample.
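To make the multi-step idea concrete, here is a minimal sketch in plain Python. It is not GATK, Mutect2, or freebayes code: the function names (`consensus_sites`, `requery`), the site tuple layout, and the toy read counts are all hypothetical and exist only for illustration. It assumes per-sample calls and per-sample ref/alt read counts are already available in memory.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical site key: (contig, position, ref allele, alt allele).
Site = Tuple[str, int, str, str]
Counts = Tuple[int, int]  # (ref read count, alt read count)


def consensus_sites(calls_by_sample: Dict[str, Set[Site]]) -> List[Site]:
    """Union of every site called in at least one sample, sorted for stable output."""
    return sorted(set().union(*calls_by_sample.values()))


def requery(
    sites: List[Site],
    counts_by_sample: Dict[str, Dict[Site, Counts]],
) -> Dict[str, Dict[Site, Tuple[int, int, Optional[float]]]]:
    """Report (ref, alt, allele fraction) for every sample at every consensus site.

    A site with no reads in a sample gets an allele fraction of None
    rather than a made-up value.
    """
    table = {}
    for sample, counts in counts_by_sample.items():
        row = {}
        for site in sites:
            ref, alt = counts.get(site, (0, 0))
            depth = ref + alt
            row[site] = (ref, alt, alt / depth if depth else None)
        table[sample] = row
    return table


# Toy data (illustrative only): step 1 produced these per-sample calls.
calls = {
    "primary": {("chr1", 100, "A", "T")},
    "recurrence": {("chr1", 100, "A", "T"), ("chr2", 50, "G", "C")},
}
# Read counts that a pileup-style query might return at the consensus sites.
counts = {
    "primary": {("chr1", 100, "A", "T"): (30, 10), ("chr2", 50, "G", "C"): (40, 0)},
    "recurrence": {("chr1", 100, "A", "T"): (20, 20)},
}

sites = consensus_sites(calls)
table = requery(sites, counts)
```

Every sample ends up with an entry at every consensus site, including sites that were only called in another sample, which is the point of the re-query step.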

ldgauthier commented 5 years ago

Since we're using Mutect2 for our mitochondrial variant calling pipeline that's in development, and we want to joint call mitochondria, I'm working on "somatic joint calling". It won't have a joint likelihood model (yet?) the way that germline SNP and indel joint calling does, but it will be able to give you a "squared-off" matrix of calls: one call for each sample at each site that's variant in any sample.

davidbenjamin commented 5 years ago

Closed by #5560.