BimberLab / DISCVRSeq

A collection of command line tools for working with sequencing data
Apache License 2.0

MergeVcfsAndGenotypes #313

Open wcarre opened 5 months ago

wcarre commented 5 months ago

Hello, I am trying to use DISCVRSeq as a replacement for GATK 3.7 CombineVariants. I run the following command: java -jar ~/DISCVRSeq/DISCVRSeq-1.3.31.jar MergeVcfsAndGenotypes -R ~/hg19_UCSC_wo_hap/fasta/hg19.wo_hap.fasta -V:HC ./HC.test.vcf -V:DV ./DV.test.vcf -V:FB ./FB.test.vcf --genotypeMergeOption UNSORTED -setKey CALLER -O ~/UNSORTED.test.vcf

However, I do not get information on all the callers that found each variant. With GATK 3.7 I used to get CALLER=HC-DV-FB, but with DISCVRSeq I get only one caller, HC.

Is this the expected behaviour, or is there a bug that prevents all of the callers from being reported? If it is expected, how else can I get this information?

Thanks

bbimber commented 5 months ago

Hello,

MergeVcfsAndGenotypes is largely a port of GATK3 into the GATK4 framework, and differences around this were discussed on these threads:

https://github.com/BimberLab/DISCVRSeq/issues/228
https://github.com/BimberLab/DISCVRSeq/issues/189

The source field is generated by GATK's GATKVariantContextUtils around line 1148: https://github.com/broadinstitute/gatk/blob/47a97ae948e4ab6fba7b0b441119ca52ae4c97f9/src/main/java/org/broadinstitute/hellbender/utils/variant/GATKVariantContextUtils.java#L1149

I think their code may have been intended to track the sources but did not fully implement that. I posed that question to the GATK team here: https://github.com/broadinstitute/gatk/pull/8750
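
Until that is settled, a per-site caller label like the HC-DV-FB value described above could be computed outside the tool. The following is only a minimal sketch in plain Python, not part of DISCVRSeq or GATK: the function names `site_keys` and `callers_per_site` are hypothetical, the input is assumed to be uncompressed VCF text, and sites are matched on exact CHROM/POS/REF/ALT, so variants that the callers represent differently will not be matched.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

# A site is identified by (CHROM, POS, REF, ALT); one key per ALT allele.
SiteKey = Tuple[str, int, str, str]


def site_keys(vcf_lines: Iterable[str]) -> Iterator[SiteKey]:
    """Yield one key per ALT allele for every data line of a VCF."""
    for line in vcf_lines:
        # Skip blank lines and header lines (## metadata and #CHROM).
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        for allele in alt.split(","):
            yield (chrom, int(pos), ref, allele)


def callers_per_site(inputs: Dict[str, Iterable[str]]) -> Dict[SiteKey, str]:
    """Map each site to the '-'-joined labels of the inputs that contain it.

    Labels are joined in the order the inputs were supplied.
    """
    found = defaultdict(list)
    for label, lines in inputs.items():
        for key in site_keys(lines):
            if label not in found[key]:
                found[key].append(label)
    return {key: "-".join(labels) for key, labels in found.items()}


# Toy in-memory records standing in for the three VCF files (made-up data).
hc = ["#CHROM\tPOS\tID\tREF\tALT\n", "chr1\t100\t.\tA\tG\n", "chr1\t200\t.\tC\tT\n"]
dv = ["chr1\t100\t.\tA\tG\n"]
fb = ["chr1\t100\t.\tA\tG\n", "chr1\t300\t.\tG\tA,C\n"]

example = callers_per_site({"HC": hc, "DV": dv, "FB": fb})
for site, label in sorted(example.items()):
    print(site, label)
```

With real files, each value in the dictionary passed to `callers_per_site` would be an open file handle (for example `open("HC.test.vcf")`) instead of a list. The resulting labels would still need to be written back into the merged VCF as an INFO field in a separate step, which this sketch does not attempt.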