Open amithasandur opened 2 years ago
Installed cyvcf2, read output vcf and truth vcf, parsed output vcf variants into a list of filters using set(variant.FILTER.split(";")).
@davidbenjamin I noticed that while parsing the truth vcf and extracting variant.FILTER it returns None
in each line, on looking into the cyvcf2 documentation it mentions that a value of PASS or ‘.’ in the VCF will give None for this function
. However in our email discussion you mentioned that in the output vcf (from Mutect2 pipeline) if the variant is unfiltered then its value is None.
Does the None value in output & truth vcf mean the same in both cases?
@amithasandur The Mutect2 pipeline output and the truth VCFs are both filtered, so whenever variant.FILTER
is None
it means in both VCFs that the variant is PASS
.
In intermediate stages of the pipeline -- after Mutect2 but before FilterMutectCalls -- there are VCFs with .
in the FILTER column, but you may assume that your tool never needs to deal with that case.
Out of curiosity, how did you define the truth set of somatic SNVs in this project?
Met with David today and we reviewed the code I had written for parsing truth and query vcf and counting true positives, true negatives, false positives and false negatives. Next, I will work on stratifying these counts by properties such as the filters applied by Mutect2 and allele fraction.
Note to self- here are my action items:
cc: @fleharty @brigranger
@DarioS For now we're just testing on artificial "truth" VCFs.
This ticket is for tracking the progress of the concordance project described here which is in collaboration with @davidbenjamin from DSP. The output of GATK Mutect2 is in VCF format, and there is an existing GATK tool to compare the output VCF to a truth VCF containing real mutations. David is working on improving Mutect3, as part of that there is a need for a tool specific to somatic variant calling that can diagnose patterns of good and bad performance. To get started, he provided resources of example VCFs of filtered calls from Mutect2 and a truth vcf. I am working on creating a Python script to count the number of false positives/negatives by comparing the mutations in both the VCFs using cyvcf2 library. After that the next step is to stratify these values by allele fraction and other filters applied by Mutect2.
cc: @fleharty @brigranger