Open edg1983 opened 2 years ago
I have the same question regarding the different "FAILN" tags, so I did a quick check on my VCF file. The "FAILN" tags seem to be assigned based on the GQ value. Here is what I found:
PASS: 99 >= GQ > 40
FAIL1: 40 >= GQ > 21
FAIL2: 21 >= GQ > 0
FAIL3: GQ = 0
The authors also mention in their paper that the high-confidence filtering used in graphtyper causes a large loss of true positives, so they did not make it the default. It might be more reasonable to filter genotypes on a chosen GQ threshold instead of filtering out all "failed" records.
I am not sure if the GQ value is the only rule to assign different FAILN tags.
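To make the idea concrete, here is a minimal Python sketch of GQ-based genotype filtering. It only encodes the bins I observed in my own file, so `gq_bin` should be read as an empirical guess and not as graphtyper's actual rule. The function names and the `min_gq` parameter are hypothetical, and the code assumes a standard tab-separated VCF record whose FORMAT column contains `GT` and `GQ`.

```python
def gq_bin(gq):
    """Map a GQ value to the tag observed in one VCF (empirical, possibly incomplete)."""
    if gq > 40:
        return "PASS"
    if gq > 21:
        return "FAIL1"
    if gq > 0:
        return "FAIL2"
    return "FAIL3"


def mask_low_gq(vcf_line, min_gq):
    """Set GT to missing for samples with GQ below min_gq; the record itself is kept."""
    fields = vcf_line.rstrip("\n").split("\t")
    keys = fields[8].split(":")  # FORMAT column
    if "GT" not in keys or "GQ" not in keys:
        return "\t".join(fields)  # nothing to filter on
    gt_i, gq_i = keys.index("GT"), keys.index("GQ")
    for col in range(9, len(fields)):  # sample columns
        values = fields[col].split(":")
        # Trailing FORMAT fields may be dropped; treat an absent GQ as missing.
        gq = values[gq_i] if gq_i < len(values) else "."
        if gq == "." or float(gq) < min_gq:
            values[gt_i] = "./."
        fields[col] = ":".join(values)
    return "\t".join(fields)
```

For example, `mask_low_gq(line, 41)` would keep only genotypes that fall in the PASS bin above, while a lower `min_gq` would retain more calls at the cost of confidence.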
Hope that helps.
$ bcftools concat test.lst.genotype.vcf/chr21/*.vcf.gz | grep -v ^# | grep -v "LowQUAL" |grep -v "LowQD" |less -SN
...
235 chr21 46237437 chr21:46237437:DG N
What if all three lines are PASS?
Hello,
Can you please provide some more details on the format of graphtyper output VCF? I'm interested essentially in 2 points:
The graphtyper output VCF reports
Can you please explain the meaning of the 3 DEL lines? I think they represent the same call, but using 3 different models for genotyping: AGGREGATED, BREAKPOINT and COVERAGE. This creates issues downstream, since I essentially have redundant calls. Can you provide any suggestions on how to use or combine these 3 outputs? If I want to obtain a clean list of variants, which of the 3 do you suggest using?
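To illustrate what a "clean list" could look like, here is a minimal Python sketch that keeps only the records from one genotyping model. It assumes the model name is stored in an INFO key, which I call `SVMODEL` here; that key name, the function names, and the choice of which model to keep are all assumptions to check against the header of your own VCF. Records without the key (for example SNPs and indels) are passed through unchanged.

```python
def info_value(vcf_line, key):
    """Return the value of an INFO key in a VCF record, or None if the key is absent."""
    info = vcf_line.rstrip("\n").split("\t")[7]
    for entry in info.split(";"):
        name, _, value = entry.partition("=")
        if name == key:
            return value
    return None


def keep_model(lines, model, key="SVMODEL"):
    """Yield header lines, records lacking `key`, and records whose `key` equals `model`."""
    for line in lines:
        if line.startswith("#"):
            yield line  # header lines are always kept
        elif info_value(line, key) in (None, model):
            yield line
```

Usage would be along the lines of `for line in keep_model(open("calls.vcf"), "AGGREGATED"): ...`, where `calls.vcf` is a placeholder for an uncompressed VCF file.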