Open Krannich479 opened 1 year ago
I am running into the exact same problem you had. However, I did not set SNPs, since I do not intend to study them; only SVs are of interest to me. Does that mean I can use:
sed -i 's/GT:GL:GQ:FT:RC:DR:DV:RR:RV/GT/g' simulated.vcf
to fix it? Thanks.
OK, I figured it out. Besides that line of code, we also need to add header lines like these manually:
##FILTER=<ID=LowQual,Description="Low quality">
##INFO=<ID=PRECISE,Number=0,Type=Flag,Description="Precise variant">
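Rather than editing the file by hand, the two lines can be inserted with a small script. The following is only a sketch: the function name add_missing_headers and the output file name are made up for illustration, and it assumes the VCF has a standard #CHROM column-header line. It does not check whether the header lines are already present.

```shell
# Hypothetical helper (not part of SURVIVOR or bcftools):
# print the VCF with the two header lines inserted just before the #CHROM line.
add_missing_headers() {
  awk '
    /^#CHROM/ {
      print "##FILTER=<ID=LowQual,Description=\"Low quality\">"
      print "##INFO=<ID=PRECISE,Number=0,Type=Flag,Description=\"Precise variant\">"
    }
    { print }   # pass every original line through unchanged
  ' "$1"
}

# Usage (output file name is an example):
#   add_missing_headers simulated.vcf > simulated.headers.vcf
```

Writing to a new file instead of editing in place keeps the original VCF around in case something goes wrong.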
Hi @rl4940,
the issue I had, the hotfix I proposed in this issue, and PR #190 all aim at correcting the format of SNV records only. However, I hypothesize that if your callset does not include the additional fields (GQ:FT:RC:DR:DV:RR:RV), it is probably safe to use the sed command from above.
(disclaimer: I am not a maintainer of SURVIVOR, I just tried to fix a bug here)
Hello Fritz & SURVIVOR dev team,
What.
I attempted to use the VCF file generated by SURVIVOR with bcftools, as recommended, for instance, in issue https://github.com/fritzsedlazeck/SURVIVOR/issues/173. However, bcftools fails on the file, and I think the cause is a bug in SURVIVOR.
Error.
When using bcftools (sort+index) on SURVIVOR's truthset VCF, I get warnings and an error regarding the VCF header not matching the FORMAT fields.
I looked into the code at https://github.com/fritzsedlazeck/SURVIVOR/blob/ed1ca5188a2d9286d582417b7a65938c768df995/src/simulator/SV_Simulator.cpp#L951 where the FORMAT field for SNP variant records is written. If the print_vcf_header2 function above it writes the corresponding header, then the FORMAT fields indeed do not match.
Solution.
I propose two trivial solutions here:
I am voting for solution 1 here because: a) I think FT, RC, DR, DV, RR, RV are ancient relics of Lumpy unrelated to SNPs. Also, these FORMAT fields have been commented out throughout most of SURVIVOR. b) The missing fields are not part of the VCF 4.2 standard and should not be present if they are not defined and used. c) I tested that the VCF file generated by SURVIVOR works flawlessly with bcftools if the fields are removed from SNP records (see hotfix below).
Hotfix.
sed -i 's/GT:GL:GQ:FT:RC:DR:DV:RR:RV/GT/g' simulated.vcf
where simulated.vcf is the VCF generated by SURVIVOR.
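To see whether a given file is affected at all, or whether the hotfix took effect, one option is to list the FORMAT keys that records use but the header never declares. This is a rough sketch, not something bcftools or SURVIVOR provides: the function name undeclared_format_keys is made up, and it assumes an uncompressed, tab-separated VCF with the FORMAT column in position 9.

```shell
# Hypothetical helper: print each FORMAT key that appears in a record's
# FORMAT column (column 9) but has no ##FORMAT=<ID=...> header line.
undeclared_format_keys() {
  awk -F'\t' '
    /^##FORMAT=<ID=/ {
      id = $0
      sub(/^##FORMAT=<ID=/, "", id)   # drop the prefix
      sub(/[,>].*/, "", id)           # keep only the ID itself
      declared[id] = 1
      next
    }
    /^#/ { next }                     # skip all other header lines
    NF >= 9 {
      n = split($9, keys, ":")
      for (i = 1; i <= n; i++)
        if (!(keys[i] in declared) && !(keys[i] in seen)) {
          seen[keys[i]] = 1           # report each key only once
          print keys[i]
        }
    }
  ' "$1"
}

# Usage:
#   undeclared_format_keys simulated.vcf
```

If the function prints nothing, every FORMAT key used in the records has a matching header definition.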