ACEnglish / truvari

Structural variant toolkit for VCFs
MIT License

How to know FN, FP sequence after refine? #238

lok27395 closed this issue 16 hours ago

lok27395 commented 22 hours ago

Hi authors/users,

I am running truvari against the HG002 ground truth to benchmark variant calling. I follow the GIAB approach: bench, then refine, to get refine.variant_summary.json.

However, I want to know the SVLEN and SVTYPE of the variants classified as FN or FP. Unlike a normal bench run, do I only get start and end information from refine.regions.txt?

Many thanks!

ACEnglish commented 16 hours ago

Refine subsets the regions and analyzes only those that would benefit from refinement. The refined column of refine.regions.txt tells you which regions were harmonized. Once those regions are put through the MSA, the resulting subset of variants is run through its own truvari bench, which writes its output to the sub-directory phab_output/. There you will find fn.vcf.gz and fp.vcf.gz holding the new variant representations after harmonization. However, they will not have SVLEN or SVTYPE INFO annotations. You can add those using truvari anno svinfo.
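For a quick look before annotating, here is a minimal Python sketch that derives a type and length directly from the REF and ALT columns of sequence-resolved records. This is not truvari's implementation and is not a replacement for truvari anno svinfo; the function names (`sv_info`, `describe`, `describe_file`) are hypothetical, and it assumes a plain length comparison is good enough. Symbolic and breakend ALTs are skipped, and the length is reported as an absolute value because sign conventions for SVLEN differ between tools.

```python
import gzip
from typing import Iterable, Iterator, Optional, Tuple


def sv_info(ref: str, alt: str) -> Optional[Tuple[str, int]]:
    """Return (type, length) for a sequence-resolved allele, else None.

    Assumption: type is decided purely by comparing REF and ALT lengths.
    """
    # Symbolic (<DEL>), breakend (contains [ or ]), missing, or spanning
    # alleles carry no sequence to measure.
    if alt.startswith("<") or "[" in alt or "]" in alt or alt in (".", "*"):
        return None
    diff = len(alt) - len(ref)
    if diff == 0:
        return None  # same length: not an insertion or deletion
    return ("INS" if diff > 0 else "DEL", abs(diff))


def describe(lines: Iterable[str]) -> Iterator[Tuple[str, int, str, int]]:
    """Yield (chrom, pos, type, length) for each measurable ALT allele."""
    for line in lines:
        if line.startswith("#"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # not a complete VCF record
        chrom, pos, ref = fields[0], int(fields[1]), fields[3]
        for alt in fields[4].split(","):
            info = sv_info(ref, alt)
            if info is not None:
                yield chrom, pos, info[0], info[1]


def describe_file(path: str) -> Iterator[Tuple[str, int, str, int]]:
    """Same as describe(), reading a bgzip/gzip-compressed VCF from disk."""
    with gzip.open(path, "rt") as handle:
        yield from describe(handle)
```

Calling `describe_file()` on the fn.vcf.gz or fp.vcf.gz under phab_output/ (with the path adjusted to your own bench output directory) would list each FN or FP with its inferred type and size.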

An alternative workflow is to use truvari ga4gh --with-refine to consolidate the tp-base/fn VCFs (from the main bench output and from the sub-directory phab_output/), and likewise the tp-comp/fp VCFs. You'll then get 'truth' and 'query' VCFs with annotations of their state. Again, you can run those through truvari anno svinfo to add SVLEN/SVTYPE INFO annotations.
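Once SVTYPE and SVLEN are present in the INFO column, summarizing the FN or FP calls is a short parsing job. The following is a rough sketch using only the Python standard library; the helper names and the size bins are illustrative assumptions, not anything truvari defines, and it reads only the SVTYPE and SVLEN INFO keys mentioned above.

```python
from collections import Counter
from typing import Dict, Iterable


def parse_info(info: str) -> Dict[str, str]:
    """Split a VCF INFO string into a key -> value mapping (flags map to '')."""
    out = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        out[key] = value
    return out


def size_bin(svlen: int) -> str:
    """Hypothetical size bins, chosen only for illustration."""
    n = abs(svlen)  # tolerate negative SVLEN on deletions
    if n < 100:
        return "<100"
    if n < 1000:
        return "100-999"
    return ">=1000"


def tally(lines: Iterable[str]) -> Counter:
    """Count records per (SVTYPE, size bin) from annotated VCF text lines."""
    counts = Counter()
    for line in lines:
        if line.startswith("#"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # no INFO column
        info = parse_info(fields[7])
        if "SVTYPE" not in info or "SVLEN" not in info:
            continue  # record was not annotated
        # SVLEN may be a comma-separated list; use the first value.
        svlen = int(info["SVLEN"].split(",")[0])
        counts[(info["SVTYPE"], size_bin(svlen))] += 1
    return counts
```

Feeding the lines of an annotated FN or FP VCF (for example via `gzip.open(path, "rt")`) into `tally()` gives a count per type and size class.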