broadinstitute / gatk-sv

A structural variation pipeline for short-read sequencing
BSD 3-Clause "New" or "Revised" License

Grouped MEI's with insertions in splitvariants.py #687

Closed kirtanav98 closed 3 months ago

kirtanav98 commented 3 months ago

This addresses issue 649. svtk vcf2bed uses the ALT field to produce the svtype column in the output BED file. As a result, the svtype column includes BND alt alleles and values like INS:ME for MEIs. However, the current and previous SplitVariants tasks in GenotypeBatch matched exactly on the string "INS" when creating insertion-specific BED files, so MEIs were grouped with BCAs instead. With this change, MEIs are grouped with insertions rather than with BCAs when the insertion-specific BED files are created. This allows further evaluation of the impact of this grouping on genotyping. The change has been successfully validated with womtool and cromshell using the 1kgp reference panel inputs. The results of the previous script and docker and the results of the updated script and docker can be found here.
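To illustrate the difference between the two matching rules, here is a minimal sketch. It is not the code from splitvariants.py: the function names `is_insertion_exact` and `is_insertion_grouped` and the sample svtype values are hypothetical, and the sketch assumes MEI svtypes always begin with `INS:` (as in `INS:ME`).

```python
# Hypothetical sketch of the two matching rules; not the actual splitvariants.py code.

def is_insertion_exact(svtype: str) -> bool:
    """Previous behaviour: only the literal string 'INS' counts as an insertion."""
    return svtype == "INS"


def is_insertion_grouped(svtype: str) -> bool:
    """Updated behaviour (assumed form): plain INS plus any 'INS:' subtype such as INS:ME."""
    return svtype == "INS" or svtype.startswith("INS:")


def split_svtypes(svtypes, is_insertion):
    """Partition svtype values into (insertions, everything else)."""
    insertions = [s for s in svtypes if is_insertion(s)]
    others = [s for s in svtypes if not is_insertion(s)]
    return insertions, others


# Illustrative svtype column values; real BED files may contain other strings.
example_svtypes = ["DEL", "INS", "INS:ME", "CPX"]

old_ins, old_other = split_svtypes(example_svtypes, is_insertion_exact)
new_ins, new_other = split_svtypes(example_svtypes, is_insertion_grouped)
```

Under the exact-match rule, `INS:ME` falls into the non-insertion group; under the grouped rule it lands with `INS`. A prefix check is only one possible way to express the grouping, and the actual script may use a different test.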