broadinstitute / gatk-sv

A structural variation pipeline for short-read sequencing
BSD 3-Clause "New" or "Revised" License

Add ReshardVcf workflow #613

Closed mwalker174 closed 8 months ago

mwalker174 commented 9 months ago

This workflow takes a set of VCFs and reshards them by contig. For each contig, it runs bcftools concat with --regions to pull that contig's records from all of the input VCFs.
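As a rough illustration of the per-contig step, here is a minimal Python sketch that builds one `bcftools concat` command per contig. It is not the WDL task from this PR: the function names, output naming scheme, and the `dry_run` switch are hypothetical. It assumes the inputs are bgzipped and indexed, and it passes `--allow-overlaps` because, as far as I recall from the bcftools manual, `--regions` requires it for `concat`.

```python
import subprocess
from typing import List, Sequence


def build_reshard_command(vcfs: Sequence[str], contig: str, out_prefix: str) -> List[str]:
    """Build a bcftools command that collects one contig's records from all inputs.

    Hypothetical helper for illustration; the output naming scheme is an assumption.
    """
    if not vcfs:
        raise ValueError("at least one input VCF is required")
    return [
        "bcftools", "concat",
        "--allow-overlaps",          # assumed to be required alongside --regions
        "--regions", contig,         # restrict output to this contig
        "--output-type", "z",        # bgzipped VCF output
        "--output", f"{out_prefix}.{contig}.vcf.gz",
        *vcfs,
    ]


def reshard(vcfs: Sequence[str], contigs: Sequence[str], out_prefix: str,
            dry_run: bool = True) -> List[List[str]]:
    """Return one command per contig; execute them only when dry_run is False."""
    commands = [build_reshard_command(vcfs, contig, out_prefix) for contig in contigs]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands
```

With `dry_run=True` the sketch only returns the command lists, which makes it easy to inspect what would run before touching any files.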

This workflow will be needed at the end of ResolveComplexVariants. The contig-sharded VCFs produced in that workflow may contain a small number of records from other contigs, but downstream CleanVcf requires its input VCFs to be strictly contig-sharded.
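To make "strictly contig-sharded" concrete, the following sketch checks that every record in a shard carries the expected contig in its CHROM column. The helper names are hypothetical and this check is not part of the PR; it only assumes the standard VCF layout in which header lines start with `#` and CHROM is the first tab-separated field.

```python
from typing import Iterable, Set


def contigs_in_vcf(lines: Iterable[str]) -> Set[str]:
    """Collect the distinct CHROM values from uncompressed VCF text lines."""
    contigs = set()
    for line in lines:
        # Skip blank lines and header lines (## metadata and the #CHROM row).
        if not line.strip() or line.startswith("#"):
            continue
        contigs.add(line.split("\t", 1)[0])
    return contigs


def is_strictly_sharded(lines: Iterable[str], contig: str) -> bool:
    """True if no record belongs to another contig (an empty shard also passes)."""
    return contigs_in_vcf(lines) <= {contig}
```

For a bgzipped shard, the lines could be read with `gzip.open(path, "rt")`, since bgzip output is gzip-compatible.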

Tested on reference panel outputs from GenotypeComplexVariants.