ComparativeGenomicsToolkit / cactus

Official home of genome aligner based upon notion of Cactus graphs

Merge VCF records at the same position together after bcftools norm left shifts them #1536

Open glennhickey opened 15 hours ago

glennhickey commented 15 hours ago

This adds @Han-Cao's merge_duplicates.py as a post-processing step after bcftools norm -f.

It merges records that end up at the same position after bcftools norm left-shifts them (see the original issue, #1493, for details).

Unfortunately, it seems that bcftools norm can shift together sites that can't be represented as a single record without allele conflicts. In that case the --keep first heuristic is used by default to choose one of them; this can be changed via the mergeDuplicatesOptions field in the config XML.
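For illustration only, here is a minimal Python sketch of the idea. It is not merge_duplicates.py itself: the Record type, try_merge, and merge_duplicates below are hypothetical names, and the conflict rule (a REF mismatch, or one haplotype carrying two different ALT alleles) is a simplifying assumption rather than the script's actual logic. On a conflict it keeps the first record, loosely mimicking --keep first.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Record:
    """Toy stand-in for a VCF record (hypothetical, not the real script's type)."""
    chrom: str
    pos: int
    ref: str
    alts: List[str]
    haps: List[int]  # one allele index per haplotype, 0 = REF, 1.. = index into alts


def try_merge(a: Record, b: Record) -> Optional[Record]:
    """Return one multi-allelic record, or None if the two records conflict.

    Assumption for this sketch: a conflict is a REF mismatch or a haplotype
    that carries different ALT alleles in the two records.
    """
    if a.ref != b.ref or len(a.haps) != len(b.haps):
        return None
    alts = list(a.alts)
    haps = []
    for ha, hb in zip(a.haps, b.haps):
        if ha and hb and a.alts[ha - 1] != b.alts[hb - 1]:
            return None  # one haplotype cannot carry two different alleles
        if hb and not ha:
            alt = b.alts[hb - 1]
            if alt not in alts:
                alts.append(alt)
            haps.append(alts.index(alt) + 1)  # re-index into the merged ALT list
        else:
            haps.append(ha)
    return Record(a.chrom, a.pos, a.ref, alts, haps)


def merge_duplicates(records: List[Record]) -> List[Record]:
    """Merge consecutive records at the same (chrom, pos); input must be sorted."""
    out: List[Record] = []
    for rec in records:
        if out and (out[-1].chrom, out[-1].pos) == (rec.chrom, rec.pos):
            merged = try_merge(out[-1], rec)
            if merged is not None:
                out[-1] = merged
            # On a conflict, keep the first record and drop this one.
            continue
        out.append(rec)
    return out


records = [
    Record("chr1", 100, "A", ["T"], [1, 0, 0, 0]),
    Record("chr1", 100, "A", ["G"], [0, 1, 0, 0]),  # compatible: merged
    Record("chr1", 200, "C", ["G"], [1, 0, 0, 0]),
    Record("chr1", 200, "C", ["T"], [1, 1, 0, 0]),  # conflict on haplotype 0: dropped
]
for rec in merge_duplicates(records):
    print(rec)
```

In this toy input the two records at position 100 collapse into one record with ALT alleles T and G, while the second record at position 200 is discarded because haplotype 0 would need two different alleles.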

As it stands, bcftools norm -f and merge_duplicates.py are both on by default for vcfwave output. To apply them to the "Default" VCFs as well, bcftoolsNorm="1" needs to be set in the config.

Stacks on #1491