This adds @Han-Cao's merge_duplicates.py as a post-processing step after `bcftools norm -f`.
It merges the multiple records at the same position that `bcftools norm` can create (see the original issue, #1493, for details).
Unfortunately, it seems that `bcftools norm` can shift together sites that can't be represented as a single record without allele conflicts. The `--keep first` heuristic is therefore used by default to choose one; this can be changed via the `mergeDuplicatesOptions` field in the config XML.
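To make the conflict case concrete, here is a toy sketch of what a keep-first policy could look like. It is not the logic of merge_duplicates.py: the record layout (dicts with `chrom`, `pos`, `ref`, `alt`, and phased per-sample `gts`) and the function name `merge_keep_first` are assumptions made purely for illustration, and it assumes sorted, biallelic input where duplicates share the same REF.

```python
from itertools import groupby


def merge_keep_first(records):
    """Merge adjacent biallelic records sharing (chrom, pos, ref).

    Hypothetical illustration only. Each record's "gts" is a list of
    per-sample tuples of 0/1 (one entry per phased haplotype). When two
    records both put an ALT on the same haplotype (an allele conflict),
    the allele from the first record is kept.
    """
    merged = []

    def site_key(rec):
        return (rec["chrom"], rec["pos"], rec["ref"])

    # groupby only groups adjacent items, so input must be position-sorted.
    for (chrom, pos, ref), group in groupby(records, key=site_key):
        group = list(group)
        alts = []
        # Start with every haplotype carrying the reference allele.
        gts = [[0] * len(gt) for gt in group[0]["gts"]]
        for rec in group:
            if rec["alt"] not in alts:
                alts.append(rec["alt"])
            allele = alts.index(rec["alt"]) + 1  # 1-based ALT index
            for s, gt in enumerate(rec["gts"]):
                for h, a in enumerate(gt):
                    # Keep-first: only fill haplotypes that are still REF.
                    if a == 1 and gts[s][h] == 0:
                        gts[s][h] = allele
        merged.append({
            "chrom": chrom,
            "pos": pos,
            "ref": ref,
            "alts": alts,
            "gts": [tuple(g) for g in gts],
        })
    return merged
```

In this sketch, a haplotype that carries the ALT in two records at the same position keeps the allele from whichever record came first, and the later allele is dropped for that haplotype only.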
As it stands, `bcftools norm -f` and `merge_duplicates.py` are both on by default for `vcfwave` output. To apply them to the "Default" VCFs, `bcftoolsNorm="1"` needs to be set in the config.
Stacks on #1491