Open Han-Cao opened 3 months ago
I see what you mean. Is there a `bcftools` command (or other tool) that can do this? I tried `bcftools norm -m +any`, but it cannot merge these records (v1.20). I am not sure whether any other tool can do it.
What I currently do is skip `bcftools norm -f` to generate a non-overlapping multiallelic VCF (like old Cactus) for tools that require non-overlapping input.
Same issue here, @glennhickey. The two variants even start at the same position:
1 62173 >4733>4736 G A 60 . AC=3;AF=0.3;AN=10;AT=>4733>4734>4736,>4733<4735>4736;NS=8;LV=0 GT 0|. 0|. .|. 1|. . .|. .|. .|. .|. 0|0 .|. 1|1 .|. 0 . 0 0|.
1 62173 >4736>4767 GTGGTGGCACAGCCGTGATCACAGACTCAGGGTGATGTGGGTCCCCATGGTGGCACAGCCGTGACCATGACCTCAGGGTGACGTGGGTCCCCA G,GTGGTGGCACAGCCGTGACCATGACCTCAGGGTGCTATGGGTCCCCATGGTGACACCACCACGACAACCACGGCCTCAGGGTGACGTGGGTCCCCA 60 . AC=7,2;AF=0.7,0.2;AN=10;AT=>4736>4737>4738>4740>4741>4743>4744>4746>4747>4749>4750>4752>4753>4755>4756>4758>4760>4761>4763>4764>4766>4767,>4736>4767,>4736>4737<4739>4740<4742>4743<4745>4746<4748>4749<4751>4752<4754>4755<4757>4758<4759>4760<4762>4763<4765>4766>4767;NS=8;LV=0 GT 0|. 1|. .|. 1|. . .|. .|. .|. .|. 1|1 .|. 2|2 .|. 1 . 1 1|.
There's a tool in vcflib, `vcfcreatemulti`, that seems to be able to merge overlapping variants. It's included in the Cactus docker release (but not the binary release). I'd be curious to know how that works out. In the meantime, I'm leaning towards disabling the normalizer by default in the next release.
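To make concrete what merging two same-position records involves at the allele level, here is a minimal Python sketch. It is not how `vcfcreatemulti` or `bcftools` work internally; `merge_alleles` is a hypothetical helper written for illustration. It assumes both records start at the same POS, that one REF is a prefix of the other, and that all ALTs are plain sequences (no symbolic or `*` alleles). The REF strings in the example are shortened stand-ins, not the full alleles quoted above.

```python
from typing import List, Tuple


def merge_alleles(
    ref_a: str, alts_a: List[str], ref_b: str, alts_b: List[str]
) -> Tuple[str, List[str]]:
    """Combine the alleles of two records that start at the same POS.

    Every ALT is padded with the reference bases that its own record
    did not cover, so all alleles are expressed against the longer REF.
    Hypothetical helper: INFO fields and genotypes are not handled.
    """
    ref = ref_a if len(ref_a) >= len(ref_b) else ref_b
    if not (ref.startswith(ref_a) and ref.startswith(ref_b)):
        raise ValueError("REF alleles disagree; records cannot share a start")

    merged: List[str] = []
    for rec_ref, alts in ((ref_a, alts_a), (ref_b, alts_b)):
        suffix = ref[len(rec_ref):]  # reference bases missing from this record
        for alt in alts:
            padded = alt + suffix
            # Skip alleles that collapse to REF or duplicate an earlier ALT.
            if padded != ref and padded not in merged:
                merged.append(padded)
    return ref, merged


if __name__ == "__main__":
    # A SNP (G>A) and a deletion (GTGG>G) sharing one start position.
    print(merge_alleles("G", ["A"], "GTGG", ["G"]))
```

The sketch stops at alleles on purpose. Genotypes are the harder part: when a haplotype carries a non-reference allele at both records (several sample columns in the records quoted above look like this, e.g. `1|1` in one record and `2|2` in the other), the merged record would need an additional combined allele, which this helper does not construct.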
Hi,
In the current MC pipeline, the default output VCF (i.e., after `vcfbub` and `bcftools norm`) may have overlapping variants due to left alignment. For example:
In `mc.raw.vcf.gz`, there is no overlap between the 2 variants below:
After running `bcftools norm -f GRCh38.fa`, the output variants are overlapping:
Do you think these 2 records should be merged into one multiallelic record? Moreover, as overlapping variants are not accepted by some tools (e.g., PanGenie), this could be an issue for downstream analysis.
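For anyone who wants to check whether a VCF is affected, the following is a minimal Python sketch of an overlap check. It assumes the usual convention that a record covers reference positions POS through POS + len(REF) - 1, and that records are sorted by chromosome and position. The function names `parse_vcf_lines` and `find_overlaps` are hypothetical, and the parser only reads CHROM, POS and REF from uncompressed text lines.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, int, str]  # (CHROM, POS, REF)
Site = Tuple[str, int]         # (CHROM, POS)


def parse_vcf_lines(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (CHROM, POS, REF) for each data line, skipping headers."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()  # VCF is tab-delimited; split() also accepts spaces
        yield fields[0], int(fields[1]), fields[3]


def find_overlaps(records: Iterable[Record]) -> List[Tuple[Site, Site]]:
    """Report records whose start falls inside an earlier record's REF span.

    Assumes input sorted by CHROM and POS. Each overlapping record is
    paired with the earlier record that reaches furthest to the right.
    """
    overlaps: List[Tuple[Site, Site]] = []
    furthest = None  # (chrom, pos, end) of the record extending furthest right
    for chrom, pos, ref in records:
        end = pos + len(ref) - 1
        if furthest is not None and furthest[0] == chrom and pos <= furthest[2]:
            overlaps.append(((furthest[0], furthest[1]), (chrom, pos)))
        if furthest is None or furthest[0] != chrom or end > furthest[2]:
            furthest = (chrom, pos, end)
    return overlaps


if __name__ == "__main__":
    # Two records starting at the same position; REF shortened for the demo.
    demo = [
        "1\t62173\t>4733>4736\tG\tA\t60\t.\t.",
        "1\t62173\t>4736>4767\tGTGG\tG\t60\t.\t.",
    ]
    print(find_overlaps(parse_vcf_lines(demo)))
```

Running this on the raw and normalized VCFs (decompressed first) would show whether `bcftools norm -f` introduced the overlaps. Whether a given downstream tool uses the same definition of overlap is an assumption that should be checked against that tool's documentation.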