Open RenzoTale88 opened 2 years ago
You need to use something like https://github.com/pangenome/vcfbub in order to remove nested variants. This definitely needs a boost in the documentation.
@glennhickey thanks for the quick reply. I thought vcfbub just removes the variants that are not at level 0 in the VCF file?
Yes, but it allows you to specify a maximum bubble size, which lets you remove level-0 variants that are too big and keep only the children.
You never want both a level N and level N+1 variant from the same site in your VCF, as they contain redundant information.
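To make the rule above concrete, here is a rough Python sketch of the idea (not vcfbub's actual implementation). It assumes the VCF carries the nesting level in an `LV` INFO tag and the parent site's ID in a `PS` INFO tag; `parse_record`, `pop_bubbles` and `max_ref_len` are hypothetical names used only for illustration.

```python
def parse_record(line):
    """Parse one tab-separated VCF data line into a small dict.

    Assumes INFO holds LV (nesting level) and, for nested sites,
    PS (ID of the parent site). Both tag names are assumptions here.
    """
    fields = line.rstrip("\n").split("\t")
    # "LV=1;PS=s2" -> {"LV": "1", "PS": "s2"}; flags map to "".
    info = dict(item.partition("=")[::2] for item in fields[7].split(";"))
    return {
        "chrom": fields[0],
        "pos": int(fields[1]),
        "id": fields[2],
        "ref": fields[3],
        "level": int(info.get("LV", 0)),
        "parent": info.get("PS"),
    }


def pop_bubbles(records, max_ref_len):
    """Keep a site only if its parent was dropped for being too big.

    Level-0 sites are kept when their REF allele is short enough.
    Oversized sites are "popped" and their direct children are
    considered instead, so a parent and its child are never both kept.
    """
    popped = set()  # IDs of sites removed because they were too big
    keep = set()
    for rec in sorted(records, key=lambda r: r["level"]):
        if rec["level"] > 0 and rec["parent"] not in popped:
            continue  # parent was kept (or is unknown): child is redundant
        if len(rec["ref"]) <= max_ref_len:
            keep.add(rec["id"])
        else:
            popped.add(rec["id"])
    # Return the survivors in their original file order.
    return [r for r in records if r["id"] in keep]


lines = [
    "chr1\t100\ts1\tACGTA\tA\t.\t.\tLV=0",
    "chr1\t101\ts4\tC\tG\t.\t.\tLV=1;PS=s1",
    "chr1\t500\ts2\t" + "A" * 50 + "\tA\t.\t.\tLV=0",
    "chr1\t510\ts3\tAAA\tA\t.\t.\tLV=1;PS=s2",
]
kept = pop_bubbles([parse_record(l) for l in lines], max_ref_len=10)
print([r["id"] for r in kept])
```

In this toy input, `s2` is too long and gets replaced by its child `s3`, while `s4` is dropped because its parent `s1` is kept.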
But... you may want to use some kind of decomposition to clean up the big bubbles. We tried a few approaches in https://www.biorxiv.org/content/10.1101/2022.07.09.499321v1.abstract that you can find digging through the methods.
The most extreme is just to realign everything to the reference to try to get a simpler VCF. This is effective, but you have to keep in mind that the variants in your VCF no longer correspond to the topology of the graph: https://github.com/vcflib/vcflib/blob/master/doc/vcfwave.md
Oh, the re-alignment would be good actually. I don't need to use the graph topology as such, since I want to feed the variants to bcftools consensus. So the simpler the better. Thanks so much for this, I'll give vcfwave a go!
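Before passing a cleaned VCF to bcftools consensus, it may help to check whether any overlapping records remain. The following is a minimal sketch in plain Python, assuming each record has already been reduced to a `(chrom, pos, id, ref)` tuple and that a record spans `len(ref)` reference bases starting at `pos`; `find_overlaps` is a hypothetical helper, not part of bcftools or vcflib.

```python
def find_overlaps(records):
    """Return (earlier_id, later_id) pairs whose REF spans overlap.

    Each record is a (chrom, pos, id, ref) tuple with a 1-based pos.
    A record covers the closed interval [pos, pos + len(ref) - 1].
    """
    overlaps = []
    last = {}  # chrom -> (end, id) of the record reaching furthest right
    for chrom, pos, vid, ref in sorted(records, key=lambda r: (r[0], r[1])):
        end = pos + len(ref) - 1
        if chrom in last and pos <= last[chrom][0]:
            overlaps.append((last[chrom][1], vid))
        if chrom not in last or end > last[chrom][0]:
            last[chrom] = (end, vid)
    return overlaps


records = [
    ("chr1", 100, "big", "A" * 20),  # covers 100-119
    ("chr1", 105, "c1", "AT"),       # inside "big"
    ("chr1", 110, "c2", "G"),        # inside "big"
    ("chr1", 200, "far", "C"),       # no overlap
]
for earlier, later in find_overlaps(records):
    print(f"{later} overlaps {earlier}")
```

An empty result suggests the nested sites have been resolved; any reported pair points to a site that still needs filtering or decomposition.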
Hello, I'm trying to apply bcftools consensus using a cactus VCF file, but I'm coming across overlapping variants in the process. I can see that the overlapping variants correspond to several level-1 variants following a large level-0 variant, like this one:
Is there a way to combine these types of sites into larger multiallelic sites? Sorry for the slightly bizarre question, and thanks in advance for the help!
Andrea