eblerjana / pangenie

Pangenome-based genome inference
MIT License

Large SV segments (>100kb) caused the filtering out of all SVs in a large region (1Mb) after running vcfbub #84

Closed · porkfan closed this 1 month ago

porkfan commented 1 month ago

Dear eblerjana,

I have a question: the pangenome graph I constructed for my species contains some very large SV segments, and many smaller SVs are nested within them. If I run "vcfbub -l 0 -r 100000" directly, it filters out both the large SVs and the smaller SVs nested inside them, which could leave large gaps (around 1 Mb) in the resulting graph. Is there a way to keep the nested smaller SVs and filter out only the large SVs that cover them? How should I proceed?

I look forward to your prompt response.

eblerjana commented 1 month ago

This is exactly what vcfbub does: the -r 100000 parameter filters out all top-level bubbles whose reference allele is longer than 100000 bp and replaces them with their shorter (< 100000 bp) nested segments. Can you specify what you mean by "large SV segments"? Also note that I'm not the developer of vcfbub, so for more specific questions about it, I'd suggest contacting its developers (https://github.com/pangenome/vcfbub).
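To make the behavior described above concrete, here is a hypothetical, heavily simplified Python sketch of the kind of filtering vcfbub performs. It assumes a vg-deconstruct-style VCF in which each record carries an LV (bubble level) INFO tag; the record contents, the PS tag, and the 100 kb cutoff are illustrative assumptions, not vcfbub's actual implementation (the real tool also handles the -l level cutoff and allele traversals).

```python
# Simplified sketch (NOT vcfbub itself): keep nested bubbles, and keep
# top-level bubbles only if their reference allele is short enough.

MAX_REF_LEN = 100_000  # mirrors -r 100000 from the command above

def parse_info(info):
    """Split a VCF INFO column like 'LV=1;PS=>1>9' into a dict."""
    fields = {}
    for entry in info.split(";"):
        key, _, value = entry.partition("=")
        fields[key] = value
    return fields

def keep_record(vcf_line, max_ref_len=MAX_REF_LEN):
    """Return True if this record survives the length filter.

    Nested bubbles (LV > 0) are retained; top-level bubbles (LV == 0)
    are dropped when their reference allele exceeds the cutoff.
    """
    cols = vcf_line.rstrip("\n").split("\t")
    ref, info = cols[3], parse_info(cols[7])
    if int(info.get("LV", "0")) > 0:
        return True                    # nested bubble: always kept
    return len(ref) <= max_ref_len     # top-level bubble: length filter

# Illustrative records: a 150 kb top-level deletion bubble, and a small
# SNP bubble nested inside it (identifiers are made up).
big = "chr1\t100\t>1>9\t" + "A" * 150_000 + "\tA\t.\t.\tLV=0"
nested = "chr1\t200\t>2>4\tA\tT\t.\t.\tLV=1;PS=>1>9"

print(keep_record(big), keep_record(nested))  # False True
```

So the large enclosing bubble is removed while the small variant nested inside it survives as its own record, which is why running vcfbub as shown should not discard the smaller SVs.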

porkfan commented 1 month ago

Thank you for your prompt response! I think I have identified the issue: it wasn't a problem with vcfbub, but rather that the VCF generated by the new version of Cactus seems to differ from previous versions, which was causing problems when using PanGenie to build the graph genome index. I've now resolved the problem. Thanks again!