ekg / seqwish

alignment to variation graph inducer
MIT License

Question: How to recover "colors" from the graph? #57

Open phiweger opened 4 years ago

phiweger commented 4 years ago

Hi,

I construct a colored dBG w/ Bifrost and remove the overlaps using gimbricate. From the documentation I read that seqwish can now be used to extract the paths of the constituting genomes through the graph (as both paths through the graph and the "colors" of the individual nodes). Could you point me to how I can use seqwish to do this?

Thanks a lot and kind regards!

ekg commented 4 years ago

You'll need to realign your read sets to the graph using vg map or GraphAligner. The colors are specific to the DBG, and will need to be established again.

There is also a tool called stark which allows the compression of the DBG and generation of a blunt-ended graph.

Why do you need the blunt ended graph? What is your workflow?

phiweger commented 4 years ago

Thank you for your quick response! I want to generate a species-level pangenome graph and then map reads to it. By identifying the most likely color, I'd then find the closest genome in the graph. I stumbled on a comment of yours here.

ekg commented 4 years ago

Do you have assemblies of the different species? Or are they only represented as colors in your DBG?

phiweger commented 4 years ago

I start from assemblies.

ekg commented 4 years ago

OK, then you don't have colors, but "paths" in variation graph parlance. These are provided in the output GFA file, as P records. You can visualize them and interact with them using tools in odgi and the vg toolkit.
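For illustration, here is a minimal sketch of pulling the paths out of a GFA file by reading the P records directly. It assumes GFA1-style P lines (tab-separated: `P`, path name, comma-separated oriented segment IDs, overlaps); the function name `read_paths` and the toy graph are made up for this example and are not part of seqwish, odgi, or vg.

```python
def read_paths(gfa_lines):
    """Return {path_name: [(segment_id, orientation), ...]} from GFA1 P records."""
    paths = {}
    for line in gfa_lines:
        if not line.startswith("P\t"):
            continue  # skip H, S, L and any other record types
        fields = line.rstrip("\n").split("\t")
        name, steps = fields[1], fields[2]
        # Each step looks like "12+" or "7-": segment ID followed by orientation.
        paths[name] = [(step[:-1], step[-1]) for step in steps.split(",") if step]
    return paths


# Toy graph (hypothetical): two genomes that differ at one node.
demo_gfa = [
    "H\tVN:Z:1.0",
    "S\t1\tACGT",
    "S\t2\tG",
    "S\t3\tT",
    "S\t4\tTTCA",
    "P\tgenomeA\t1+,2+,4+\t*",
    "P\tgenomeB\t1+,3+,4+\t*",
]

demo_paths = read_paths(demo_gfa)
for name, steps in demo_paths.items():
    print(name, "".join(f"{seg}{ori} " for seg, ori in steps).strip())
```

For a real file you would pass an open file handle instead of the list, e.g. `read_paths(open("graph.gfa"))`, where `graph.gfa` stands for whatever you passed to seqwish as the GFA output.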

You could re-color the nodes of the graph by indexing the nodes by which paths go through them. This is how the libhandlegraph data model works. There is a Python library (libbdsg) that lets you work with the graph and query things like this, using different backing data structures to store the graph. https://doi.org/10.1093/bioinformatics/btaa640
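As a rough sketch of that re-coloring idea, the snippet below builds a node-to-paths index straight from the P records of a GFA file, so each node's "color" is the set of paths that visit it. It uses plain Python rather than libbdsg, and the name `color_nodes` and the toy graph are assumptions made for the example.

```python
from collections import defaultdict


def color_nodes(gfa_lines):
    """Map each segment ID to the set of path names that traverse it."""
    colors = defaultdict(set)
    for line in gfa_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] != "P" or len(fields) < 3:
            continue  # only P records carry path information
        path_name = fields[1]
        for step in fields[2].split(","):
            if step:
                # Drop the trailing orientation character (+ or -).
                colors[step[:-1]].add(path_name)
    return dict(colors)


# Toy graph (hypothetical): node 2 is private to genomeA, node 3 to genomeB.
toy_gfa = [
    "S\t1\tACGT",
    "S\t2\tG",
    "S\t3\tT",
    "S\t4\tTTCA",
    "P\tgenomeA\t1+,2+,4+\t*",
    "P\tgenomeB\t1+,3+,4+\t*",
]

node_colors = color_nodes(toy_gfa)
for node in sorted(node_colors, key=int):
    print(node, sorted(node_colors[node]))
```

Nodes whose set contains a single path are the ones that discriminate between genomes, which is probably what matters most when picking the closest genome for a set of mapped reads.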

phiweger commented 4 years ago

ah, so that was where I got confused. perfect, these resources should get me started, thanks a lot for your help!

ekg commented 4 years ago

Great, I hope it helps. Let me know how you get on. Check back if you have any problems.

I'm working on a postprocessing tool for seqwish graphs, smoothxg, which should be ready for general use soon. That tool makes the graph look partially ordered locally, which is generally what we expect, while preserving larger SVs. The idea there is also to package up a lot of the steps that I was doing with odgi, so that there is only one step after seqwish for most use cases. I'd expect it to stabilize in the coming month or so (I've run into a few bugs, and some aspects of the parameterization aren't fully worked out), but in the future it'll be the preferred step immediately after seqwish.