asl / BandageNG

a Bioinformatics Application for Navigating De novo Assembly Graphs Easily
GNU General Public License v3.0

Connected components? #115

Open · tnn111 opened this issue 2 years ago

tnn111 commented 2 years ago

Hi,

Is there a way of getting BandageNG to output connected components as separate assembly graphs? I work with metagenomes and it would be extremely useful to be able to do this and then work on the individual graphs.

Thanks, Torben

asl commented 2 years ago

Are you sure you have separate components? That is very unusual. Usually a metagenome graph is one huge component, connected via conserved elements such as ribosomal genes and other sequences that underwent horizontal gene transfer (HGT).

Can you post the graph information here?

tnn111 commented 2 years ago

Hi Anton,

I’m working with large environmental metagenomes, and while they do contain some hairballs, the graph is mostly a collection of separate components. The graph alone is around 500 MB. I’m attaching a screenshot from BandageNG. What I’d like to be able to do is extract each connected component so that I can work on them individually.

Thanks, Torben


asl commented 2 years ago

Sorry, the screenshot didn't come through. You'll likely need to attach it via the GitHub website rather than by email.

tnn111 commented 2 years ago

Trying again :-)

[Screenshot: the assembly graph displayed in BandageNG]

tnn111 commented 2 years ago

Just did. Thanks!


asl commented 2 years ago

Oh, OK. It looks like a very fragmented graph with low complexity. For now, I'd just select a node from the component and draw the graph around that node with a distance of, say, 500.
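Drawing "around" a node is essentially a breadth-first search cut off after a fixed number of steps. A minimal Python sketch, assuming the graph is already loaded into an undirected adjacency dict keyed by segment name and that distance counts link hops (whether BandageNG measures it exactly this way is not stated in this thread); `nodes_within_distance` is a hypothetical helper, not a BandageNG API:

```python
from collections import deque


def nodes_within_distance(adjacency, start, max_distance):
    """Return {node: hops} for every node reachable from `start`
    in at most `max_distance` link hops (undirected breadth-first search)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_distance:
            continue  # do not expand past the distance limit
        for neighbour in adjacency.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


if __name__ == "__main__":
    # Toy graph: a chain 1-2-3-4 plus a separate component 5-6.
    adjacency = {
        "1": ["2"], "2": ["1", "3"], "3": ["2", "4"], "4": ["3"],
        "5": ["6"], "6": ["5"],
    }
    print(sorted(nodes_within_distance(adjacency, "1", 2)))  # ['1', '2', '3']
```

With a large enough distance the search returns the entire connected component containing the start node, which is why a generous distance works as a stopgap when components are small.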

asl commented 2 years ago

@tnn111 Would it work if we added another scope, namely "component containing node(s)"?

tnn111 commented 2 years ago

Hi Anton,

Yes, that would work. The point is to be able to focus on what is usually a much smaller subgraph.

I needed the functionality, so I wrote a trivial piece of code that parses a GFA file and extracts the subgraph containing a given segment or a given path. It works well enough, and I’m experimenting with adding more functionality as I go.
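The code described above is not included in the thread. As a minimal sketch of the same idea, assuming plain tab-separated GFA1 text with H, S, L and P records (other record types are simply dropped), the hypothetical `extract_component` below collects the union of the connected components containing some seed segments and/or the segments of a named path, ignoring link orientation:

```python
from collections import defaultdict, deque


def extract_component(gfa_lines, seeds=(), path_name=None):
    """Return the GFA1 lines belonging to the connected component(s) that
    contain the segments in `seeds` and/or the segments of path `path_name`."""
    adjacency = defaultdict(set)
    paths = {}
    records = [line.rstrip("\n").split("\t") for line in gfa_lines]
    for f in records:
        if f[0] == "L":
            # L <from> <orient> <to> <orient> <overlap>; orientation ignored here
            adjacency[f[1]].add(f[3])
            adjacency[f[3]].add(f[1])
        elif f[0] == "P":
            # P <name> <seg+,seg-,...> <overlaps>; strip the +/- suffix
            paths[f[1]] = [step[:-1] for step in f[2].split(",")]

    start = set(seeds)
    if path_name is not None:
        start.update(paths[path_name])

    # Undirected BFS from all seeds at once: the union of their components.
    keep = set(start)
    queue = deque(start)
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in keep:
                keep.add(neighbour)
                queue.append(neighbour)

    out = []
    for line, f in zip(gfa_lines, records):
        if f[0] == "H":
            out.append(line)
        elif f[0] in ("S", "L") and f[1] in keep:
            out.append(line)  # an L line's two ends always share a component
        elif f[0] == "P" and all(s in keep for s in paths[f[1]]):
            out.append(line)
    return out


if __name__ == "__main__":
    toy = [
        "H\tVN:Z:1.0",
        "S\t1\tACGT", "S\t2\tGGCC", "S\t3\tTTAA",
        "L\t1\t+\t2\t-\t0M",
        "P\tp1\t1+,2-\t0M",
    ]
    print(len(extract_component(toy, seeds=["2"])))  # 5: H, S 1, S 2, L, P
    print(extract_component(toy, path_name="p1") == extract_component(toy, seeds=["1"]))  # True
```

A P line is kept only if all of its segments fall inside the extracted subgraph; in a well-formed graph a path never crosses components, so it is either kept or dropped whole.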

Thanks, Torben


asl commented 2 years ago

Well, SPAdes ships a gfa-split tool that splits a GFA file into connected components while preserving paths, etc. :)
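The internals of gfa-split are not described here. A minimal sketch of the general technique, assuming GFA1 text with H, S, L and P records only: a union-find over segment names merges the two ends of every L line and all segments of every P line, then each record is routed to the component of its (first) segment. `split_components` is a hypothetical name, not part of SPAdes:

```python
def split_components(gfa_lines):
    """Split GFA1 lines into connected components.

    Returns one list of lines per component; H (header) lines are copied
    into every component. Record types other than H, S, L and P are dropped.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)  # register unseen segments as their own root
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    def path_segments(field):
        # "12+,7-,3+" -> ["12", "7", "3"] (strip the orientation suffix)
        return [step[:-1] for step in field.split(",")]

    records = [line.rstrip("\n").split("\t") for line in gfa_lines]
    for f in records:
        if f[0] == "S":
            find(f[1])  # isolated segments become single-node components
        elif f[0] == "L":
            union(f[1], f[3])  # link orientation is irrelevant for connectivity
        elif f[0] == "P":
            segs = path_segments(f[2])
            for s in segs[1:]:
                union(segs[0], s)

    headers = [line for line, f in zip(gfa_lines, records) if f[0] == "H"]
    components = {}
    for line, f in zip(gfa_lines, records):
        if f[0] in ("S", "L"):
            root = find(f[1])
        elif f[0] == "P":
            root = find(path_segments(f[2])[0])
        else:
            continue
        if root not in components:
            components[root] = list(headers)
        components[root].append(line)
    return list(components.values())


if __name__ == "__main__":
    toy = [
        "H\tVN:Z:1.0",
        "S\t1\tACGT", "S\t2\tGGCC", "S\t3\tTTAA",
        "L\t1\t+\t2\t-\t0M",
        "P\tp1\t1+,2-\t0M",
    ]
    for i, part in enumerate(split_components(toy)):
        print(i, len(part))  # prints "0 5", then "1 2"
```

Treating every P line as joining its segments means a path can never be split across two output components, which is the property that matters when paths must be preserved.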

Also, you can certainly reuse the GFA parser from BandageNG; it's fairly self-contained (it was designed that way for easy reuse).

rlorigro commented 2 years ago

If you're willing to build this repository, it provides the same functionality in the build/split_connected_components executable: https://github.com/rlorigro/GFAse/blob/main/src/executable/split_connected_components.cpp

The GetBlunted repository also has a subgraph extraction executable, but it will not preserve paths in the output: https://github.com/vgteam/GetBlunted/blob/master/src/executable/extract_subgraph.cpp