asl / BandageNG

a Bioinformatics Application for Navigating De novo Assembly Graphs Easily
GNU General Public License v3.0
114 stars 10 forks source link

Using `BandageNG layout` is `Linear graph layout` turn `On` or `Off` by default? #148

Closed subwaystation closed 9 months ago

subwaystation commented 9 months ago

I am wondering this to better understand how the layout algorithm works. In linear mode, the graph would be stretched longer, I would assume? Would it also make it more human readable? By default in the GUI, it is turned off. I never had the impression turning it on would help me to better understand a graph.

Thanks for any feedback!

asl commented 9 months ago

It certainly depends on what is your definition of "human readable". I would not consider linearized graphs to be "more human readable" in many cases, it depends on the graph.

Bandage-NG uses Fast Multipole Multilevel layout algorithm as implemented in OGDF library (see Stefan Hachul, Michael Jünger: Drawing Large Graphs with a Potential-Field-Based Multilevel Algorithm. 12th International Symposium on Graph Drawing 1998, New York (GD '04), LNCS 3383, pp. 285-295, 2004). The details of particular Bandage-NG use of OGDF could be found in https://github.com/asl/BandageNG/blob/dev/layout/graphlayoutworker.cpp

But in linear layout mode the nodes are sorted according to their IDs and then they are added to the graph left-to-right and these positions are used as initial placement of the nodes.

Hope this helps

subwaystation commented 9 months ago

Ahh, got it, thanks for the details!

One more question: BandageNG uses one thread for each graphical component when I apply it to e.g. https://nf-co.re/pangenome/1.0.0/results/pangenome/results-9fc8ccd2eebb78ad93d14b52744d2e20967494dc/FINAL_GFA/?file=scerevisiae8.fasta.gz.squeeze.unchop.Ygs.view.gfa (which comes with 15 disconnected components). Does this decrease the layout calculation time much compared to only using one thread? Also, I can't select the number threads BandageNG is using, right? Just that I didn't miss something.

asl commented 9 months ago

Does this decrease the layout calculation time much compared to only using one thread?

Well, it depends on the sizes of individual components. In many cases graphs tend to contain 1-2 large components and plethora of smaller ones, so certainly it's not the perfect parallel execution as one could imagine. However, we'd rely on what's available from OGDF and this seems to be the cheapest way to obtain some parallelism.

Running layout of your graph on my laptop, I am seeing that up to 10 threads are used, so I'd expect quite some speedup, maybe not 10x, but 2-4x for sure as compared with single threaded execution. For concurrency we rely on what is available from Qt just to not add complexity (e.g. OpenMP or other library). Currently there is no way to control # of threads.

Another important point is that many heuristics in Bandage is centered around assembly graphs, when you're having things like "long node - tangle - again long node". Things are different for pangenome graphs when it's all "bubble-after-bubble-after-bubble". One need, for example, to tune default node length setting almost surely. This speedups layout considerably.

subwaystation commented 9 months ago

This answer helped a lot, thanks!