bioforensics / yeat

YEAT: Your Everyday Assembly Tool

Adding Bandage to visualize genome assemblies #24

Closed danejo3 closed 1 year ago

danejo3 commented 1 year ago

Bandage (a Bioinformatics Application for Navigating De novo Assembly Graphs Easily) is a GUI for visualizing assembly graphs, which makes it easy to judge how well various assemblers reconstructed a genome. It was created by Ryan Wick, who also created Unicycler, and is one of the go-to tools for many in the community.

In order to use Bandage, users will need to provide assembly graph files. These files are used to represent the final assembly of a genome. Typically, popular assemblers like SPAdes will auto-generate them alongside the final contigs. For example, users will find an assembly_graph.fastg file in their output directory.

The following assembly graph files are supported by Bandage:

BASIC USAGE:

To view an assembly result on macOS, users will click:

File -> Load graph -> {Locate the assembly graph file}

After loading the graph, click Draw graph on the left-hand side, underneath the Draw Graph section.

image

(If you would like to learn more about what Bandage can do, look here, underneath the Features section.)

danejo3 commented 1 year ago

How to download or install Bandage.

The quickest and easiest way to install Bandage is to go to the main page at https://rrwick.github.io/Bandage/ and select the download for your operating system.

If choosing this route, users will download a compressed file with the Bandage GUI and a test assembly graph file.

Bang! It's ready to use!

However, users who would rather build the software from source will need to follow the steps at https://github.com/rrwick/Bandage#building-from-source.

If done this way, users will need to install the Qt SDK, version 5.15 or later. Furthermore, in order to use Qt, you must create an account!

danejo3 commented 1 year ago

Help! I'm running into this popup when I try to run Bandage on macOS!

image

If you get this popup, you can bypass it as follows:

Click the Apple menu -> System Preferences -> Security & Privacy.

Around the middle of the window, you'll see an Open Anyway button next to the message "Bandage.app" was blocked from use because it is not from an identified developer.

Click Open Anyway and adjust your system settings accordingly.

danejo3 commented 1 year ago

I did what you said but it still blocks me!

If your Mac still refuses to apply the system settings change for the application, a workaround I found is to download the application on another machine and transfer it to the target system.

For example, download the macOS version on a Windows machine, then use a USB drive to transfer the downloaded file from Windows to the Mac.

danejo3 commented 1 year ago

Currently YEAT has three available assemblers: SPAdes, MEGAHIT, and Unicycler. All of them except MEGAHIT produce assembly graph(s).

The following files from these assemblers can be loaded into Bandage.

SPAdes (4):

Unicycler (6):

danejo3 commented 1 year ago

To visualize MEGAHIT results in Bandage, users will need to use megahit_toolkit, which is provided by MEGAHIT. This tool is only available in version 0.3.0+.

https://github.com/voutcn/megahit/wiki/Visualizing-MEGAHIT's-contig-graph

image

Note: the red box.
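As a rough sketch of how this conversion could be scripted, the snippet below builds one megahit_toolkit call per intermediate contig file. It assumes the command form `megahit_toolkit contig2fastg <k> <contigs file>` described on the wiki page above, that the tool writes FASTG to stdout, and that intermediate files are named like `k99.contigs.fa`. The function names and the output naming are hypothetical, so please check them against your MEGAHIT version before relying on this.

```python
import re
import subprocess
from pathlib import Path

# Assumed naming of MEGAHIT's per-k intermediate contig files, e.g. "k99.contigs.fa".
KMER_FILE = re.compile(r"^k(\d+)\.contigs\.fa$")


def contig2fastg_jobs(contig_files, out_dir):
    """Return (k, argv, output_path) tuples, one per per-k contig file, sorted by k."""
    jobs = []
    for path in map(Path, contig_files):
        match = KMER_FILE.match(path.name)
        if match is None:
            continue  # not a per-k contig file, so skip it
        k = int(match.group(1))
        # Assumed CLI form: megahit_toolkit contig2fastg <k> <contigs file>
        argv = ["megahit_toolkit", "contig2fastg", str(k), str(path)]
        jobs.append((k, argv, Path(out_dir) / f"k{k}.fastg"))
    return sorted(jobs, key=lambda job: job[0])


def run_jobs(jobs):
    """Run each conversion, redirecting stdout into the .fastg file."""
    for _k, argv, out_path in jobs:
        out_path.parent.mkdir(parents=True, exist_ok=True)
        with open(out_path, "w") as handle:
            subprocess.run(argv, stdout=handle, check=True)
```

Building the argument lists separately from running them keeps the file-matching logic testable on machines where MEGAHIT is not installed.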

Because megahit_toolkit converts one k-mer contig file at a time, the number of files that can be loaded into Bandage from MEGAHIT is at most the number of k-mer sizes used in the assembly.

For example:

MEGAHIT (8):

danejo3 commented 1 year ago

Rabbit Hole: GFA is the future standard assembly graph format. There are two different versions of GFA:

All assembly graph files, specifically GFA files, that can be used by Bandage are GFA 1.0. (See the list of software using either GFA 1.0 or 2.0 here.)

Here is a discussion on GFA 1.0 and 2.0: https://github.com/GFA-spec/GFA-spec/issues/49.
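If we ever need to check which GFA version a file declares, a minimal sketch is to read the `VN:Z:` tag on the header (`H`) line. The function name is hypothetical, and since the header and its version tag can be missing, a result of `None` is inconclusive rather than proof of either version.

```python
def gfa_version(lines):
    """Return the version string from the header's VN tag, or None if absent."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] != "H":
            continue  # only header lines carry the version tag
        for field in fields[1:]:
            if field.startswith("VN:Z:"):
                return field[len("VN:Z:"):]
    return None
```

For a file on disk this could be called as `gfa_version(open(path))`, which stops reading as soon as a versioned header line is found.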

danejo3 commented 1 year ago

With all this information collected and compiled in this issue thread, here are my thoughts on how we can include "support" for bandage in YEAT.

Because Bandage is its own software, I think we'll need to add a note to the documentation telling users that they will need to either download the Bandage GUI or build Bandage from the source code.

For us to "support" Bandage, I was thinking of creating a directory in analysis/bandage and creating separate directories within that directory for each assembly algorithm used. For example, analysis/bandage/spades.

The question that comes to my mind is: for the files listed in the posts above, should I symlink or copy all of the assembly graph files into these directories? Which assembly graph files are useful and which are not?

Finally, and ultimately, is adding this analysis/bandage "support" necessary if Bandage itself is not included in YEAT and users can go directly into analysis/{assembler} themselves?
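To make the symlink option concrete, here is a minimal sketch of what collecting graph files under analysis/bandage/{assembler} might look like. The function name and arguments are hypothetical and this is not existing YEAT code. It only illustrates the layout proposed above.

```python
from pathlib import Path


def link_graphs(analysis_dir, assembler, graph_files):
    """Symlink graph files into <analysis_dir>/bandage/<assembler>/ and return the links."""
    target_dir = Path(analysis_dir) / "bandage" / assembler
    target_dir.mkdir(parents=True, exist_ok=True)
    links = []
    for graph in map(Path, graph_files):
        link = target_dir / graph.name
        if link.is_symlink() or link.exists():
            link.unlink()  # replace a stale link left by an earlier run
        link.symlink_to(graph.resolve())
        links.append(link)
    return links
```

Symlinks avoid duplicating large graph files, at the cost of breaking if the assembler's output directory is later moved or cleaned up. Copying would be the safer choice if the bandage directory is meant to be shared on its own.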

danejo3 commented 1 year ago

In addition to the post above, if we do create an analysis/bandage directory, we will need to change the current Snakemake file, depending on how many assembly graph files we want to add to it.

danejo3 commented 1 year ago

Another question: MEGAHIT uses a number of k-mer sizes. If we were to select one specific k-mer size, how would we know which one is best? By default, and from the examples I'm seeing from other people, k=99 seems to be what people go for. Is there a reason why this is the case?

standage commented 1 year ago

There's a lot here, and I won't be able to respond to it all immediately. Briefly, I envisioned Bandage "support" to involve generating some kind of default plot as part of the core workflow, or at least aggregating Bandage-compatible input files for all assemblers to the extent possible with reasonable effort.

A quick Google search led me to this thread and this page on the Bandage wiki, confirming that Bandage can indeed be scripted on the command line. That's very relevant to the discussion here.
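As a hedged sketch of what a scripted default plot could look like, the snippet below wraps the `Bandage image <graph> <output image>` form of the command line interface. The function names are hypothetical, and the executable name `Bandage` on PATH is an assumption that may differ by install.

```python
import shutil
import subprocess


def bandage_image_command(graph, image, bandage="Bandage"):
    """Build the argv for rendering a graph file to an image with Bandage."""
    return [bandage, "image", str(graph), str(image)]


def render(graph, image):
    """Run Bandage if it is on PATH; return False when it is not installed."""
    if shutil.which("Bandage") is None:
        return False  # Bandage is a separate install, so treat it as optional
    subprocess.run(bandage_image_command(graph, image), check=True)
    return True
```

Treating a missing executable as a skip rather than an error would let the core workflow finish for users who have not installed Bandage.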

But many of your other points are relevant and deserve discussion. I'll return to this thread as soon as I can.

danejo3 commented 1 year ago

I found a package that converts FASTA files to GFA and vice versa.

This will allow us to visualize the final contig file created by MEGAHIT, since its internal tool can only convert intermediate k-mer contig files. (See the comment about megahit_toolkit.)

https://github.com/vgl-hub/gfastats#usage
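To illustrate what such a conversion produces, here is a minimal pure-Python sketch that turns FASTA records into GFA 1 segment lines. It is not gfastats and does not reproduce its behavior; the function name is hypothetical. Because a FASTA file carries no connectivity information, the output has segments only and no links, so Bandage would show disconnected contigs.

```python
def fasta_to_gfa(fasta_text):
    """Convert FASTA text to GFA 1 text with one segment (S line) per record."""
    records = []
    name, chunks = None, []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            # Use the first word of the header as the segment name.
            words = line[1:].split()
            name, chunks = (words[0] if words else "unnamed"), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    out = ["H\tVN:Z:1.0"]  # header declaring GFA version 1.0
    out += [f"S\t{seg_name}\t{seq}" for seg_name, seq in records]
    return "\n".join(out) + "\n"
```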