bioforensics / yeat

YEAT: Your Everyday Assembly Tool

Adding Unicycler to the workflow #14

Closed danejo3 closed 1 year ago

danejo3 commented 2 years ago

The purpose of this PR is to integrate Unicycler.

Unicycler is known for its hybrid assembly pipeline for bacterial genomes. When dealing with short reads, Unicycler will use SPAdes and optimize it to:

This PR will resolve #11.

danejo3 commented 2 years ago

As of right now, YEAT can only process Illumina paired-end reads. In the future, I plan on adding support for unpaired reads and long reads.

Unicycler supports all three kinds of reads: paired-end, unpaired, and long. One thing that is unique to Unicycler is its ability to do a hybrid assembly with both short and long reads. According to the Unicycler documentation, the hybrid assembly will be the most accurate (which makes sense).

unicycler -1 {read1} -2 {read2} -l {long_read}

In this PR, I have added support for Unicycler, but only for paired-end reads. In another PR, we'll need to add support for the other kinds of reads. As a consequence, we may need to refactor a large part of the code base to enable the other read types.
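A minimal sketch of how the command line could be assembled once the other read types are supported. The helper name `build_unicycler_command` and its parameters are hypothetical and not part of YEAT; only the Unicycler flags `-1`, `-2`, `-s` (unpaired short reads), `-l`, and `-o` (output directory) are real.

```python
from typing import List, Optional


def build_unicycler_command(
    outdir: str,
    read1: Optional[str] = None,
    read2: Optional[str] = None,
    unpaired: Optional[str] = None,
    long_reads: Optional[str] = None,
) -> List[str]:
    """Hypothetical helper (not YEAT code): build a Unicycler argv list."""
    # Paired-end input only makes sense when both mates are given.
    if (read1 is None) != (read2 is None):
        raise ValueError("paired-end input needs both read1 and read2")
    if read1 is None and unpaired is None and long_reads is None:
        raise ValueError("at least one read file is required")

    cmd = ["unicycler"]
    if read1 is not None and read2 is not None:
        cmd += ["-1", read1, "-2", read2]
    if unpaired is not None:
        cmd += ["-s", unpaired]
    if long_reads is not None:
        # Short reads plus long reads gives the hybrid assembly.
        cmd += ["-l", long_reads]
    cmd += ["-o", outdir]
    return cmd
```

Returning a list rather than a formatted string keeps the result usable with `subprocess.run` without shell quoting, and the same function covers the paired-end-only case that this PR supports.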

danejo3 commented 2 years ago

Another thing that is unique to Unicycler is that it creates its own Bandage-compatible assembly graph file, assembly.gfa.

YEAT does not support Bandage yet.

It looks like SPAdes creates a few files for Bandage:
- assembly_graph.fastg
- assembly_graph_after_simplification.gfa
- assembly_graph_with_scaffolds.gfa

megahit -> megahit_toolkit -> contig2fastg https://github.com/voutcn/megahit/wiki/Visualizing-MEGAHIT's-contig-graph
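A rough sketch of the MEGAHIT conversion step, following the `megahit_toolkit contig2fastg <k> <contigs.fa>` form shown on the linked wiki page. The function names, the k-mer size, and the file paths are hypothetical; the toolkit writes the FASTG to standard output, so the sketch redirects it to a file.

```python
import subprocess
from typing import List


def contig2fastg_command(kmer_size: int, contigs_fasta: str) -> List[str]:
    """Hypothetical helper: argv for converting MEGAHIT contigs to FASTG."""
    if kmer_size < 1:
        raise ValueError("kmer_size must be a positive integer")
    return ["megahit_toolkit", "contig2fastg", str(kmer_size), contigs_fasta]


def write_fastg(kmer_size: int, contigs_fasta: str, fastg_path: str) -> None:
    """Run the conversion and capture stdout into fastg_path (needs MEGAHIT installed)."""
    with open(fastg_path, "w") as handle:
        subprocess.run(
            contig2fastg_command(kmer_size, contigs_fasta),
            stdout=handle,
            check=True,
        )
```

The k-mer size has to match the contig file being converted, for example 99 for a `k99.contigs.fa` file from MEGAHIT's intermediate contigs.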

danejo3 commented 2 years ago

Unicycler has some interesting options:
- --min_fasta_length (default 100)
- --mode (conservative/normal/bold)
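If YEAT ends up exposing these options, the arguments could be validated before being passed through. This is only a sketch: `unicycler_option_args` is a hypothetical name, and treating `normal` as the default mode is an assumption based on Unicycler's documentation, not something settled in this PR.

```python
from typing import List

# The three values accepted by Unicycler's --mode option.
UNICYCLER_MODES = ("conservative", "normal", "bold")


def unicycler_option_args(mode: str = "normal", min_fasta_length: int = 100) -> List[str]:
    """Hypothetical helper: validate and format the two optional arguments."""
    if mode not in UNICYCLER_MODES:
        raise ValueError(f"mode must be one of {UNICYCLER_MODES}, got {mode!r}")
    if min_fasta_length < 1:
        raise ValueError("min_fasta_length must be a positive integer")
    return ["--mode", mode, "--min_fasta_length", str(min_fasta_length)]
```

The returned list can be appended to whatever base `unicycler` command the workflow already builds.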
