Replace SendSketch with Kraken2

boulund commented 3 years ago

@abhi18av @emilio-r @thorellk I spoke to Kaisa earlier today about our frustration with how unreliable SendSketch is in the pipeline, and we decided we should drop it entirely. At the moment it causes more problems than it solves.

Looking at what other bacterial whole-genome assembly pipelines use for taxonomic classification, Kraken2 seems to be the most common. Reviewing our options, I think we have the following choices:

- Try to implement workarounds to make SendSketch work the way we want it to
- Replace SendSketch with another tool, e.g.
  - Kraken2
  - mash screen

Summarizing the pros and cons I can see off the top of my head:

- SendSketch
  - Pros:
    - It's really fast
    - Requires no local database
  - Cons:
    - It currently doesn't work for us
- Kraken2
  - Pros:
    - It's fast
    - It is also widely used for shotgun metagenomics, so people are likely to have databases lying around.
  - Cons:
    - The MiniKraken2_v2_8GB database is a 5.5 GB download, and requires at least 8 GB RAM to run (should still be doable on a laptop though)
- Mash screen
  - Pros:
    - We have used it before, so we know it works for our intended purpose
  - Cons:
    - It requires a local database that is a 700MB+ download
    - It's significantly slower than other alternatives

@thorellk and I now think the best course of action is to replace SendSketch with Kraken2 and make the database a user-specified parameter at runtime. If the user does not specify a Kraken2 database, the classification step is skipped and the prokka step proceeds without genus information.
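To make the intended behaviour concrete, here is a rough Python sketch of the logic (not the actual pipeline code, which would live in the workflow itself). The function names, the default report filename, and the choice to pick the genus with the highest read percentage are all assumptions for illustration. It relies only on the standard Kraken2 options `--db`, `--threads`, `--report`, `--output` and `--paired`, the six-column tab-separated Kraken2 report format, and prokka's `--genus` option.

```python
from typing import List, Optional, Sequence


def kraken2_command(
    reads: Sequence[str],
    kraken2_db: Optional[str],
    report: str = "kraken2.report",  # hypothetical default filename
    threads: int = 4,
) -> Optional[List[str]]:
    """Build a Kraken2 command line, or return None if no database was given."""
    if not kraken2_db:
        # No database specified: the classification step is skipped.
        return None
    cmd = [
        "kraken2",
        "--db", str(kraken2_db),
        "--threads", str(threads),
        "--report", report,
        "--output", "-",  # "-" suppresses the per-read output
    ]
    if len(reads) == 2:
        cmd.append("--paired")
    cmd.extend(str(r) for r in reads)
    return cmd


def top_genus(report_text: str) -> Optional[str]:
    """Return the genus-level taxon with the highest read percentage.

    Assumes the standard Kraken2 report columns: percentage, clade reads,
    direct reads, rank code, taxid, and (indented) scientific name.
    """
    best_name, best_pct = None, -1.0
    for line in report_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 6 or fields[3].strip() != "G":
            continue
        pct = float(fields[0])
        if pct > best_pct:
            best_pct, best_name = pct, fields[5].strip()
    return best_name


def prokka_genus_args(genus: Optional[str]) -> List[str]:
    """Extra prokka arguments; empty when no genus information is available."""
    return ["--genus", genus] if genus else []
```

With no database, `kraken2_command(...)` returns `None`, `top_genus` is never called, and `prokka_genus_args(None)` yields an empty list, so prokka simply runs without genus information.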

I'm planning to spend some time implementing Kraken2 as a replacement for SendSketch on Friday morning, and I will base my work on your latest branch, abhinav/check_signalp, @abhi18av, so it's easy to merge everything back to develop without conflicts later.

abhi18av commented 3 years ago

Thanks for the update @boulund 👍

I am happy that we have decided to move beyond SendSketch, since it doesn't really make the workflow reproducible at all.

boulund commented 3 years ago

Implemented and tested in the dev branch

KThorellGroup / BACTpipe

Replace SendSketch with Kraken2 #162