@abhi18av @emilio-r @thorellk
I spoke to Kaisa earlier today about our frustrations with SendSketch being so unreliable in the pipeline, so we decided we should skip it entirely. It brings more issues than it solves at the moment.
Looking at what other bacterial whole-genome assembly pipelines use for taxonomic classification, Kraken2 seems to be the most common. Reviewing our options, I think we have the following choices:
- Try to implement workarounds to make SendSketch work the way we want it to
- Replace SendSketch with another tool, e.g.
  - Kraken2
  - mash screen
Summarizing the pros and cons I can see off the top of my head:
- SendSketch
  - Pros:
    - It's really fast
    - Requires no local database
  - Cons:
    - It currently doesn't work for us
- Kraken2
  - Pros:
    - It's fast
    - It is also widely used for shotgun metagenomics, so people are likely to have databases lying around.
  - Cons:
    - The MiniKraken2_v2_8GB database is a 5.5 GB download, and requires at least 8 GB RAM to run (should still be doable on a laptop though)
- Mash screen
  - Pros:
    - We have used it before, so we know it works for our intended purpose
  - Cons:
    - It requires a local database that is a 700MB+ download
    - It's significantly slower than the other alternatives
@thorellk and I now think that the best course of action ahead would be to replace SendSketch with Kraken2 and make the database a user-specified parameter at runtime. If the user does not specify a Kraken2 database, that step is skipped and the prokka step proceeds without genus information.
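The skip-if-unset behaviour could look roughly like the sketch below. It is written in Python purely for illustration and is not the pipeline's actual code: the function names, the default report file name, and the way the genus is passed along are all assumptions. Only the `kraken2` options `--db`, `--report`, and `--paired` and the prokka option `--genus` are taken from the tools themselves.

```python
from typing import List, Optional


def kraken2_command(db: Optional[str], reads: List[str],
                    report: str = "kraken2.report.txt") -> Optional[List[str]]:
    """Build a kraken2 command line, or return None when no database is given.

    Hypothetical helper: `db` stands in for the user-specified runtime parameter.
    """
    if not db:
        # No database specified by the user: skip the classification step.
        return None
    cmd = ["kraken2", "--db", db, "--report", report]
    if len(reads) == 2:
        # Two read files are treated as a read pair.
        cmd.append("--paired")
    return cmd + reads


def prokka_command(contigs: str, genus: Optional[str] = None) -> List[str]:
    """Build a prokka command line; --genus is added only when a genus is known."""
    cmd = ["prokka"]
    if genus:
        cmd += ["--genus", genus]
    return cmd + [contigs]


if __name__ == "__main__":
    # Without a database the classification step is skipped and prokka
    # runs without genus information.
    print(kraken2_command(None, ["R1.fastq.gz", "R2.fastq.gz"]))
    print(prokka_command("contigs.fasta"))
```

Returning `None` rather than raising an error keeps the database truly optional: the caller only has to check whether there is a command to run before deciding what to pass on to prokka.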
I'm planning to spend some time on Friday morning implementing Kraken2 as a replacement for SendSketch. I will base my work on your latest branch, abhinav/check_signalp, @abhi18av, so it's easy to merge everything back to develop without conflicts later.