KThorellGroup / BACTpipe

BACTpipe: An assembly and annotation pipeline for bacterial genomics
https://bactpipe.readthedocs.org
MIT License

Update docker file to accommodate kraken #165

Closed abhi18av closed 3 years ago

abhi18av commented 3 years ago

Hi @boulund ,

I've fixed the docker profile and done some basic housekeeping. I'd say that https://github.com/ctmrbio/BACTpipe/issues/47 can be closed with this PR.

One thing I need to understand: how do I build the database or download it? In either case, should we include a shell script or nf-process for the user to build/download it and then pass it on to the classify_taxonomy process?

boulund commented 3 years ago

The quickest/easiest way to get a kraken2 database that works for BACTpipe is to just download the official "minikraken" database from their official page: http://ccb.jhu.edu/software/kraken2/index.shtml?t=downloads

Direct link here: ftp://ftp.ccb.jhu.edu/pub/data/kraken2_dbs/old/minikraken2_v2_8GB_201904.tgz

It's about 5.5 GB compressed and decompresses to just under 8 GB.

I'm not sure it is worth the extra work of writing (and then having to maintain) code to automatically download the database. Kraken2 is such a common tool that almost everyone already has a copy of the database lying around, and if not, it's really straightforward to just download and extract it yourself.
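For reference, the manual download-and-extract step described above might look roughly like the sketch below. The URL is the one given in this comment; the function names and the `DB_DIR` default are hypothetical and only there for illustration.

```shell
#!/usr/bin/env bash
set -eu

# URL from the comment above; DB_DIR is a hypothetical default location.
DB_URL='ftp://ftp.ccb.jhu.edu/pub/data/kraken2_dbs/old/minikraken2_v2_8GB_201904.tgz'
DB_DIR="${DB_DIR:-./kraken2_db}"

# Download a single archive into dest_dir, resuming a partial download if present.
fetch_db() {
    local url="$1" dest_dir="$2"
    mkdir -p "$dest_dir"
    wget --continue --directory-prefix="$dest_dir" "$url"
}

# Unpack a gzip-compressed tar archive into dest_dir.
extract_db() {
    local archive="$1" dest_dir="$2"
    mkdir -p "$dest_dir"
    tar -xzf "$archive" -C "$dest_dir"
}

# Typical use (left commented out so sourcing this file does not start a ~5.5 GB download):
# fetch_db "$DB_URL" "$DB_DIR"
# extract_db "$DB_DIR/$(basename "$DB_URL")" "$DB_DIR"
```

The two steps are kept separate so that an interrupted download can be resumed without re-extracting, and so that an archive someone already has on disk can be unpacked directly.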

Something we definitely should do is to add a section about the kraken2 database being an optional dependency in the installation instructions and point to the official kraken2 download page so it's easy to find. I feel it's only barely mentioned in the docs right now, and kind of buried in all the details on the "Running BACTpipe" page.

abhi18av commented 3 years ago

> I'm not sure it is worth the extra work of writing (and then having to maintain) code to automatically download the database. Kraken2 is such a common tool that almost everyone already has a copy of the database lying around, and if not, it's really straightforward to just download and extract it yourself.

Yeah, I agree. From a cloud-based usage perspective, since the overall DB size is small, I think it's simpler to upload it to the bucket and then point to it via the configs. So, yeah, this should work fine 👍

One question though: shall we add a bash script, resources/download_kraken_db.sh, with the following content?

#!/usr/bin/env bash
set -uex

# Fetch the single archive; recursive mode (-r) is not needed for one file.
wget 'ftp://ftp.ccb.jhu.edu/pub/data/kraken2_dbs/old/minikraken2_v2_8GB_201904.tgz'

echo "Kraken database downloaded!"
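Once the archive has been extracted, a quick sanity check before handing the directory to the pipeline could look like the sketch below. It only relies on the fact that a Kraken2 database directory contains `hash.k2d`, `opts.k2d` and `taxo.k2d`; the function name `check_kraken2_db` is hypothetical and not part of BACTpipe.

```shell
#!/usr/bin/env bash
set -eu

# Return 0 if db_dir holds the three non-empty index files Kraken2 expects,
# otherwise report the first missing file on stderr and return 1.
check_kraken2_db() {
    local db_dir="$1" f
    for f in hash.k2d opts.k2d taxo.k2d; do
        if [ ! -s "$db_dir/$f" ]; then
            echo "Missing or empty: $db_dir/$f" >&2
            return 1
        fi
    done
    echo "Kraken2 database looks complete: $db_dir"
}
```

Calling `check_kraken2_db /path/to/extracted/db` right after extraction would catch a truncated download or a wrong directory level early, rather than at classification time.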

For the docs, shall I go ahead and touch up the Kraken-related information?

thorellk commented 3 years ago

I think it's fine to add a script like you say, @abhi18av. And yes, go ahead and update the docs if you have time :)

abhi18av commented 3 years ago

I've updated the docs now. Unless there are change requests, this PR can be merged 👍

boulund commented 3 years ago

Looks good to me!