Closed abhi18av closed 3 years ago
The quickest and easiest way to get a kraken2 database that works for BACTpipe is to download the official "minikraken" database from the Kraken2 downloads page: http://ccb.jhu.edu/software/kraken2/index.shtml?t=downloads
Direct link here: ftp://ftp.ccb.jhu.edu/pub/data/kraken2_dbs/old/minikraken2_v2_8GB_201904.tgz
It's about 5.5 GB compressed and decompresses to just under 8 GB.
I'm not sure it is worth the extra work of writing (and then having to maintain) code to automatically download the database. Kraken2 is such a common tool that almost everyone already has a copy of the database lying around, and if not, it's really straightforward to just download and extract it yourself.
Something we definitely should do is to add a section about the kraken2 database being an optional dependency in the installation instructions and point to the official kraken2 download page so it's easy to find. I feel it's only barely mentioned in the docs right now, and kind of buried in all the details on the "Running BACTpipe" page.
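For reference, the manual download-and-extract route mentioned above could look roughly like the sketch below. It assumes `wget` and `tar` are available; `fetch_db` and `unpack_db` are hypothetical helper names, not part of BACTpipe, and the target directory is up to you.

```shell
#!/usr/bin/env bash
# Sketch only: fetch and unpack the minikraken2 archive by hand.
# fetch_db and unpack_db are hypothetical helpers, not BACTpipe code.

DB_URL='ftp://ftp.ccb.jhu.edu/pub/data/kraken2_dbs/old/minikraken2_v2_8GB_201904.tgz'

# Download the archive at $1 into directory $2.
fetch_db() {
    local url="$1" dest_dir="$2"
    mkdir -p "$dest_dir"
    # -c resumes a partial download; -P sets the output directory.
    wget -c -P "$dest_dir" "$url"
}

# Unpack the .tgz archive $1 into directory $2.
unpack_db() {
    local archive="$1" dest_dir="$2"
    mkdir -p "$dest_dir"
    tar -xzf "$archive" -C "$dest_dir"
}
```

Usage would be something like `fetch_db "$DB_URL" databases` followed by `unpack_db databases/minikraken2_v2_8GB_201904.tgz databases`. Going by the sizes quoted above, that needs about 5.5 GB for the archive plus just under 8 GB for the extracted database.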
> I'm not sure it is worth the extra work of writing (and then having to maintain) code to automatically download the database. Kraken2 is such a common tool that almost everyone already has a copy of the database lying around, and if not, it's really straightforward to just download and extract it yourself.
Yeah, I agree. From the cloud-based usage perspective, since the overall DB size is small, I think it's simpler to upload it to the bucket and then point to it via the configs. So, yeah, should work fine 👍
One question though: shall we add a bash script resources/download_kraken_db.sh with the following content?
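#!/usr/bin/env bash
set -uex
# Fetch the single archive; recursive retrieval (-r) is not needed for one file.
wget 'ftp://ftp.ccb.jhu.edu/pub/data/kraken2_dbs/old/minikraken2_v2_8GB_201904.tgz'
echo "Kraken database downloaded!"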
For the docs, shall I go ahead and touch up the kraken-related information?
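If the script is added, one option is to make it safe to re-run, so an existing archive is not fetched again. The following is a minimal sketch under that assumption, not the script that was merged; `download_once` is a hypothetical helper name, and the archive is written to a temporary `.part` file first so an interrupted download is not mistaken for a complete one.

```shell
#!/usr/bin/env bash
# Sketch only: re-runnable variant of the proposed download script.
set -eu

DB_URL='ftp://ftp.ccb.jhu.edu/pub/data/kraken2_dbs/old/minikraken2_v2_8GB_201904.tgz'

# Download $1 into directory $2 unless the archive is already there.
# download_once is a hypothetical helper, not part of BACTpipe.
download_once() {
    local url="$1" dest_dir="$2"
    # Strip everything up to the last slash to get the file name.
    local archive="${dest_dir}/${url##*/}"

    if [ -s "$archive" ]; then
        echo "Archive already present, skipping download: ${archive}"
        return 0
    fi

    mkdir -p "$dest_dir"
    # Write to a .part file and rename only after wget succeeds.
    wget -O "${archive}.part" "$url"
    mv "${archive}.part" "$archive"
    echo "Kraken database downloaded: ${archive}"
}
```

Calling `download_once "$DB_URL" resources` twice in a row would then download only on the first call.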
I think it's fine to add a script like you say, @abhi18av. And yes, go ahead and update the docs if you have time :)
kraken db script
I've updated the docs now. Unless there are change requests, this PR can be merged 👍
Looks good to me!
Hi @boulund ,
I've fixed the docker profile and done some basic housekeeping. I'd say that https://github.com/ctmrbio/BACTpipe/issues/47 can be closed with this PR.
One thing I need to understand: how do I build the database or download it? In either case, should we include a shell script or nf-process for the user to build/download it and then pass it onwards to the classify_taxonomy process?
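Whichever route is used to obtain the database, a quick sanity check before handing the directory to classify_taxonomy might help catch a failed download or extraction. The sketch below assumes the standard Kraken 2 layout, where a usable database directory contains `hash.k2d`, `opts.k2d` and `taxo.k2d`; `check_kraken2_db` is a hypothetical helper, and how the path is then passed to BACTpipe depends on the pipeline's own configuration.

```shell
#!/usr/bin/env bash
# Sketch only: verify that a directory looks like a usable Kraken 2 database.
# check_kraken2_db is a hypothetical helper, not part of BACTpipe or Kraken 2.

# Return 0 if directory $1 holds non-empty hash.k2d, opts.k2d and taxo.k2d,
# otherwise report each missing file on stderr and return 1.
check_kraken2_db() {
    local db_dir="$1" f missing=0
    for f in hash.k2d opts.k2d taxo.k2d; do
        if [ ! -s "${db_dir}/${f}" ]; then
            echo "Missing or empty: ${db_dir}/${f}" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `check_kraken2_db path/to/kraken2_db && echo "database looks usable"` prints the message only when all three files are present.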