Closed AlessioMilanese closed 5 years ago
I agree that documentation for KrakenUniq would be very helpful; many users have never used the previous versions. I am also having trouble building the database. I first ran: krakenuniq-download --db DB --taxa "archaea,bacteria,viral,fungi,protozoa,helminths" --dust --exclude-environmental-taxa nt
I then tried to build the database with: krakenuniq-build --standard --threads 80 --download-taxonomy --db DB
My computer ran for over a day, but only produced a database.jdb file.
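For reference, the two steps described above can be wrapped in a small script. This is only a minimal sketch: the krakenuniq-download and krakenuniq-build invocations are copied verbatim from this post and have not been checked against the current release, while the run helper, the build_db function, and the DRY_RUN switch are hypothetical names introduced here for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

DB="${DB:-DB}"            # database directory, as in the post above
THREADS="${THREADS:-80}"  # thread count, as in the post above
DRY_RUN="${DRY_RUN:-1}"   # hypothetical switch: 1 = only print the commands

# Hypothetical helper: print a command, then execute it unless DRY_RUN=1.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" != "1" ]; then
        "$@"
    fi
}

# Download step followed by build step, with the flags quoted in this thread.
build_db() {
    run krakenuniq-download --db "$DB" \
        --taxa "archaea,bacteria,viral,fungi,protozoa,helminths" \
        --dust --exclude-environmental-taxa nt
    run krakenuniq-build --standard --threads "$THREADS" \
        --download-taxonomy --db "$DB"
}

# Usage (dry run by default): build_db
# To actually execute:        DRY_RUN=0 build_db
```

Printing each command before running it makes it easier to see which of the two steps a long-running job has reached.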
Dear @AlessioMilanese and @gexijin, thank you for your feedback. I am now working on updating the README and MANUAL, and I hope they will be easier to follow very soon!
@gexijin, building the database takes quite a while, especially for the nt database.
@AlessioMilanese, to answer your original question, the parameters in question have been renamed to --taxids-for-genomes and --taxids-for-sequences. Let me know if you have further questions.
Alessio, it would be great if you could even build a database for users to download. We can now download 100 GB files easily, and there are public repositories such as Zenodo (50 GB per file) and Figshare (20 GB per file). Thanks.
Hi @gexijin, downloaded from NCBI on 23 November 2018, the database is 179 GB in size (you will need the same amount of RAM to run it). Note that I am not a developer of KrakenUniq, so I would leave the upload to a public repository to one of the developers. (It is possible to upload a 200 GB file to Zenodo by asking for an increase of the disk quota.)
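The RAM requirement mentioned above can be checked before starting a run. The following is a minimal sketch, assuming a Linux machine with /proc/meminfo; db_fits_in_ram is a hypothetical helper, not part of KrakenUniq, and it measures the whole database directory, which may overestimate what is actually loaded into memory.

```shell
#!/usr/bin/env bash

# Hypothetical helper: compare the on-disk size of a database directory
# with the physical RAM of the machine (Linux only).
# Returns 0 if the directory is no larger than total RAM, 1 otherwise.
db_fits_in_ram() {
    local db_dir="$1"
    local db_kb mem_kb
    # du -sk: total size of the directory in units of 1024 bytes
    db_kb=$(du -sk "$db_dir" | cut -f1)
    # MemTotal in /proc/meminfo is reported in kB
    mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
    echo "database: ${db_kb} kB, RAM: ${mem_kb} kB"
    [ "$db_kb" -le "$mem_kb" ]
}

# Usage: db_fits_in_ram DB || echo "database is larger than physical RAM"
```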
Updated the README. If there are more questions or problems with the build or classification, or if you have feature requests, please feel free to open another issue with a description of the problem.
We may provide databases on a regular basis in the future. For now, you may download the three databases described in the manuscript at ftp://ftp.ccb.jhu.edu/pub/software/krakenuniq/Databases/
Hi Florian, the updated README is very helpful! Thank you. Can you double-check whether this line of code downloads the human genome as well as UniVec and EmVec? It looks like only the human genome is downloaded.
Contaminant sequences from UniVec and EmVec, plus the human reference genome:
krakenuniq-download --db DBDIR refseq/vertebrate_mammalian/Chromosome/species_taxid=9606
Also, for the identification of microbes from human RNA-seq data, I found it very helpful to include human transcript sequences. Is there an easy way to do that?
Hi,
I installed the latest version of KrakenUniq (on 22nd November, commit 80ac242). In README.md, line 42, I see:
But there is no --generate-taxonomy-ids-for-sequences option in krakenuniq-build.
Also, instead of (or in addition to) describing the differences from Kraken, it would be really helpful to describe the steps for going from installation to profiling a sample. For example: