DerrickWood / kraken2

The second version of the Kraken taxonomic sequence classification system
MIT License

Protein database not working #478

Open caonetto opened 3 years ago

caonetto commented 3 years ago

Hi, thanks for creating this tool. I downloaded and built the nr protein database with the following commands:

kraken2-build --download-taxonomy --DB nr
kraken2-build --download-library nr --DB nr
kraken2-build --build --protein --threads 50 --db nr/

No errors appeared while building the database. However, when I classify a fungal reference genome against it, I get no hits at all. I tried another set of DNA sequences and also got no hits. Both sets of sequences give results when run against a DNA database. Does kraken2 enable translated search by default when DNA sequences are classified against a protein database?

Thanks.

zyllifeworld commented 1 year ago

I think you should change

kraken2-build --download-taxonomy --DB nr

to

kraken2-build --download-taxonomy --DB nr --protein

Otherwise kraken2 does not download the "prot.accession2taxid.gz" file (this is the default behavior). If you check the kraken2-build log, you will find that most of your sequence accession IDs did not map to any taxid, which is why you got no hits. The documentation on building an nr database may be a little confusing.
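
A quick way to confirm this diagnosis before rebuilding is to look inside the database's taxonomy folder. The sketch below is only illustrative: `prot_map_status` is a hypothetical helper name, and it assumes the taxonomy download step leaves the map unpacked as `prot.accession2taxid` (without the `.gz` suffix).

```shell
#!/bin/sh
# Hypothetical helper (not part of kraken2): report whether a database
# directory has a protein accession-to-taxid map in its taxonomy/ folder.
# Assumption: the download step stores the map uncompressed.
prot_map_status() {
    tax_dir=$1/taxonomy
    if [ -s "$tax_dir/prot.accession2taxid" ]; then
        echo present        # uncompressed map found
    elif [ -s "$tax_dir/prot.accession2taxid.gz" ]; then
        echo compressed     # only the .gz is there
    else
        echo missing        # no protein map at all
    fi
}
```

For the database in this thread, `prot_map_status nr` printing `missing` would be consistent with the taxonomy having been downloaded without `--protein`.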

Alternatively, download prot.accession2taxid.gz from NCBI (https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/accession2taxid/), put it in the DB_DIR/taxonomy folder, and rebuild the database.
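
The manual route could look roughly like the sketch below. `install_prot_map` is a hypothetical helper, not a kraken2 command. It decompresses the file on the assumption that the build step expects an unpacked `prot.accession2taxid`; if that assumption is wrong for your version, copying the `.gz` as described above is the alternative.

```shell
#!/bin/sh
# Hypothetical helper: place an already-downloaded prot.accession2taxid.gz
# into DB_DIR/taxonomy in decompressed form.
# Assumption: the build step reads uncompressed *.accession2taxid files.
install_prot_map() {
    src_gz=$1
    db_dir=$2
    mkdir -p "$db_dir/taxonomy" || return 1
    # gzip -dc writes the decompressed data to stdout and keeps the source file.
    gzip -dc "$src_gz" > "$db_dir/taxonomy/prot.accession2taxid"
}

# Example, not executed here (file name appended to the NCBI directory above):
# curl -O https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/accession2taxid/prot.accession2taxid.gz
# install_prot_map prot.accession2taxid.gz nr
# kraken2-build --build --protein --threads 50 --db nr/
```

The helper only handles the file placement; the download and the rebuild stay as separate, explicit commands so each step can be checked on its own.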