DerrickWood / kraken2

The second version of the Kraken taxonomic sequence classification system
MIT License

Kraken2 build fails #824

Open sachinharle opened 7 months ago

sachinharle commented 7 months ago

I get the following error when building a Kraken2 database:
rsync: link_stat "/all/GCF/037/914/965/GCF_037914965.1_ASM3791496v1/GCF_037914965.1_ASM3791496v1_genomic.fna.gz" (in genomes) failed: No such file or directory (2)

The exact file 'GCF_037914965.1_ASM3791496v1_genomic.fna.gz' does not exist at: https://ftp.ncbi.nlm.nih.gov/genomes//all/GCF/037/914/965/GCF_037914965.1_ASM3791496v1/

Please help me out here

DeaconOfBiology commented 7 months ago

I'm in the same boat. I'm troubleshooting now to see if I can come up with a fix, as no one has responded yet. Of course, it looks like your request was submitted on Friday, so maybe they'll get back to us today.

jenniferlu717 commented 7 months ago

Interesting. Unfortunately, this happens when NCBI's files include a link that does not point to an actual file. Kraken2 just uses the data provided by NCBI to determine which file paths to download. I do not have a solution except to suggest downloading the full standard pre-built database here: https://benlangmead.github.io/aws-indexes/k2
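To illustrate how a path like the one in the rsync error can come from the assembly directory alone, here is a minimal Python sketch. It assumes the file URL is simply the directory URL plus the directory's own name with `_genomic.fna.gz` appended, which is the pattern seen in the error message; this is not taken from the Kraken2 source. The names `genomic_fasta_url` and `url_exists` are hypothetical helpers.

```python
from urllib.request import Request, urlopen


def genomic_fasta_url(assembly_dir_url: str) -> str:
    """Build the expected *_genomic.fna.gz URL from an assembly directory URL.

    Assumption: the file is named after the last path component of the
    directory, as in the path shown in the rsync error.
    """
    base = assembly_dir_url.rstrip("/")
    assembly_name = base.rsplit("/", 1)[-1]
    return f"{base}/{assembly_name}_genomic.fna.gz"


def url_exists(url: str, timeout: float = 10.0) -> bool:
    """Return True if a HEAD request for the URL succeeds (hypothetical helper).

    Any network or HTTP error (including 404) is treated as "missing".
    """
    try:
        with urlopen(Request(url, method="HEAD"), timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        return False
```

Calling `url_exists(genomic_fasta_url(directory))` for each assembly directory before a build would flag entries whose file is absent on the server, so they could be inspected by hand.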

sachinharle commented 7 months ago

Thank you for the reply. I used the prebuilt database as suggested, from: https://benlangmead.github.io/aws-indexes/k2

When building a Bracken database with the following command:
bracken-build -d /media/fgl/Data/Databases/kraken2/k2_standard_20240112 -t 96 -k 35 -l 76

it gives the error:
ERROR: Database taxonomy /media/fgl/Data/Databases/kraken2/k2_standard_20240112/taxonomy/nodes.dmp does not exist

Where can I get or generate the nodes.dmp file?

sachinharle commented 7 months ago

I found the solution to my question about where to get or generate the nodes.dmp file. I renamed the ktaxonomy.tsv file that ships with the prebuilt database k2_standard_20240112 to nodes.dmp, put it in the taxonomy folder, and it worked. Thanks again.
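The workaround described here can be sketched in Python as follows. This is only an illustration of the steps reported in this thread, not a documented Kraken2 or Bracken procedure. The function name `add_nodes_dmp` is hypothetical, and the sketch copies the file instead of renaming it, so the original ktaxonomy.tsv stays in place.

```python
import shutil
from pathlib import Path


def add_nodes_dmp(db_dir: str) -> Path:
    """Copy <db_dir>/ktaxonomy.tsv to <db_dir>/taxonomy/nodes.dmp.

    Assumption: the prebuilt database directory contains ktaxonomy.tsv
    at its top level, as reported in this thread.
    """
    db = Path(db_dir)
    source = db / "ktaxonomy.tsv"
    if not source.is_file():
        raise FileNotFoundError(f"{source} not found in the prebuilt database")

    # Create the taxonomy folder if the prebuilt archive did not include one.
    taxonomy_dir = db / "taxonomy"
    taxonomy_dir.mkdir(exist_ok=True)

    target = taxonomy_dir / "nodes.dmp"
    shutil.copyfile(source, target)
    return target
```

For example, `add_nodes_dmp("/media/fgl/Data/Databases/kraken2/k2_standard_20240112")` would create the taxonomy/nodes.dmp path that the error message says is missing.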

rbtoscan commented 4 months ago

Hi @sachinharle,

could you please share what you have inside your database folder? I tried to do it like you did, but I am having issues:

here is mine:
database100mers.kmer_distrib
database150mers.kmer_distrib
database200mers.kmer_distrib
database250mers.kmer_distrib
database300mers.kmer_distrib
database50mers.kmer_distrib
database75mers.kmer_distrib
hash.k2d
inspect.txt
k2_standard_08gb_20240605.tar.gz
library_report.tsv
opts.k2d
seqid2taxid.map
standard08gb.md5
taxo.k2d
taxonomy
unmapped_accessions.txt

and inside taxonomy

nodes.dmp

but when I run kraken2-build, I get the following:

./kraken2-build --build --db db_prebuilt/ -t 6
Can't find library/ subdirectory in database directory, exiting.

Thank you very much!

Best,
Rodolfo

sachinharle commented 4 months ago

Hi @rbtoscan, I'm no expert. But my folder is structured as follows:
k2_standard_20240112
├── database100mers.kmer_distrib
├── database150mers.kmer_distrib
├── database200mers.kmer_distrib
├── database250mers.kmer_distrib
├── database300mers.kmer_distrib
├── database50mers.kmer_distrib
├── database75mers.kmer_distrib
├── database76mers.kmer_distrib
├── database76mers.kraken
├── database.kraken
├── hash.k2d
├── inspect.txt
├── ktaxonomy.tsv
├── library
│   ├── ktaxonomy.tsv
│   └── library_report.tsv
├── library_report.tsv
├── opts.k2d
├── seqid2taxid.map
├── taxo.k2d
├── taxonomy
│   ├── ktaxonomy.tsv
│   └── nodes.dmp
└── unmapped_accessions.txt

Try to replicate the same structure.
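To compare another database folder against this layout, a small Python check along these lines might help. The entries in `EXPECTED_ENTRIES` are taken from the folder listing in this thread and are an assumption, not an official list of what Kraken2 or Bracken requires. The function name `missing_entries` is hypothetical.

```python
from pathlib import Path

# Entries picked from the folder listing in this thread; this is an
# assumption, not an official list of required files.
EXPECTED_ENTRIES = [
    "hash.k2d",
    "opts.k2d",
    "taxo.k2d",
    "seqid2taxid.map",
    "library",
    "taxonomy/nodes.dmp",
]


def missing_entries(db_dir, expected=EXPECTED_ENTRIES):
    """Return the expected files or folders that are absent from db_dir."""
    db = Path(db_dir)
    return [name for name in expected if not (db / name).exists()]
```

Running `missing_entries("db_prebuilt")` on a folder without a library subdirectory would include `"library"` in the result, which corresponds to the "Can't find library/ subdirectory" message reported earlier.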