fbreitwieser / krakenuniq

🐙 KrakenUniq: Metagenomics classifier with unique k-mer counting for more specific results
GNU General Public License v3.0

kraken build error #103

Open jsgounot opened 2 years ago

jsgounot commented 2 years ago

Hi,

I'm trying to build a custom KrakenUniq database following your manual, but without success so far:

Building taxonomy index from taxonomy//nodes.dmp and taxonomy//names.dmpCould not find parent with taxonomy ID 819 for taxonomy ID 820
Could not find parent with taxonomy ID 666 for taxonomy ID 956
Could not find parent with taxonomy ID 548 for taxonomy ID 1894
Could not find parent with taxonomy ID 375 for taxonomy ID 2244
...
Could not find parent with taxonomy ID 71 for taxonomy ID 72
Could not find parent with taxonomy ID 161 for taxonomy ID 162
Could not find parent with taxonomy ID 251 for taxonomy ID 440
Entry for 2 does not exist - it should!
Entry for 18 does not exist - it should!
Entry for 21 does not exist - it should!
...
Entry for 6361 does not exist - it should!
Entry for 6362 does not exist - it should!
Entry for 6363 does not exist - it should!
. Done, got 44 taxa
taxDB construction finished. [0.006s]
Building  KrakenUniq LCA database (step 6 of 6)...
 Adding taxonomy IDs for sequences
 Adding taxonomy IDs for genomes
Reading taxonomy index from taxDBERROR: the parent of 956 is itself. Should not happend for taxa other than the root.
cat: write error: Broken pipe

This is a bit puzzling to me. I'm not sure what "Entry for x does not exist" means. Moreover, I don't see why krakenuniq is unable to find parents that are listed in the nodes file:

$ grep 666 nodes.dmp
666 |   161 |   family
956 |   666 |   genus
6357    |   666 |   genus

Is it because the nodes file is truncated?

Thank you very much!

alekseyzimin commented 2 years ago

Hi, make sure all of your input files comply with the formatting rules; this can be tricky. You can build a standard database and then look at the files that are downloaded from RefSeq to see the expected format.
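
One way to act on this advice is to check a custom nodes.dmp before running the build. The sketch below is only an illustration, not part of KrakenUniq: `check_nodes` is a hypothetical helper, and it assumes the file should follow the NCBI taxonomy dump layout (fields separated by tab, pipe, tab, with each line ending in tab, pipe). The thread does not say which formatting rule was actually violated here. The helper reports lines that deviate from that layout and parent IDs that never appear as a taxon, which is what a truncated file would produce.

```python
def check_nodes(lines):
    """Inspect nodes.dmp-style lines.

    Returns (malformed, missing):
      malformed - 1-based numbers of lines not in the assumed NCBI layout
      missing   - sorted parent IDs that have no entry of their own
    """
    parents = {}    # tax_id -> parent tax_id
    malformed = []
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line.strip():
            continue  # ignore blank lines
        # Assumed strict layout: "id<TAB>|<TAB>parent<TAB>|<TAB>rank<TAB>|"
        strict = line.endswith("\t|") and len(line[:-2].split("\t|\t")) >= 3
        if not strict:
            malformed.append(lineno)
        # Lenient parse, so the parent check still works on odd spacing.
        fields = [f.strip() for f in line.split("|")]
        if len(fields) >= 2 and fields[0].isdigit() and fields[1].isdigit():
            parents[int(fields[0])] = int(fields[1])
    missing = sorted({p for p in parents.values() if p not in parents})
    return malformed, missing


# The grep output as it is rendered in the issue text (the real file may
# contain tabs that were lost when the output was pasted).
sample = [
    "666 |   161 |   family",
    "956 |   666 |   genus",
    "6357    |   666 |   genus",
]
malformed, missing = check_nodes(sample)
print("lines not in the assumed layout:", malformed)
print("parent IDs with no entry:", missing)
```

To check a real file, pass an open file handle, for example `check_nodes(open("taxonomy/nodes.dmp"))`. A non-empty `missing` list means the taxonomy was cut below the root, and a non-empty `malformed` list points to separator problems worth comparing against a nodes.dmp downloaded by a standard build.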