DerrickWood / kraken

Kraken taxonomic sequence classification system
http://ccb.jhu.edu/software/kraken/
GNU General Public License v3.0

building custom database fail at step 5 #107

Open dpchris opened 6 years ago

dpchris commented 6 years ago

Hi,

I have already successfully built custom databases, but for some reason I got an error at step 5 with the following command:

kraken-build --threads 40 --build --db ~/kraken_customdb

Kraken build set to minimize disk writes.
Creating k-mer set (step 1 of 6)...
Found jellyfish v1.1.11
Hash size not specified, using '18503360000'
K-mer set created. [1h4m34.027s]
Skipping step 2, no database reduction requested.
Sorting k-mer set (step 3 of 6)...
K-mer set sorted. [2h1m17.745s]
Skipping step 4, GI number to seqID map now obsolete.
Creating seqID to taxID map (step 5 of 6)...
No preliminary seqid/taxid mapping files found, aborting.

I checked, and the file "prelim_map.txt" is in the taxonomy directory alongside "names.dmp" and "nodes.dmp", but kraken does not seem to find it.

Do you have any idea why?

Thanks in advance.

Best regards.

jenniferlu717 commented 6 years ago

How did you add your files to the library? Unfortunately, you cannot just place the files directly into the library folder; you should use the add-to-library option. I believe that if you redo just this step, it will create the files needed to make the seqid2taxid file, and you won't have to redo the previous build steps.

dpchris commented 6 years ago

I added my files using the add-to-library option:

for dir in fungi protozoa archaea viral bacteria; do
    for fna in "$dir"/*.fna; do
        kraken-build --add-to-library "$fna" --db ../kraken_customdb_28-11-17_modif_kurstaki
    done
done

and I downloaded the taxonomy with kraken-build --download-taxonomy

The headers of my fasta files look like this: ">NC_021248.1|kraken:taxid|10288 Choristoneura biennis entomopoxvirus 'L', complete genome"
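
For anyone who wants to confirm that every sequence carries a taxid tag in this header style before adding it to a library, here is a minimal sketch. The function name `list_header_taxids` is hypothetical and not part of Kraken; it only assumes headers shaped like the example above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Kraken): for each FASTA header, print
# "first_header_word<TAB>taxid" when the header carries a |kraken:taxid|NNN
# tag, or "first_header_word<TAB>MISSING" when it does not.
list_header_taxids() {
    awk '
        /^>/ {
            id = substr($1, 2)    # first word of the header, without ">"
            if (match(id, /[|]kraken:taxid[|][0-9]+/)) {
                tag = substr(id, RSTART, RLENGTH)
                sub(/^[|]kraken:taxid[|]/, "", tag)   # keep only the digits
                print id "\t" tag
            } else {
                print id "\tMISSING"
            }
        }
    ' "$@"
}
```

Calling `list_header_taxids viral/*.fna | grep MISSING` would then list any sequences without a tag (the `viral` directory is taken from the loop above).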

I successfully built my custom database with kraken v0.10.6, but the build failed at step 5 with kraken v1.0 on the same database.

Maybe I ran add-to-library with kraken v0.10.6 and then tried to build the database with kraken v1.0. Is that a problem?



jenniferlu717 commented 6 years ago

Yes, I think so. Part of the fix that removed the dependence on GI numbers in kraken v1.0 requires another file to be generated when using the --add-to-library option. If you redo this step with the newest kraken version, that should fix the problem.
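
A minimal sketch of redoing that step, assuming the `kraken-build` on your PATH is the v1.0 one. The function name `readd_library` and the `KRAKEN_BUILD` override variable are hypothetical conveniences, not part of Kraken; only `--add-to-library` and `--db` are real options used in this thread.

```shell
#!/bin/sh
# Sketch only: re-add every .fna file found in the given directories to the
# database, using whichever kraken-build is on PATH.
# KRAKEN_BUILD is a hypothetical override; setting it to
# "echo kraken-build" prints the commands instead of running them.
readd_library() {
    db=$1
    shift
    for dir in "$@"; do
        for fna in "$dir"/*.fna; do
            [ -e "$fna" ] || continue    # skip directories with no .fna files
            ${KRAKEN_BUILD:-kraken-build} --add-to-library "$fna" --db "$db" || return 1
        done
    done
}
```

For example, `readd_library ../kraken_customdb_28-11-17_modif_kurstaki fungi protozoa archaea viral bacteria`, followed by the usual `kraken-build --build` command on the same database.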

jmonroynieto commented 4 years ago

I had the same error message. I am leaving a note for any other users that stumble across this (as did I).

...
Skipping step 4, GI number to seqID map now obsolete.
Creating seqID to taxID map (step 5 of 6)...
No preliminary seqid/taxid mapping files found, aborting.
...

I am building a custom database with a custom phylogenetic taxonomy that has only 26 nodes for a single species. My error was very simple: I was creating the preliminary map in the taxonomy directory, but it must be in the library directory.
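
A quick way to check for this mistake before starting a long build is sketched below. It assumes the database directory contains subdirectories named `library` and `taxonomy` and that the map file is called `prelim_map.txt`, as described in this thread; the function name `check_prelim_maps` is hypothetical and not part of Kraken.

```shell
#!/bin/sh
# Hypothetical diagnostic (not part of Kraken): report where preliminary map
# files sit inside a database directory. The names "library", "taxonomy" and
# "prelim_map.txt" are assumptions taken from this thread.
check_prelim_maps() {
    db=$1
    # Look for non-empty prelim_map.txt files anywhere under library/.
    found=$(find "$db/library" -type f -name prelim_map.txt -size +0 2>/dev/null)
    if [ -n "$found" ]; then
        echo "Non-empty preliminary map(s) under library:"
        echo "$found"
        return 0
    fi
    echo "No non-empty prelim_map.txt under $db/library"
    if [ -s "$db/taxonomy/prelim_map.txt" ]; then
        echo "Found one in $db/taxonomy instead; per this thread it belongs in the library directory"
    fi
    return 1
}
```

Running `check_prelim_maps ~/kraken_customdb` returns a non-zero status when no map is found under the library directory, so it can guard a build command with `&&`.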

rachelmugge commented 2 years ago

The above solution worked for me!