DerrickWood / kraken2

The second version of the Kraken taxonomic sequence classification system
MIT License

kraken2-build doesn't follow symlinked library subdirectories #568

Open sconlan opened 2 years ago

sconlan commented 2 years ago

I was trying to be clever and created a directory structure like:

krakendb1
|--library

Inside krakendb1/library I created symbolic links to the already downloaded and processed libraries (bacteria, fungi, human, etc.):

krakendb1
|--library
   |--bacteria -> /somepath/bacteria
   |--fungi -> /somepath/fungi

This is nice because you can build several different databases (krakendb1, krakendb2, ...) from the same base RefSeq download. I'm trying different combinations of additional genomes. Unfortunately, the build script can't see the symbolically linked libraries. I think this is because the find command called by build_kraken2_db doesn't traverse links:

find library/ '(' -name '*.fna' -o -name '*.faa' ')' -print0

It works OK if you symbolically link to the entire library directory, but that causes problems if you want to add genomes using kraken2-build --add-to-library, since the added genomes then end up in the shared directory.

Adding -L to the find command could fix the issue but maybe it breaks things elsewhere?
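
To illustrate the difference, here is a minimal sketch that recreates the layout above in a temporary directory and runs the same find expression with and without -L. The directory names, the empty library.fna placeholder files, and the without_L / with_L variables are assumptions made for this demonstration only; nothing here touches a real Kraken 2 database or the actual build script.

```shell
#!/bin/sh
# Demonstration only: every path below is created inside a throwaway temp dir.
set -eu
tmp=$(mktemp -d)

# Shared "downloaded" libraries (hypothetical stand-ins for /somepath/...).
mkdir -p "$tmp/shared/bacteria" "$tmp/shared/fungi" "$tmp/krakendb1/library"
: > "$tmp/shared/bacteria/library.fna"
: > "$tmp/shared/fungi/library.fna"

# Symlink the individual library subdirectories into the database directory.
ln -s "$tmp/shared/bacteria" "$tmp/krakendb1/library/bacteria"
ln -s "$tmp/shared/fungi" "$tmp/krakendb1/library/fungi"

cd "$tmp/krakendb1"

# Default behaviour: symlinks encountered during traversal are not followed.
without_L=$(find library/ '(' -name '*.fna' -o -name '*.faa' ')' | wc -l | tr -d ' ')

# With -L: find descends into the symlinked subdirectories.
with_L=$(find -L library/ '(' -name '*.fna' -o -name '*.faa' ')' | wc -l | tr -d ' ')

echo "without -L: $without_L file(s); with -L: $with_L file(s)"

# Clean up the temporary tree.
cd /
rm -rf "$tmp"
```

Without -L the symlinked subdirectories are skipped, so no sequence files are found; with -L both placeholder files are reported. Note that -L also makes find follow any other symlinks under library/, which is the part that would need checking against the rest of the build script.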

jenniferlu717 commented 2 years ago

Can you submit a pull request for this update?

sconlan commented 2 years ago

I can give it a try. Give me a couple of weeks.