DerrickWood / kraken2

The second version of the Kraken taxonomic sequence classification system
MIT License

custom db only from own fasta #371

Open aspitaleri opened 3 years ago

aspitaleri commented 3 years ago

Hi, I just want to make sure this is the proper way. I am building a kraken2 custom db using only my own FASTA files:

  1. kraken2-build --add-to-library file1.fasta --db test
  2. kraken2-build --add-to-library file2.fasta --db test (and so on for the remaining files)
  3. kraken2-build --download-taxonomy --db test
  4. kraken2-build --build --db test

Is that okay? Step 3 takes a while since it downloads the taxonomy for all species. Is there any way to make it quicker, since I have just 10 FASTA files to generate my own db? Best
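
For reference, the four steps above can be wrapped in one small shell function. This is only a sketch of the workflow as listed in this post, not an official recipe: `build_custom_db` and the `KRAKEN2_BUILD` override are hypothetical names introduced here, and the db name `test` is the one from the commands above. As far as I know, Kraken 2 also expects each sequence header to carry either an NCBI accession or an explicit `kraken:taxid|XXX` tag, so that may need checking for home-made FASTA files.

```shell
# Sketch only: runs the steps from the post in the same order.
# KRAKEN2_BUILD is a hypothetical override so the command can be swapped or stubbed;
# by default the real kraken2-build on PATH is used.
build_custom_db() {
    db="$1"
    shift
    # Steps 1-2: add every FASTA file passed as an argument to the library.
    for fasta in "$@"; do
        "${KRAKEN2_BUILD:-kraken2-build}" --add-to-library "$fasta" --db "$db" || return 1
    done
    # Step 3: download the NCBI taxonomy (the slow part).
    "${KRAKEN2_BUILD:-kraken2-build}" --download-taxonomy --db "$db" || return 1
    # Step 4: build the database.
    "${KRAKEN2_BUILD:-kraken2-build}" --build --db "$db"
}

# Example call (the glob is an assumption about where the FASTA files live):
# build_custom_db test ./*.fasta
```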

mihkelvaher commented 3 years ago

You could copy the taxonomy dir from an existing db to the new db dir.
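
A minimal sketch of that copy, assuming the existing db still has its `taxonomy` subdirectory (I believe `kraken2-build --clean` removes it, so a cleaned db would not work as a source). The function name `reuse_taxonomy` and the example paths are made up for illustration.

```shell
# Sketch only: copy the taxonomy dir of an existing Kraken 2 db into a new db dir,
# so that the --download-taxonomy step can be skipped for the new db.
reuse_taxonomy() {
    src_db="$1"
    dst_db="$2"
    # Refuse to continue if the source db has no taxonomy dir to copy.
    if [ ! -d "$src_db/taxonomy" ]; then
        echo "no taxonomy dir in $src_db" >&2
        return 1
    fi
    mkdir -p "$dst_db"
    cp -R "$src_db/taxonomy" "$dst_db/"
}

# Example call (both paths are hypothetical):
# reuse_taxonomy /data/kraken/old_db test
```

After this, the `--add-to-library` and `--build` steps would be run against the new db as usual.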

aspitaleri commented 3 years ago

Yes, that's what I did! Just wondering how I can make sure the taxonomy dir is up to date. Do I need to download it every time, or can I just update it? Best

mihkelvaher commented 3 years ago

Since the taxonomy files are usually passed around as compressed taxdumps (ftp://ftp.ncbi.nih.gov/pub/taxonomy/), I can't see a good way to tell whether the individual files need to be updated.

If you want to determine whether anything has changed in the taxdump as a whole, you could:

  1. Keep the downloaded taxdump archive (taxdump.tar.gz, or any other format) after decompressing.
  2. Also download the corresponding md5 hash (taxdump.tar.gz.md5).
  3. When you want to check whether anything has changed, download the latest md5 hash and compare it to the previous one. If they differ, something has changed.

Haven't tried it myself, but in principle, it should work. It could probably be added to Kraken but as you're not expected to build a database every day, I think the implementation effort could be spent elsewhere.
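
The md5 comparison described above could look roughly like this. Again untested against the real server: `taxdump_changed` is a made-up helper name, and the archive name `taxdump.tar.gz.md5` and the local file names are assumptions on my part, so check the directory listing first.

```shell
# Sketch only: succeed (exit status 0) if the hash stored in the new .md5 file
# differs from the hash stored in the previously saved one.
taxdump_changed() {
    # An .md5 file is expected to hold "<hash>  <filename>"; only the hash is compared.
    old_hash=$(awk '{ print $1; exit }' "$1")
    new_hash=$(awk '{ print $1; exit }' "$2")
    [ "$old_hash" != "$new_hash" ]
}

# Example use (file names are assumptions):
# curl -sS -o taxdump.tar.gz.md5.new "ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz.md5"
# if taxdump_changed taxdump.tar.gz.md5 taxdump.tar.gz.md5.new; then
#     echo "taxonomy changed upstream, consider downloading it again"
# fi
```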

Or you could just look at when the file was last modified on the ftp://ftp.ncbi.nih.gov/pub/taxonomy/ page :D

Also, the need to keep the taxonomy updated depends on your species. If their taxonomy is well established and unlikely to change in the near future, updating every time the taxonomy changes for some distant species you don't even care about might be making things too complex.

Mihkel

aspitaleri commented 3 years ago

Thanks for the advice!