mihkelvaher closed this issue 3 years ago
It turns out I wasn't following the guide line by line. Both the build and the classification succeed if the intermediate build step is skipped, that is, if all seqs of interest are added first and the database is built only once afterwards.
This leaves the question: could an existing database be updated over time AFTER the --build command has been run?
Edit: Got my answer from https://github.com/DerrickWood/kraken2/issues/221#issuecomment-644279600 - the database needs to be built from scratch. It would be nice to have this confirmed by the devs before closing the issue. Also, I think it's a comment worth adding to the manual.
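For anyone landing here later, the working order described above (download, add every sequence, then build exactly once) could be sketched roughly as follows. This is only an illustration, not the manual's official recipe: `build_custom_db` and the `KRAKEN2_BUILD` variable are hypothetical names introduced here, and the file names in the usage comment are placeholders.

```shell
# Hypothetical wrapper: KRAKEN2_BUILD can be overridden, e.g. with
# "echo kraken2-build" to print the commands instead of running them.
KRAKEN2_BUILD="${KRAKEN2_BUILD:-kraken2-build}"

# Usage: build_custom_db DB_DIR [FASTA ...]
build_custom_db() {
    db="$1"
    shift
    # 1. Taxonomy and the base library (viral, as in this issue).
    $KRAKEN2_BUILD --download-taxonomy --db "$db" || return 1
    $KRAKEN2_BUILD --download-library viral --db "$db" || return 1
    # 2. Add ALL extra sequences before any build step.
    for fasta in "$@"; do
        $KRAKEN2_BUILD --add-to-library "$fasta" --db "$db" || return 1
    done
    # 3. Build once, at the very end.
    $KRAKEN2_BUILD --build --db "$db"
}

# Example call (placeholder file names):
# build_custom_db db_test/ testadd1.fa testadd2.fa
```

Setting `KRAKEN2_BUILD="echo kraken2-build"` before calling the function gives a dry run that only prints the four kinds of commands in order.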
The database does have to be rebuilt from scratch. Removing any *.k2d files and then rebuilding will work.
Also, I believe the read sequence file has to be the LAST specified argument in your line (i.e. when running kraken2, specify --report myreport.txt before specifying testadd1.fa).
Thanks!
After some testing, I found that, in addition to the *.k2d files, seqid2taxid.map also needs to be removed before new seqs can be added to the db.
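A minimal sketch of that cleanup step, assuming the built files sit directly in the database directory as they did in my test. `reset_built_db` is a hypothetical helper name, and `new_seqs.fa` in the usage comment is a placeholder.

```shell
# Hypothetical helper: remove the build products so that new sequences
# can be added and the database rebuilt from scratch.
# Usage: reset_built_db DB_DIR
reset_built_db() {
    db="$1"
    if [ ! -d "$db" ]; then
        echo "no such database directory: $db" >&2
        return 1
    fi
    # Only the *.k2d files and seqid2taxid.map in the top level are removed;
    # library/ and taxonomy/ are left alone.
    rm -f -- "$db"/*.k2d "$db/seqid2taxid.map"
}

# Example (placeholder file name):
# reset_built_db db_test
# kraken2-build --add-to-library new_seqs.fa --db db_test
# kraken2-build --build --db db_test
```

The downloaded library and taxonomy are kept, so only the build itself has to be repeated.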
From what I can see, rsync is used to download sequences from NCBI. Does this imply that the unbuilt database could be updated with new NCBI sequences (without downloading everything again) by rerunning the following command?
./kraken2-build --download-library viral --db db_test/
Also, it seems that the reads file can be anywhere as an argument. It's probably assumed that 'unflagged == reads file'.
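For reference, the kind of call being discussed could look like the sketch below, with the reads file placed last. That ordering is just the conservative choice given the two observations above, not a confirmed requirement. `classify_reads` and the `KRAKEN2` variable are hypothetical names introduced for illustration.

```shell
# Hypothetical wrapper: KRAKEN2 can be overridden, e.g. with "echo kraken2"
# to print the command instead of running it.
KRAKEN2="${KRAKEN2:-kraken2}"

# Usage: classify_reads DB_DIR REPORT_FILE READS_FILE
classify_reads() {
    db="$1"
    report="$2"
    reads="$3"
    # Options first, reads file as the final (unflagged) argument.
    $KRAKEN2 --db "$db" --report "$report" "$reads"
}

# Example, using the file names mentioned in this thread:
# classify_reads db_test/ myreport.txt testadd1.fa
```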
I believe you have to redownload all of the sequences. The way that library command works is that all of the sequences are put into the same library.fna file.
If you know what the new sequences are, you can download those separately and add them to the database with kraken2-build --add-to-library $file, as long as the sequence maps are in the taxonomy/ folder.
Thanks!
Following this guide: https://github.com/DerrickWood/kraken2/blob/master/docs/MANUAL.markdown#custom-databases
As a test, I'm trying to add sequences to an existing database, using the viral db as a base and later adding two plant contigs (~750 nt and ~11k nt). After the addition, the same sequences still cannot be classified.
Download taxonomy and create a small viral database:
Check if there are any matches before adding:
Add the seqs:
Try to classify the seqs again:
Still no match. Am I doing something wrong?
Kraken version 2.1.0