Closed: depancao closed this issue 4 years ago
I have had success running the --download-library XYZ commands first to build up the library directories. After that, I just run something like: kraken2-build --build --db ./ --threads 24
I see no reason why you couldn't remove the database files from an already built database:
hash.k2d opts.k2d seqid2taxid.map taxo.k2d
Then use the "kraken2-build --add-to-library" command to add your updated genomes, and rerun the build command. You'd still have to wait for the build to rerun, but you wouldn't have to wait for the downloads.
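To make that concrete, here is a rough sketch of the steps as a small bash script. It is untested against a real database and rests on some assumptions: the function names, the DRY_RUN switch, and the paths are made up for illustration, and it assumes the library and taxonomy directories are still present next to the built files. Only the kraken2-build options already mentioned above are used.

```shell
#!/usr/bin/env bash
# Sketch: reset a built Kraken2 database and rebuild it with extra genomes.
# THREADS, DRY_RUN and both function names are hypothetical, not part of kraken2.
set -euo pipefail

THREADS="${THREADS:-24}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands; 0 = actually run them

# Print a command, and execute it unless DRY_RUN is 1.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" != "1" ]; then
    "$@"
  fi
}

# Remove only the four built database files listed above.
# The library/ and taxonomy/ directories are left untouched.
reset_built_db() {
  local db="$1" f
  for f in hash.k2d opts.k2d seqid2taxid.map taxo.k2d; do
    run rm -f "$db/$f"
  done
}

# Add each FASTA file to the library, then rerun the build.
rebuild_with() {
  local db="$1"
  shift
  local fasta
  for fasta in "$@"; do
    run kraken2-build --add-to-library "$fasta" --db "$db"
  done
  run kraken2-build --build --db "$db" --threads "$THREADS"
}
```

With the default DRY_RUN=1, calling reset_built_db ./ and then rebuild_with ./ new_genomes.fa (a placeholder file name) only prints the commands so you can check them first; set DRY_RUN=0 to actually run them. Note that --add-to-library needs each sequence header to carry an NCBI accession or an explicit kraken:taxid|XXX tag so the sequence can be assigned a taxon.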
I'd like to use all RefSeq bacterial genomes, which is 180990 fna.gz files. Building such a big database also takes a lot of computing resources.
What I want to do is add more human polymorphism data to the maxikraken DB, to overcome the problem of human reads being misclassified as Mycobacterium tuberculosis. As far as I know, a kraken2 DB does not store the original sequences, only the de-duplicated k-mers and their LCA origin. So I want to know: is it possible to simply add more sequences or k-mers to an already-built kraken2 DB, without rebuilding it from the sequences?
Unfortunately, the answer is no, you cannot simply add sequences/kmers to an already-built database. You will need to rebuild the database.
You have to rebuild because of how the k-mers are stored in memory. If your new database has k-mers that belong in between existing ones, it cannot simply shift the existing k-mers to a new memory location.
Can I add some new sequences to an already-built database without building it from scratch? I see that https://github.com/DerrickWood/kraken2/issues/45 says no to this question. But I want to know whether I can inspect an already-built database, add some new sequences, and then rebuild it. Properly downloading every genome for a giant DB like maxikraken2 is impossible for me.