Closed idoerg closed 4 years ago
Also, it seems like the taxonomy file from NCBI is not there? This seems to have cropped up in the past: https://github.com/DerrickWood/kraken/issues/132 (It seems not to have been fixed yet in the Conda version of kraken2.)
$ kraken2-build --db /home/idoerg/work/oxymice/db2 --download-taxonomy
Downloading nucleotide est accession to taxon map...rsync: link_stat "/taxonomy/accession2taxid/nucl_est.accession2taxid.gz" (in pub) failed: No such file or directory (2)
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1668) [Receiver=3.1.2]
The file nucl_est.accession2taxid.gz does not seem to be in /pub/taxonomy/accession2taxid/:
Index of /pub/taxonomy/accession2taxid/
Name | Size | Date Modified
-- | -- | --
README | 3.0 kB | 8/9/20, 11:21:00 AM
dead_nucl.accession2taxid.gz | 167 MB | 8/9/20, 11:21:00 AM
dead_nucl.accession2taxid.gz.md5 | 63 B | 8/9/20, 11:21:00 AM
dead_prot.accession2taxid.gz | 685 MB | 8/9/20, 11:21:00 AM
dead_prot.accession2taxid.gz.md5 | 63 B | 8/9/20, 11:21:00 AM
dead_wgs.accession2taxid.gz | 471 MB | 8/9/20, 11:21:00 AM
dead_wgs.accession2taxid.gz.md5 | 62 B | 8/9/20, 11:21:00 AM
nucl_gb.accession2taxid.gz | 1.8 GB | 8/9/20, 11:22:00 AM
nucl_gb.accession2taxid.gz.md5 | 61 B | 8/9/20, 11:22:00 AM
nucl_wgs.accession2taxid.gz | 3.4 GB | 8/9/20, 11:23:00 AM
nucl_wgs.accession2taxid.gz.md5 | 62 B | 8/9/20, 11:23:00 AM
pdb.accession2taxid.gz | 3.4 MB | 8/9/20, 11:23:00 AM
pdb.accession2taxid.gz.md5 | 57 B | 8/9/20, 11:23:00 AM
prot.accession2taxid.gz | 5.8 GB | 8/9/20, 11:25:00 AM
prot.accession2taxid.gz.md5 | 58 B | 8/9/20, 11:25:00 AM
How do you know that the masking didn't complete correctly? Normally the .masked file isn't generated until the masking is finished (it's supposed to be an empty file).
Otherwise, you can run ./mask_low_complexity.sh library/human/ to redo the masking.
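One way to see which library files still need masking is to look for the empty marker files mentioned above. The following is a minimal sketch, not part of Kraken 2: the function name `unmasked_files` is hypothetical, and it assumes each FASTA file ends in `.fna` and that its marker sits next to it as `<file>.masked`.

```shell
#!/usr/bin/env bash
# Sketch: list FASTA files that have no companion ".masked" marker.
# Assumption: the marker for library.fna is library.fna.masked in the same directory.

# unmasked_files DIR: print each *.fna under DIR that lacks a "<file>.masked" marker.
unmasked_files() {
  find "$1" -type f -name '*.fna' | sort | while IFS= read -r fasta; do
    [ -e "$fasta.masked" ] || printf '%s\n' "$fasta"
  done
}

# Example, using the database path from this thread:
#   unmasked_files /work/idoerg/db/k2bac/library
```

Any path it prints is a candidate for rerunning the masking script on that library directory; an empty result suggests the masking finished.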
Regarding the missing taxonomy files, you only need the nucl_gb and nucl_wgs accession2taxid files (nucl_gb.accession2taxid.gz and nucl_wgs.accession2taxid.gz) from https://ftp.ncbi.nih.gov/pub/taxonomy/accession2taxid/ (and just gunzip both).
Then download https://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz and run tar zxf taxdump.tar.gz
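The manual download described above could be scripted roughly as follows. This is a sketch under assumptions: it assumes `wget` is installed and that the files belong in a `taxonomy` subdirectory of the database directory. The function names `taxonomy_urls` and `fetch_taxonomy` are hypothetical helpers, not part of Kraken 2.

```shell
#!/usr/bin/env bash
# Sketch of the manual taxonomy download described in the reply above.
# Assumptions: wget is available, and the files belong in "<DB>/taxonomy".

NCBI_TAXONOMY="https://ftp.ncbi.nih.gov/pub/taxonomy"

# Print the three URLs named in the reply above, one per line.
taxonomy_urls() {
  printf '%s\n' \
    "$NCBI_TAXONOMY/accession2taxid/nucl_gb.accession2taxid.gz" \
    "$NCBI_TAXONOMY/accession2taxid/nucl_wgs.accession2taxid.gz" \
    "$NCBI_TAXONOMY/taxdump.tar.gz"
}

# fetch_taxonomy DB: download the files into DB/taxonomy, then unpack them.
fetch_taxonomy() {
  tax_dir="$1/taxonomy"
  mkdir -p "$tax_dir" || return 1
  for url in $(taxonomy_urls); do
    wget -q -P "$tax_dir" "$url" || return 1
  done
  gunzip "$tax_dir/nucl_gb.accession2taxid.gz" \
         "$tax_dir/nucl_wgs.accession2taxid.gz" || return 1
  tar zxf "$tax_dir/taxdump.tar.gz" -C "$tax_dir"
}

# Example, using the database path from this thread (several GB of downloads):
#   fetch_taxonomy /work/idoerg/db/k2bac
```

Nothing is downloaded until `fetch_taxonomy` is called, so the URL list can be inspected first with `taxonomy_urls`.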
Thanks!
The bacteria custom db is now built. Also, for the standard database I had to modify the code as per: https://github.com/DerrickWood/kraken/issues/114#issuecomment-610912961 (the "na" bug from 2 years ago) and also download the missing taxonomy files. Since neither of these fixes made it to the Conda version of Kraken2, perhaps it would be a good idea to add a comment in the manual?
I also have the same issue when trying to download the viral database:
$ kraken2-build --download-library viral --db krakendb-viral --threads 10
Step 1/2: Performing ftp file transfer of requested files
Step 2/2: Assigning taxonomic IDs to sequences
Processed 10388 projects (13011 sequences, 387.10 Mbp)... done.
All files processed, cleaning up extra sequence files... done, library complete.
Masking low-complexity regions of downloaded library... done.
And then if I try to use kraken2 directly:
$ kraken2 --db krakendb-viral --threads 20 --output taxonomy.txt assembly.fasta
kraken2: database ("./krakendb-viral") does not contain necessary file taxo.k2d
@idoerg The Kraken2 authors (including myself) are not the ones keeping the conda version up to date, so we don't include any information about that in our manual. Hopefully we will have a stable version of Kraken 2 out that has the fix for 'na', and then our extremely helpful friends who do keep the conda version up to date can fix that as well.
@Puumanamana your issue is more that you did not download the taxonomy as well. Simply run kraken2-build --download-taxonomy --db krakendb-viral and then you can build the database using kraken2-build --build --db krakendb-viral --threads 10 before running kraken2 itself.
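A quick check before classification can catch this situation early. The sketch below is an illustration only: `missing_k2d` is a hypothetical helper, and it assumes a built database directory contains `hash.k2d`, `opts.k2d` and `taxo.k2d` (only `taxo.k2d` is named in the error message in this thread).

```shell
#!/usr/bin/env bash
# Sketch: report which database files are missing before running kraken2.
# Assumption: a built database holds hash.k2d, opts.k2d and taxo.k2d.

# missing_k2d DB: print each missing file name; return 1 if any is missing.
missing_k2d() {
  missing=0
  for name in hash.k2d opts.k2d taxo.k2d; do
    if [ ! -e "$1/$name" ]; then
      printf '%s\n' "$name"
      missing=1
    fi
  done
  return "$missing"
}

# Order of steps from this thread, using the commands quoted in it:
#   kraken2-build --download-library viral --db krakendb-viral --threads 10
#   kraken2-build --download-taxonomy --db krakendb-viral
#   kraken2-build --build --db krakendb-viral --threads 10
#   missing_k2d krakendb-viral && \
#     kraken2 --db krakendb-viral --threads 20 --output taxonomy.txt assembly.fasta
```

If the function prints any file name, the build step has not completed for that database directory.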
@jenniferlu717: Thank you, I missed this part in the documentation!
I'm going to close this issue for now. If you continue to have problems, please open a new issue.
Hi,
I tried to build the bacteria database using kraken2-build --threads 24 --download-library bacteria --db /work/idoerg/db/k2bac
It seems the process ended midway without reporting any errors, but also without a completely built database.
Any ideas on how to continue the process / salvage the database without redoing everything? Seems like the masking failed.