bxlab / metaWRAP

MetaWRAP - a flexible pipeline for genome-resolved metagenomic data analysis
MIT License

Problem with building KRAKEN database: No such file ‘nucl_est.accession2taxid.gz’. #158

Open slambrechts opened 5 years ago

slambrechts commented 5 years ago

Hi Ursky,

I would like to point out that it currently seems impossible to build the Kraken database. As of a few days ago, the accession2taxid files (for example nucl_est.accession2taxid.gz) appear to have been removed from the NCBI FTP server or moved to a different location. See: Index of /pub/taxonomy/accession2taxid

So when I run: kraken-build --download-taxonomy --threads 4 --db /mnt/e/Sam/KRAKEN/DATABASE/

I get:

--2019-04-12 17:32:37--  ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/accession2taxid/nucl_est.accession2taxid.gz
           => ‘nucl_est.accession2taxid.gz’
Resolving ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)... 130.14.250.10, 2607:f220:41e:250::10
Connecting to ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)|130.14.250.10|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.    ==> PWD ... done.
==> TYPE I ... done.  ==> CWD (1) /pub/taxonomy/accession2taxid ... done.
==> SIZE nucl_est.accession2taxid.gz ... done.

==> PASV ... done.    ==> RETR nucl_est.accession2taxid.gz ...
No such file ‘nucl_est.accession2taxid.gz’.

Putting this out here in case somebody else experiences the same problem.

Feel free to remove this issue if you think this doesn't belong here.

Or if you think there might be a solution, feel free to let us know :)

Toliman06 commented 5 years ago

I have the same problem here...

(metawrap-env) bababaal@MEPHISTO:/DATA/metaWRAP_DATABASE/KRAKEN$ kraken-build --standard --threads 7 --db KRAKEN_DATABASE_2019-04-15 --work-on-disk
Found jellyfish v1.1.12
--2019-04-15 16:12:05--  ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/accession2taxid/nucl_est.accession2taxid.gz
           => ‘nucl_est.accession2taxid.gz’
Resolving ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)... 130.14.250.11, 2607:f220:41e:250::7
Connecting to ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)|130.14.250.11|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.    ==> PWD ... done.
==> TYPE I ... done.  ==> CWD (1) /pub/taxonomy/accession2taxid ... done.
==> SIZE nucl_est.accession2taxid.gz ... done.

==> PASV ... done.    ==> RETR nucl_est.accession2taxid.gz ...
No such file ‘nucl_est.accession2taxid.gz’.

Toliman06 commented 5 years ago

I know where this comes from: "The Nucleotide database will include EST and GSS sequences in early 2019." https://ncbiinsights.ncbi.nlm.nih.gov/2018/07/30/upcoming-changes-est-gss-databases/

The NCBI folks retired the files needed by KRAKEN...

slambrechts commented 5 years ago

I also found this information: https://github.com/DerrickWood/kraken2/issues/101

ursky commented 5 years ago

Looks like the files were changed and the automatic Kraken DB pull no longer works. Sounds like a miscommunication between NCBI and Kraken. See issue https://github.com/DerrickWood/kraken/issues/132. Looks like they are working on fixing this. This cannot be fixed in metaWRAP, so the only thing I can do is wait for a fix from the Kraken team. My advice would be to try out Kraken2 in the meantime...

ursky commented 5 years ago

Actually, looks like kraken2 is suffering from the same issue...

jenniferlu717 commented 5 years ago

Kraken and Kraken2's download scripts should work now. Let me know if you run into any more trouble!

ursky commented 5 years ago

Thanks Jen!

kunstner commented 5 years ago

Hi, I've asked the NCBI helpdesk about this issue. Here is their reply:

The EST and GSS sequences have been subsumed into our Nucleotide (GenBank in this case) database; please see the note here: https://www.ncbi.nlm.nih.gov/nuccore/ You should get the nucl_gb* files to have the EST and GSS records.

Best, Axel
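
The helpdesk reply above suggests that a download list should simply leave out the EST and GSS files. Here is a minimal sketch of that idea, not the actual Kraken download script: the function name `accession2taxid_urls` is made up for illustration, and the subsection names passed to it (including `wgs`) are assumptions about what a given setup wants to fetch. Only the FTP directory and the `nucl_<name>.accession2taxid.gz` naming pattern come from the logs in this thread.

```shell
#!/usr/bin/env bash
# Sketch only: print download URLs for accession2taxid files, skipping the
# EST and GSS files that are no longer on the server.

# Directory taken from the wget logs in this thread.
BASE_URL="ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/accession2taxid"

# accession2taxid_urls is a hypothetical helper, not part of Kraken.
# Arguments: subsection names such as est, gb, gss, wgs.
accession2taxid_urls() {
  local subsection
  for subsection in "$@"; do
    case "$subsection" in
      # Per the NCBI reply, these records now live in the nucl_gb* files.
      est|gss) continue ;;
    esac
    printf '%s/nucl_%s.accession2taxid.gz\n' "$BASE_URL" "$subsection"
  done
}

# Example: the est and gss entries are dropped, gb and wgs are kept.
accession2taxid_urls est gb gss wgs
```

The printed list could then be handed to a downloader, for example `accession2taxid_urls est gb gss wgs | wget -i -`, which keeps the filtering separate from the actual transfer.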

ursky commented 5 years ago

Hey @jenniferlu717, I have some users asking what they should do to get this to work. Updating Kraken doesn't seem to help.

jenniferlu717 commented 5 years ago

It needs to be updated and reinstalled (rerun sh install_kraken)
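
One way to tell whether an installed copy still needs that update is to check whether its download script still asks for the retired files. This is a rough sketch under assumptions: `references_retired_files` is a hypothetical helper, and the path to the installed script is something you would have to locate yourself (it differs between conda and source installs), so it is passed in as an argument rather than guessed.

```shell
#!/usr/bin/env bash
# Sketch only: report whether a script still requests the retired
# nucl_est / nucl_gss accession2taxid files.

# references_retired_files is a hypothetical helper, not part of Kraken.
# Returns 0 if the file named by $1 mentions either retired file.
references_retired_files() {
  grep -Eq 'nucl_(est|gss)\.accession2taxid' "$1"
}

# Optional usage: pass the path of your installed download script.
# The path is an assumption; this block does not try to discover it.
if [ -n "${1:-}" ]; then
  if references_retired_files "$1"; then
    echo "$1 still requests nucl_est/nucl_gss; this install looks outdated."
  else
    echo "$1 does not request nucl_est/nucl_gss."
  fi
fi
```

If the check reports an outdated script, reinstalling from an updated source as described above is the fix; the check itself changes nothing.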

ursky commented 5 years ago

I see. Any chance you could update the bioconda recipe? That is the version all these people use.

jenniferlu717 commented 5 years ago

I'm actually not the one in charge of the bioconda recipe, but I will contact the person who can make that change.

ursky commented 5 years ago

Thank you!

olneykimberly commented 3 years ago

Hi, this is still an issue. I have tried both the conda install and installing from the git repo.

kraken-build --standard --threads 24 --db std_kraken _db
Found jellyfish v1.1.12
--2021-08-03 12:53:38--  ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/accession2taxid/nucl_est.accession2taxid.gz
           => 'nucl_est.accession2taxid.gz'
Resolving ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)... 165.112.9.230, 130.14.250.7, 2607:f220:41f:250::230, ...
Connecting to ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)|165.112.9.230|:21... failed: Connection timed out.
Connecting to ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)|130.14.250.7|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.    ==> PWD ... done.
==> TYPE I ... done.  ==> CWD (1) /pub/taxonomy/accession2taxid ... done.
==> SIZE nucl_est.accession2taxid.gz ... done.
==> PASV ... done.    ==> RETR nucl_est.accession2taxid.gz ...
No such file 'nucl_est.accession2taxid.gz'.

danielle-dzd commented 2 years ago

Hi, I am also trying to update a Kraken database and ran kraken-build --download-taxonomy --db $DBNAME and got a similar error.

--2022-08-03 15:28:00--  ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/accession2taxid/nucl_est.accession2taxid.gz
           => ‘nucl_est.accession2taxid.gz’
Resolving ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)... 130.14.250.12, 130.14.250.11, 2607:f220:41e:250::12, ...
Connecting to ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)|130.14.250.12|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.    ==> PWD ... done.
==> TYPE I ... done.  ==> CWD (1) /pub/taxonomy/accession2taxid ... done.
==> SIZE nucl_est.accession2taxid.gz ... done.

==> PASV ... done.    ==> RETR nucl_est.accession2taxid.gz ...
No such file ‘nucl_est.accession2taxid.gz’.