DerrickWood / kraken2

The second version of the Kraken taxonomic sequence classification system
MIT License

How to use the EuPathDB files provided #622

Closed josruirod closed 1 year ago

josruirod commented 2 years ago

Hi, thank you so much for the work and for providing pre-built indexes and files here. I am particularly interested in using the EuPathDB 48 files provided here. However, I want to combine them with other libraries, such as standard or bacteria, so I have to rebuild the database. The same webpage provides EuPathDB 28 along with instructions for using it: just untar, copy to the library folder, and run kraken2-build.

However, for version 48, with the instruction "See below for how to extract tar-gzipped multi-FASTA files.", I'm not sure how to proceed. You can download multi-FASTA files from the FTP site, but unlike version 28, there is no prelim_map file, only a seqid2taxid.map. How should we proceed so that the database can be built together with others? Do we have to manually copy that file to the taxonomy folder?

Thanks for any comment or suggestion

jenniferlu717 commented 1 year ago

Combining the EuPathDB 48 files with a standard database can be a little more complicated. If you already have the seqid2taxid.map for your database (built without the EuPathDB files), you can concatenate it with the eupath_seqid2taxid.map file, add the EuPathDB files to the library/ folder, and rebuild.

I'll have to look into doing it without already having the seqid2taxid.map file from the other files.
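The concatenation step described above could look roughly like the sketch below. It is only an illustration: the function name `merge_seqid_maps`, the database directory, and the location of the existing seqid2taxid.map are assumptions, not something confirmed in this thread. It merges the two maps and drops exact duplicate lines while keeping the original order.

```shell
#!/bin/sh
# Sketch only: the function name and all paths are hypothetical placeholders.

# merge_seqid_maps EXISTING_MAP EUPATH_MAP OUT_MAP
# Concatenate two seqid2taxid maps, dropping exact duplicate lines
# and preserving the order in which lines first appear.
merge_seqid_maps() {
  existing_map=$1
  eupath_map=$2
  out_map=$3
  tmp_map="${out_map}.tmp.$$"
  # Write to a temporary file first so OUT_MAP may be the same file as EXISTING_MAP.
  awk '!seen[$0]++' "$existing_map" "$eupath_map" > "$tmp_map" &&
    mv "$tmp_map" "$out_map"
}

# Hypothetical usage (assumes the existing map sits in the database directory):
#   DB=my_kraken2_db
#   merge_seqid_maps "$DB/seqid2taxid.map" eupath_seqid2taxid.map "$DB/seqid2taxid.map"
#   # then copy the extracted EuPathDB FASTA files into "$DB/library/" and rebuild:
#   kraken2-build --build --db "$DB"
```

Keeping a backup copy of the original map before overwriting it is probably wise, since the rebuild depends on it.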

josruirod commented 1 year ago

Thanks for the comment! Best

morganpuff commented 1 year ago

Hi Jennifer,

I would like to use only a subset of the EuPath files and add in some other taxa to create a new database; however, the file path is no longer working. Have these files moved somewhere else?

ftp://ftp.ccb.jhu.edu/pub/EuPathDB46/AmoebaDB46.tgz for example doesn't exist or isn't showing up for me.

morganpuff commented 1 year ago

I ended up answering my own question: the new path is ftp://ftp.ccb.jhu.edu/pub/data/EuPathDB46/AmoebaDB46.tgz, and likewise for the other files.

ZhangMH2000 commented 1 month ago

I ended up answering my own question, but the new path is ftp://ftp.ccb.jhu.edu/pub/data/EuPathDB46/AmoebaDB46.tgz and etc for the other files.

Hi morganpuff, how did you find the new path? The path you found is no longer working, and I couldn't find the files.

morganpuff commented 1 month ago

Oh wow, I can't find them either. It looks like they are no longer hosted, and you can only access the pre-built databases through Amazon AWS. Maybe @jenniferlu717 can point us to where the individual taxa are?

ZhangMH2000 commented 1 month ago

Oh wow, I can't find them either. It looks like they are no longer hosted, and you can only access the pre-built databases through Amazon AWS. Maybe @jenniferlu717 can point us to where the individual taxa are?

Thank you. I found that EuPathDB is continuously updated and is now at version 68; see https://veupathdb.org/veupathdb/app/downloads. I thought maybe I could download all the genomes separately and then merge them.

jenniferlu717 commented 1 month ago

Looking into this now

jenniferlu717 commented 1 month ago

How are you trying to access these files? For example, ftp://ftp.ccb.jhu.edu/pub/data/EuPathDB46/AmoebaDB46.tgz works through wget.

christopherwilliamlee commented 1 month ago

Hello everyone,

I think the https://ccb.jhu.edu/data/eupathDB/ link is missing some information.

In the section "To build a database containing these genomes:" Step 4: "Download the seqid2taxid.map file for EuPathDB46: wget ftp://ftp.ccb.jhu.edu/pub/EuPathDB46/seqid2taxid.map" is missing a directory.

Note that the link "ftp://ftp.ccb.jhu.edu/pub/EuPathDB46/seqid2taxid.map" should actually be "ftp://ftp.ccb.jhu.edu/pub/data/EuPathDB46/seqid2taxid.map" (with an extra data/ directory).

The same applies to "wget ftp://ftp.ccb.jhu.edu/pub/EuPathDB46/AmoebaDB46.tgz": the correct link for AmoebaDB46 is "ftp://ftp.ccb.jhu.edu/pub/data/EuPathDB46/AmoebaDB46.tgz".
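For anyone with scripts that still use the old links, the path fix described above can be sketched as a small helper. The function name `fix_eupath_url` is hypothetical; it only inserts the missing data/ directory into EuPathDB46 URLs and leaves already-corrected URLs unchanged.

```shell
#!/bin/sh
# Sketch only: fix_eupath_url is a hypothetical helper, not part of Kraken 2.

# Rewrite an old-style EuPathDB46 URL (.../pub/EuPathDB46/...) to the
# corrected form (.../pub/data/EuPathDB46/...). URLs that already contain
# /pub/data/EuPathDB46/ do not match the pattern and pass through as-is.
fix_eupath_url() {
  printf '%s\n' "$1" | sed 's#/pub/EuPathDB46/#/pub/data/EuPathDB46/#'
}
```

It could then be combined with wget, for example `wget "$(fix_eupath_url ftp://ftp.ccb.jhu.edu/pub/EuPathDB46/AmoebaDB46.tgz)"`, assuming the files are still hosted at the corrected location.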


jenniferlu717 commented 4 weeks ago

I fixed it. Thank you @christopherwilliamlee