KennthShang / PhaBOX

Local version of the phage identification and analysis web server (tool set)
https://phage.ee.cityu.edu.hk/
Academic Free License v3.0

How to get the full taxa of predicted hosts #13

Open WUD2018 opened 6 months ago

WUD2018 commented 6 months ago

Hey developer,

I noticed that Cherry reports the hosts of phage sequences, but it only lists species names. How can I get the full taxa? Is there a taxa list?

Thanks.

PS: there are two files (virus.csv & prokaryote.csv) in the './database/cherry' folder. Which one should I use?

KennthShang commented 6 months ago

Hi,

Thanks for using our tools. The full taxa can be found in the file "prokaryote.csv".
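As a rough illustration of that lookup, the sketch below scans a CSV for rows that mention a given species name in any column. The column layout of prokaryote.csv is not described in this thread, so the code deliberately assumes nothing about header names; `find_species_rows` is a hypothetical helper, not part of PhaBOX.

```python
import csv


def find_species_rows(lines, species):
    """Return every CSV row (as a dict) that has `species` in any column.

    `lines` is any iterable of text lines, e.g. an open file handle.
    No column names are assumed; the match is case-insensitive and exact.
    """
    wanted = species.strip().lower()
    reader = csv.DictReader(lines)
    return [
        row
        for row in reader
        if any(
            isinstance(value, str) and value.strip().lower() == wanted
            for value in row.values()
        )
    ]
```

It could be called as `find_species_rows(open("database/cherry/prokaryote.csv", newline=""), "Escherichia coli")`; the path follows the folder named in the question, and the returned rows show whatever taxonomic columns the file actually contains.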

You can also search for it with the ETE3 toolkit: http://etetoolkit.org/docs/latest/tutorial/tutorial_ncbitaxonomy.html

Best, Jiayu
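For the ETE3 route, a minimal sketch might look like the following. It assumes an `ete3.NCBITaxa` instance is passed in (creating one downloads the NCBI taxonomy dump on first use), and `full_lineage` is a hypothetical helper name chosen for illustration.

```python
def full_lineage(ncbi, species_name):
    """Map a species name to an ordered list of (rank, name) pairs.

    `ncbi` is expected to be an ete3.NCBITaxa instance, e.g.:
        from ete3 import NCBITaxa
        ncbi = NCBITaxa()  # downloads the NCBI taxonomy on first use
    """
    name_to_taxids = ncbi.get_name_translator([species_name])
    matches = name_to_taxids.get(species_name)
    if not matches:
        return []  # name not found in the local NCBI taxonomy database

    taxid = matches[0]  # assumption: take the first match if several exist
    lineage = ncbi.get_lineage(taxid)           # taxids from root to species
    ranks = ncbi.get_rank(lineage)              # {taxid: rank}
    names = ncbi.get_taxid_translator(lineage)  # {taxid: scientific name}
    return [(ranks.get(t, ""), names.get(t, "")) for t in lineage]
```

Filtering the result for ranks such as "phylum", "class", "order", "family" and "genus" gives the usual full taxa for each predicted host.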

WUD2018 commented 6 months ago

Thanks Jiayu,

One more suggestion: you may have noticed that gtdbtk has updated its database, which uses a new taxa naming system. Would it be possible to update phabox so that the phage-host names align with gtdbtk (such as v2.3 release 214)?

KennthShang commented 6 months ago

Good to know.

We will consider updating to the new gtdbtk database in the near future. (I am sorry, but we have deadlines to catch up on at the moment.)

If you are in a hurry, you can convert the results yourself. We provide a script that converts the current results from NCBI taxa to GTDB (in the GTDB folder). If there is a table that aligns GTDB to gtdbtk, the conversion can be done easily.

Or, if you are willing to share this table with us, that would help us write a script for it.

Best, Jiayu
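To make the table-based conversion concrete, here is a small sketch of reading a GTDB-style taxonomy table into a lookup keyed by accession. It assumes the usual two-column, tab-separated layout (accession, then a `d__...;p__...;...;s__...` lineage string) and an `RS_`/`GB_` prefix on accessions; this is not the script shipped in the GTDB folder, and `parse_gtdb_taxonomy` is a hypothetical name.

```python
GTDB_RANKS = {
    "d": "domain", "p": "phylum", "c": "class", "o": "order",
    "f": "family", "g": "genus", "s": "species",
}


def parse_gtdb_taxonomy(lines):
    """Build {accession: {rank: name}} from tab-separated taxonomy lines.

    Assumed line format: "<accession>\t d__X;p__X;c__X;o__X;f__X;g__X;s__X".
    A leading "RS_" or "GB_" on the accession is dropped so keys look like
    plain assembly accessions.
    """
    table = {}
    for line in lines:
        parts = line.rstrip("\n").split("\t", 1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        accession, lineage = parts
        if accession.startswith(("RS_", "GB_")):
            accession = accession[3:]
        ranks = {}
        for field in lineage.split(";"):
            prefix, _, name = field.strip().partition("__")
            if prefix in GTDB_RANKS:
                ranks[GTDB_RANKS[prefix]] = name
        table[accession] = ranks
    return table
```

With such a table, a predicted host can be converted by looking up its assembly accession, provided the prediction results can be linked to an accession in the first place.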

WUD2018 commented 6 months ago

Thanks, Jiayu,

Here is the taxa table (v2.3.0 release 214):

gtdb_taxonomy.txt

KennthShang commented 6 months ago

Hi there,

I am sorry, but it seems the provided taxa table cannot be used to convert RefSeq into GTDB. Many of the sequences in the RefSeq taxa have no corresponding taxa in the provided file (many accession mappings are missing).

However, I found that the scripts I provided in the GTDB folder had some problems in the previous release. I have fixed them and added a readme file. I hope this helps you convert the RefSeq taxa into the GTDB taxa you want. The GTDB taxa were downloaded from the official website.

Best, Jiayu
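The problem described in this last comment (accessions with no entry in the mapping table) can be measured before attempting a conversion. The sketch below is one way to do that; `mapping_coverage` is a hypothetical helper, and both inputs are assumed to be plain collections of accession strings.

```python
def mapping_coverage(wanted_accessions, mapped_accessions):
    """Return (fraction_covered, sorted_missing_accessions).

    `wanted_accessions`: accessions that need a GTDB taxonomy.
    `mapped_accessions`: accessions present in the mapping table
    (a dict works too, since iterating a dict yields its keys).
    """
    wanted = set(wanted_accessions)
    missing = sorted(wanted - set(mapped_accessions))
    covered = len(wanted) - len(missing)
    fraction = covered / len(wanted) if wanted else 0.0
    return fraction, missing
```

A low fraction, or a long list of missing accessions, indicates that the table cannot be used on its own for the conversion.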