eggnogdb / eggnog-mapper

Fast genome-wide functional annotation through orthology assignment
http://eggnog-mapper.embl.de
GNU Affero General Public License v3.0

annotation table #304

Closed ucassee closed 3 years ago

ucassee commented 3 years ago

Hi, is there an annotation table that links the protein IDs in eggNOG_v5.proteomes.faa to their detailed descriptions? Thanks

Cantalapiedra commented 3 years ago

Hi,

Yes, it is in the SQLite database provided with eggnog-mapper.

Best, Carlos

ucassee commented 3 years ago

Hi @Cantalapiedra ,

Thanks for your reply. Where can I find the SQLite database? Is it possible to get this database in CSV or TXT format?

Thanks

Cantalapiedra commented 3 years ago

Hi @ucassee ,

The SQLite database is downloaded with download_eggnog_data.py, which is available in this repo. You would then need to extract the data yourself, with something like:

sqlite3 data/eggnog.db "select * from prots limit 2" | tr "|" "\t"
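If a full export to a text file is wanted rather than a two-row preview, the same idea can be sketched in Python with the standard sqlite3 and csv modules. This is only a minimal sketch: the function name export_table_to_tsv and the output file name are hypothetical, the database path and the "prots" table name are taken from the command above, and the column names are read from the database at run time rather than assumed.

```python
import csv
import sqlite3


def export_table_to_tsv(db_path, table, out_path, limit=None):
    """Write every column of `table` to a tab-separated file with a header row.

    Hypothetical helper for illustration; it is not part of eggnog-mapper.
    Returns the number of data rows written.
    """
    # Table names cannot be bound as SQL parameters, so accept only plain identifiers.
    if not table.isidentifier():
        raise ValueError(f"unexpected table name: {table!r}")

    query = f"SELECT * FROM {table}"
    params = ()
    if limit is not None:
        query += " LIMIT ?"
        params = (limit,)

    conn = sqlite3.connect(db_path)
    try:
        cursor = conn.execute(query, params)
        with open(out_path, "w", newline="") as handle:
            writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
            # Column names come from the query result, not from a guessed schema.
            writer.writerow([col[0] for col in cursor.description])
            rows_written = 0
            for row in cursor:
                writer.writerow(row)
                rows_written += 1
    finally:
        conn.close()
    return rows_written
```

For example, export_table_to_tsv("data/eggnog.db", "prots", "prots.tsv") would dump the whole table, and passing limit=2 would reproduce the preview from the command above. The path depends on where download_eggnog_data.py placed the data in your installation.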

You may also check the files in the downloads section of the eggNOG database website: http://eggnog5.embl.de/#/app/downloads

I hope this helps.

Best, Carlos

ucassee commented 3 years ago

Hi @Cantalapiedra ,

It helps a lot. I extracted the table I want, but it is missing the column that contains the detailed annotation (the last column in the following eggnog-mapper output).

Dive121-T1_NODE_78_20   1216007.AOPM01000080_gene1215   1.4e-35 156.0   Pseudoalteromonadaceae  radC    GO:0006139,GO:0006259,GO:0006281,GO:0006725,GO:0006807,GO:0006950,GO:0006974,GO:0008150,GO:0008152,GO:0009987,GO:0033554,GO:0034641,GO:0043170,GO:0044237,GO:0044238,GO:0044260,GO:0046483,GO:0050896,GO:0051716,GO:0071704,GO:0090304,GO:1901360       ko:K03630                   ko00000             Bacteria    1MXZ5@1224,1RP86@1236,2Q0N1@267888,COG2003@1,COG2003@2  NA|NA|NA    E   Belongs to the UPF0758 family
Dive121-T1_NODE_78_21   1396141.BATP01000029_gene2258   1.6e-18 100.1   Verrucomicrobiae                                                    Bacteria    2E0FE@1,2IVPU@203494,32W1M@2,46TEN@74201    NA|NA|NA    S   Domain of unknown function (DUF932)

Thanks, Yingli

Cantalapiedra commented 3 years ago

Hi @ucassee ,

The description returned by emapper comes from the Orthologous Groups, not from the individual proteins. It is taken from the "og" table, which is queried by "og.og" and "og.level".
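A lookup along those lines could be sketched as follows. Only the table name "og" and the columns "og" and "level" come from the reply above. The function name fetch_og_rows is hypothetical, and because the remaining columns of the table are not listed in this thread, the sketch returns whatever columns the table has instead of naming a description column.

```python
import sqlite3


def fetch_og_rows(conn, og_id, level):
    """Return every column of the `og` rows matching an OG id and a level.

    Hypothetical helper: `og` and `level` are the column names mentioned in
    the thread; all other columns are discovered at run time.
    """
    cursor = conn.execute(
        "SELECT * FROM og WHERE og = ? AND level = ?",
        (og_id, level),
    )
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor]
```

Assuming that entries such as COG2003@2 in the emapper output pair an OG identifier with a level, a call might look like fetch_og_rows(sqlite3.connect("data/eggnog.db"), "COG2003", "2"). Whether the level is stored as text or as an integer is an assumption to check against the actual database.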

Best, Carlos

Cantalapiedra commented 3 years ago

Closing this. Please, reopen or reissue if needed.