eggnogdb / eggnog-mapper

Fast genome-wide functional annotation through orthology assignment
http://eggnog-mapper.embl.de
GNU Affero General Public License v3.0

Missing GO terms #459

Open snehakhokhali opened 1 year ago

snehakhokhali commented 1 year ago

Hello, I have performed functional annotation and want to get all possible GO terms for the genome of Chironomus riparius. I used only the protein sequences of the CDS from the genome and ran eggNOG-mapper on them.

First, I took the protein sequences of cell-cycle-associated CDS from a sample D. melanogaster genome and ran eggNOG-mapper with the command "emapper.py -i flybase_DM_cyc_protein.faa -o test_cyc --override -m diamond --sensmode fast --go_evidence non-electronic". This produced the output file "test_cyc.emapper.annotations", and I counted all the GO IDs in it. In particular, I wanted to count GO:0044848 (biological phase) and especially GO:0022403 (cell cycle phase), but I got 0 counts. I also tried --sensmode sensitive, but it gave the same results.
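Counting GO IDs in the annotations table can be scripted. The sketch below is an illustration, not part of eggNOG-mapper; it assumes the column-header line starts with a single '#' and contains a column named GOs holding comma-separated IDs, with '-' meaning none (check this against your own output file). Each protein is counted at most once per GO ID.

```python
from collections import Counter
from typing import Iterable


def count_go_terms(lines: Iterable[str], go_column: str = "GOs") -> Counter:
    """Count how many query proteins carry each GO ID in an emapper annotations table.

    Assumption: the header line starts with '#' and has a column named `go_column`
    whose cells are comma-separated GO IDs, or '-' when there are none.
    """
    counts: Counter = Counter()
    col_idx = None
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("#"):
            header = line.lstrip("#").split("\t")
            if go_column in header:
                # Column header line: locate the GO column by name, not by position.
                col_idx = header.index(go_column)
            continue  # skip '##' metadata lines and the header itself
        if not line or col_idx is None:
            continue
        fields = line.split("\t")
        if col_idx >= len(fields) or fields[col_idx] in ("", "-"):
            continue  # no GO annotation for this protein
        # A set ensures each protein contributes at most once per GO ID.
        counts.update(set(fields[col_idx].split(",")))
    return counts
```

For example, `count_go_terms(open("test_cyc.emapper.annotations"))["GO:0022403"]` returns 0 when that ID never appears, because a Counter returns 0 for missing keys.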

I tried the same with the Chironomus riparius genome, and it also gives 0 counts.

Why am I not getting counts for these IDs, which are child nodes of biological process?
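A count for a parent term includes proteins annotated only to its child terms only if annotations are propagated up the GO graph first; counting the IDs literally present in a file does not do that by itself. A minimal sketch of such propagation, assuming a hypothetical child-to-parents map (for example, built from the is_a relations in an ontology file such as go-basic.obo) and a hypothetical protein-to-GO-IDs map:

```python
from collections import Counter
from typing import Dict, Set


def ancestors(term: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """Return every ancestor of `term` in a child -> parents map (term itself excluded)."""
    seen: Set[str] = set()
    stack = list(parents.get(term, ()))
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, ()))
    return seen


def propagated_counts(annotations: Dict[str, Set[str]],
                      parents: Dict[str, Set[str]]) -> Counter:
    """Count proteins per GO term after adding all ancestors of each annotated term."""
    counts: Counter = Counter()
    for terms in annotations.values():
        closure: Set[str] = set()
        for t in terms:
            closure.add(t)
            closure |= ancestors(t, parents)
        # The closure is a set, so each protein counts once per term.
        counts.update(closure)
    return counts
```

With propagation, a protein annotated only to a child term is also counted under each of that child's ancestors.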

Cantalapiedra commented 1 year ago

Hi @snehakhokhali ,

Did you try with --go_evidence all?

snehakhokhali commented 1 year ago

Are all GO terms included in the database?

Cantalapiedra commented 1 year ago

No. Actually, I did a search and GO:0044848 is present in the database but GO:0022403 is not.
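Whether eggNOG holds annotations for a term is a separate question from whether the term exists, and is still active, in the current Gene Ontology. A minimal sketch for the latter, assuming a local copy of an OBO-format ontology file (for example go-basic.obo); it reads only [Term] stanzas, ignores alt_id entries, and flags terms marked `is_obsolete: true`:

```python
from typing import Dict, Iterable, Optional


def go_term_status(lines: Iterable[str]) -> Dict[str, bool]:
    """Map each term ID in an OBO file to True if it is obsolete, else False."""
    status: Dict[str, bool] = {}
    in_term = False
    current: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas describe GO terms.
            in_term = line == "[Term]"
            current = None
        elif in_term and line.startswith("id: "):
            current = line[4:]
            status[current] = False
        elif in_term and current and line == "is_obsolete: true":
            status[current] = True
    return status
```

A term missing from the returned dict may still appear as an alt_id of another term, which this sketch does not check.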

snehakhokhali commented 1 year ago

Could you please make an effort to include the GO term (GO:0022403) somehow in the database? I need the cell cycle phase for my project.

Cantalapiedra commented 1 year ago

Hi @snehakhokhali ,

If I could do it, I would be glad to help. Unfortunately, the eggNOG DB is complex, and a lot of effort goes into releasing each new version.

Sorry for the inconvenience.

Did you try annotating your proteins using a reference from the same (or a related) species, or a similar strategy?

Best, Carlos

snehakhokhali commented 1 year ago

Sorry, I didn't get your question.

Cantalapiedra commented 1 year ago

Sorry, let me put it in other words.

If I understand correctly, you are trying to annotate those GO terms in your proteins, which come from specific eukaryotic genomes. If that is the case, you may try mapping your proteins to an already annotated reference genome from a related species. For instance, you may map your proteins against a Drosophila reference genome and retrieve the annotations, including GO terms, from those mappings. But this is just a suggestion that may or may not fit your project goals.

Best, Carlos

snehakhokhali commented 1 year ago

But I have mapped my proteins against an annotated reference genome of Chironomus riparius and want to retrieve the annotations, including GO terms, but I don't know how to do that. Can we use a Drosophila reference genome, as a related species, to map my Chironomus riparius proteins?

Actually, I have a FASTA file and a GFF file for the Chironomus riparius genome. I want to use it as a reference genome, retrieve GO terms from those mappings, and use those GO terms for RNA-seq analyses.

Cantalapiedra commented 1 year ago

Hi,

Sorry, I don't know much about the available resources for the genome you are working with.

If you have a reference genome, which is already annotated with GOs, you may retrieve those GOs from the reference after mapping your proteins to it.
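One simple way to carry GO terms over from an annotated reference, swapped in here for illustration rather than prescribed in this thread, is best-hit transfer: for each query, keep the highest-bitscore hit that passes identity and e-value filters and copy that reference protein's GO terms. The sketch assumes BLAST/DIAMOND tabular output with the default 12 columns of format 6 and a reference-protein-to-GO map you build yourself; the thresholds are placeholders, not recommendations.

```python
from typing import Dict, Iterable, Set, Tuple


def transfer_go_best_hit(
    hits: Iterable[str],
    ref_go: Dict[str, Set[str]],
    max_evalue: float = 1e-5,   # placeholder threshold
    min_identity: float = 40.0,  # placeholder threshold (percent identity)
) -> Dict[str, Set[str]]:
    """Copy GO terms from each query's best-scoring reference hit.

    Assumed columns: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore (tab-separated).
    """
    best: Dict[str, Tuple[float, str]] = {}
    for line in hits:
        if not line.strip():
            continue
        f = line.rstrip("\n").split("\t")
        query, subject = f[0], f[1]
        pident, evalue, bitscore = float(f[2]), float(f[10]), float(f[11])
        if evalue > max_evalue or pident < min_identity:
            continue  # discard weak hits before choosing the best one
        if query not in best or bitscore > best[query][0]:
            best[query] = (bitscore, subject)
    # Queries whose best hit has no GO annotation get an empty set.
    return {q: set(ref_go.get(s, set())) for q, (_, s) in best.items()}
```

Best-hit transfer is crude; a more conservative variant could keep only terms shared by several top hits.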

If the reference genome doesn't have annotated GOs, then it is a different picture, and you may need to rely on other strategies: looking into other similar genomes that are annotated with GOs, annotating your proteins with eggNOG (which we already know does not work for your goals), or mapping your proteins to UniProt, nr, or other DBs and means I am not aware of.

snehakhokhali commented 1 year ago

I am performing the functional annotation. I have the FASTA file of the Chironomus riparius genome and a GFF file from MAKER (structural annotation). I am trying to annotate the proteins with eggNOG so that I can retrieve GO terms, and I need biological process GO terms. Unfortunately, there are no cell cycle phase GO terms in eggNOG. Can you suggest a paper that deals with a reference genome that has no annotated GOs?

Cantalapiedra commented 1 year ago

Is there any other closely related species with an already annotated reference genome? (I don't know whether D. melanogaster or other species are close enough to Chironomus riparius.) You may check Ensembl Genomes or other sources of reference genomes, which usually include GO annotations.

Another option may be mapping your proteins to UniProtKB and annotating them with the GO terms associated with the UniProt entries. Note that when you check the gene products of GO terms (http://amigo.geneontology.org/amigo/search/bioentity?q=GO%3A0022403&searchtype=geneproduct), the entries are proteins from DBs like UniProt, FlyBase, etc. For instance, the previous link lists 9 proteins annotated as GO:0022403 for D. melanogaster.
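Once proteins are mapped to UniProtKB accessions, their GO terms can be read from a GO annotation file in GAF 2.x format (such files are distributed by the GO Consortium and UniProt-GOA). A minimal sketch, assuming tab-separated GAF lines: it keeps column 2 (DB Object ID) and column 5 (GO ID), skips annotations whose qualifier includes NOT, and by default keeps only aspect P (biological process), since that is the aspect needed here.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set


def read_gaf(lines: Iterable[str], aspect: Optional[str] = "P") -> Dict[str, Set[str]]:
    """Build a DB Object ID -> GO IDs map from GAF 2.x lines.

    Columns used (1-based): 2 = DB Object ID, 4 = Qualifier, 5 = GO ID,
    9 = Aspect (P = biological process, F = molecular function, C = cellular component).
    Pass aspect=None to keep all aspects.
    """
    go_map: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        if line.startswith("!") or not line.strip():
            continue  # '!' lines are GAF header comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # malformed line
        if "NOT" in cols[3].split("|"):
            continue  # negated annotation: the product is NOT associated with the term
        if aspect is not None and cols[8] != aspect:
            continue
        go_map[cols[1]].add(cols[4])
    return dict(go_map)
```

The returned map is keyed by accession, so it can be joined with whatever protein-to-UniProt mapping you obtained.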

I don't know about current papers on these topics, but, as you said and as you surely already did, I would look for recent papers describing genome annotation of eukaryotic species similar to yours to get ideas.

snehakhokhali commented 1 year ago

Thank you for the suggestion.

Cantalapiedra commented 1 year ago

Good luck!