eggnogdb / eggnog-mapper

Fast genome-wide functional annotation through orthology assignment
http://eggnog-mapper.embl.de
GNU Affero General Public License v3.0

1-to-1 orthologues finding #320

Closed rodrisenovilla closed 3 years ago

rodrisenovilla commented 3 years ago

Hi!

First of all, thank you for this incredible resource! I am new to gene annotation and species comparisons, so I hope I am not asking a foolish question or making a foolish assumption.

I am working with the genome of a new species (Paroedura picta, Madagascar ground gecko) and I would like to compare its gene expression with that of multiple species. Therefore, I am using EggNOG to establish the homology of each predicted gecko cds/pep against my group of interest (--tax_scope chordates, 7711) and to infer comparable 1-to-1 orthologues (--target_orthologs filter). Moreover, I have analysed the predicted cds/pep of the comparison species with EggNOG emapper to obtain a common 1-to-1 orthologue dataset among the Ensembl-annotated species and the gecko.

What I was expecting from the EggNOG readout was to obtain all the isoforms of only the 1-to-1 orthologues in chordates for each species analysed, so I could just filter my expression dataset with these common genes. Nonetheless, after reading #243 and confirming the existence of 1-to-many orthologues among the eggnog outputs of my species of interest, I've come to the conclusion that I probably need to filter afterwards with the emapper.orthologues file or, in the case of my species that is not annotated in Ensembl (I don't even obtain the orthologues file in the output), filter by tediously exploring the orthologous relations of the seed orthologues.

All in all, I would like to know if I am missing an easier approach (using genome?) or option (--target_taxa?), or simply whether my workflow fits the intended applications of EggNOG. Thank you in advance and sorry for the inconvenience!

Best, Rodrigo Senovilla Ganzo

Cantalapiedra commented 3 years ago

Hi Rodrigo,

Thank you very much for your kind words.

Just in case a brief answer is of help (I am out of office), did you try the '--target_orthologs one2one' option?

Best, Carlos

rodrisenovilla commented 3 years ago

Yes, I selected --target_orthologs one2one in the website emapper, sorry if I didn't explain myself properly. Thank you for the quick response, looking forward to more information! Best, Rodrigo

Cantalapiedra commented 3 years ago

Hi Rodrigo,

My fault, I was reading on my phone and my previous answer was not very accurate.

> I am working with the genome of a new species (Paroedura picta, Madagascar ground gecko) and I would like to compare its gene expression with that of multiple species. Therefore, I am using EggNOG to establish the homology of each predicted gecko cds/pep against my group of interest (--tax_scope chordates, 7711) and to infer comparable 1-to-1 orthologues (--target_orthologs filter). Moreover, I have analysed the predicted cds/pep of the comparison species with EggNOG emapper to obtain a common 1-to-1 orthologue dataset among the Ensembl-annotated species and the gecko.

There is a recent issue about this approach too. In my opinion, the orthologs obtained with eggnog-mapper are a useful resource but definitely not proof of orthology between different input files. There are other approaches suited for this. get_homologues-EST comes to mind (http://eead-csic-compbio.github.io/get_homologues/manual-est/), with which you can create groups of orthologs. However, its parameters are tuned for strains/subspecies/cultivars of the same species (to create a pangenome), and I am not sure how it behaves for different species. You could also run OrthoMCL or other ortholog-inference software yourself (https://github.com/davidemms/OrthoFinder, https://inparanoid.sbc.su.se/cgi-bin/index.cgi, https://academic.oup.com/bioinformatics/article/35/1/149/5056041), or take a similar approach, to identify the ortholog groups among your species of interest and compare expression afterwards.
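As a rough illustration of the "ortholog groups first, expression afterwards" idea, here is a minimal Python sketch that keeps only the groups with exactly one gene per species. It assumes an orthogroup table in the tab-separated layout that OrthoFinder writes (one orthogroup per row, one column per species, genes in a cell separated by commas); the species names and gene IDs are made up, and `single_copy_orthogroups` is a hypothetical helper, not part of any of the tools above.

```python
import csv
import io


def single_copy_orthogroups(tsv_text):
    """Return {orthogroup: {species: gene}} for groups with exactly one gene per species."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)
    species = header[1:]  # first column holds the orthogroup ID
    result = {}
    for row in reader:
        if not row:
            continue
        group, cells = row[0], row[1:]
        # Pad short rows so species without genes count as empty cells.
        cells += [""] * (len(species) - len(cells))
        genes = [[g.strip() for g in cell.split(",") if g.strip()] for cell in cells]
        if all(len(g) == 1 for g in genes):
            result[group] = {sp: g[0] for sp, g in zip(species, genes)}
    return result


# Made-up example table: three species, three orthogroups.
example = (
    "Orthogroup\tgecko\thuman\tchicken\n"
    "OG0000001\tgk_001\tENSP01\tENSGALP01\n"
    "OG0000002\tgk_002, gk_003\tENSP02\tENSGALP02\n"
    "OG0000003\tgk_004\t\tENSGALP03\n"
)

one_to_one = single_copy_orthogroups(example)
print(one_to_one)
```

Only the first group survives here: the second has two gecko genes and the third has no human gene. The resulting gene IDs per species could then be used to filter each expression table.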

This being said, you could also compare expression by function instead of by gene, in which case you could still use the eggnog-mapper output for that.
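A minimal sketch of what comparing by function could look like: per-gene expression values are summed over the functional terms found in one column of the annotations table. The column names `#query` and `KEGG_ko`, the `##` comment lines, and the `-` placeholder for "no annotation" are assumptions about the emapper annotations file and should be checked against your own output; the IDs and values are made up, and `expression_by_function` is a hypothetical helper.

```python
import csv
import io
from collections import defaultdict


def expression_by_function(annotations_text, expression, column="KEGG_ko"):
    """Sum per-gene expression values over the functional terms in one annotation column."""
    # Skip '##' comment lines; the header line is assumed to start with '#query'.
    lines = [ln for ln in annotations_text.splitlines() if ln and not ln.startswith("##")]
    reader = csv.DictReader(io.StringIO("\n".join(lines)), delimiter="\t")
    totals = defaultdict(float)
    for row in reader:
        query = row["#query"]
        terms = row.get(column, "-")
        # Ignore unannotated queries and queries without an expression value.
        if terms in ("", "-", None) or query not in expression:
            continue
        for term in terms.split(","):
            totals[term] += expression[query]
    return dict(totals)


# Made-up annotations and expression values for three gecko genes.
annotations = (
    "## made-up example\n"
    "#query\tseed_ortholog\tKEGG_ko\n"
    "gk_001\t9606.ENSP01\tko:K00001\n"
    "gk_002\t9606.ENSP02\tko:K00001,ko:K00002\n"
    "gk_003\t9606.ENSP03\t-\n"
)
expression = {"gk_001": 10.0, "gk_002": 4.0, "gk_003": 7.0}

by_ko = expression_by_function(annotations, expression)
print(by_ko)
```

Note the design choice: a gene annotated with several terms contributes its full value to each of them. Running the same aggregation per species gives tables keyed by function, which can be compared without a gene-to-gene orthology mapping.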

> What I was expecting from the EggNOG readout was to obtain all the isoforms of only the 1-to-1 orthologues in chordates for each species analysed, so I could just filter my expression dataset with these common genes. Nonetheless, after reading #243 and confirming the existence of 1-to-many orthologues among the eggnog outputs of my species of interest, I've come to the conclusion that I probably need to filter afterwards with the emapper.orthologues file or, in the case of my species that is not annotated in Ensembl (I don't even obtain the orthologues file in the output), filter by tediously exploring the orthologous relations of the seed orthologues.

Note that issue #243 refers to many-to-one orthology, under the assumption that the one2one relationship would be between the query and the targets. This is not the case in eggnog-mapper, though. Here the orthology relationship (one2one, etc.) is between the seed ortholog and the other orthologs that are identified and used to transfer function to the query. To try to illustrate this: "query --> homology --> seed ortholog --> orthology --> orthologs --> function --> query"
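The chain above can be sketched as a toy Python example. It uses the usual counting definition of one2one, one2many, many2one and many2many (one or several genes on each side), which is an assumption for illustration and not taken from the eggnog-mapper source; all IDs are made up and `orthology_type` is a hypothetical helper.

```python
def orthology_type(seed_side, target_side):
    """Classify the relation between the seed ortholog's side and one target species."""
    if not seed_side or not target_side:
        return None
    left = "one" if len(seed_side) == 1 else "many"
    right = "one" if len(target_side) == 1 else "many"
    return f"{left}2{right}"


# The gecko query only enters the chain through a homology hit.
query = "gk_001"
seed_ortholog = "9606.ENSP01"  # best hit of the query (made-up ID)

# Orthologs of the seed ortholog, per target species (made-up IDs).
# Each entry: (genes on the seed ortholog's side, genes in the target species).
seed_orthologs = {
    "chicken": ([seed_ortholog], ["9031.ENSGALP01"]),
    "zebrafish": ([seed_ortholog], ["7955.ENSDARP01", "7955.ENSDARP02"]),
}

types = {sp: orthology_type(s, t) for sp, (s, t) in seed_orthologs.items()}
print(query, "->", seed_ortholog, "->", types)
```

The query never appears in the classification: the labels describe the seed ortholog and its orthologs, so a one2one label says nothing about whether the query itself is a 1-to-1 ortholog of those genes.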

> All in all, I would like to know if I am missing an easier approach (using genome?) or option (--target_taxa?), or simply whether my workflow fits the intended applications of EggNOG. Thank you in advance and sorry for the inconvenience!

--target_taxa would be useful only when you want to transfer function from orthologs belonging to a given taxon. For example, when you want to obtain the orthologs (of the seed ortholog) of your gecko queries only from a specific species within the already applied --tax_scope.
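For a local run, the options discussed in this thread could be combined as in the sketch below, which only builds and prints the command line without executing it. The flags are written as I understand them from the eggnog-mapper 2.1 command line (`--tax_scope`, `--target_orthologs`, `--target_taxa`, `--report_orthologs`), so please confirm them with `emapper.py --help` for your version; the file names are hypothetical, and 9606 (human) is just an example target taxon.

```python
import shlex


def build_emapper_command(fasta, output_prefix, tax_scope, target_taxa=None,
                          target_orthologs="one2one", report_orthologs=True):
    """Build an emapper.py argument list; nothing is executed here."""
    cmd = ["emapper.py", "-i", fasta, "-o", output_prefix,
           "--tax_scope", str(tax_scope),
           "--target_orthologs", target_orthologs]
    if target_taxa:
        # Assumed format: comma-separated list of NCBI taxon IDs.
        cmd += ["--target_taxa", ",".join(str(t) for t in target_taxa)]
    if report_orthologs:
        # Asks for the per-query orthologs file in addition to the annotations.
        cmd.append("--report_orthologs")
    return cmd


# Hypothetical input: gecko proteins, chordates scope (7711), human orthologs only.
cmd = build_emapper_command("gecko.pep.fa", "gecko", 7711, target_taxa=[9606])
print(shlex.join(cmd))
```

Running the same command with different target taxa would let you compare the annotations transferred from each taxon.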

I hope some of this makes sense.

Best, Carlos

rodrisenovilla commented 3 years ago

Firstly, thank you for your detailed answer!

I have seen one of your entries proposing OrthoMCL, but I reckoned it didn't fit our requirements as our "new" species was not present in their database. Anyway, it was just a quick look; I will read it in more depth and go through the tools you named.

Regarding the orthologue transfer, I believe I now get the whole idea of how EggNOG works and, out of curiosity, I will try --target_taxa to compare the annotations I get using different taxa. Nonetheless, I will probably stick to comparing expression by function, as you suggested.

Thank you again! Best, Rodrigo

Cantalapiedra commented 3 years ago

Good luck @rodrisenovilla.

I kind of remember that OrthoMCL can be run on your own proteomes too: https://orthomcl.org/orthomcl/app/downloads/software/v2.0/UserGuide.txt

Best, Carlos

rodrisenovilla commented 3 years ago

Yes! I finally got my list of orthologues from there and I additionally tried OrthoFinder to compare both. Thank you very much! Best, Rodrigo

Cantalapiedra commented 3 years ago

Glad to hear that! Best, Carlos