Closed jade-davies closed 2 years ago
Hi @jade-davies ,
The main difference you will notice between your GFF files is that the one for the genome gives the positions of the features relative to the contigs, whereas the one for the proteins gives positions relative to the proteins themselves. Also, if you upload the genome, gene prediction is done from scratch, so the predicted features could differ from the proteins you already have from Prokka.
Regarding the different "em_" fields, these are just the same fields you should find in the ".emapper.seed_orthologs" and ".emapper.annotations" files. Please check https://github.com/eggnogdb/eggnog-mapper/wiki/eggNOG-mapper-v2.1.5-to-v2.1.7#Output_format. Ask if you need further info on any field.
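In case it helps with reading the file, here is a minimal Python sketch for pulling the "em_" fields out of a decorated GFF line. It assumes the file follows the usual GFF3 layout, with nine tab-separated columns and the last column holding `tag=value` pairs separated by `;`. The function names and the example line are made up for illustration and are not taken from the attached files.

```python
from urllib.parse import unquote


def parse_gff_attributes(column9):
    """Split a GFF3 attributes column into a {tag: value} dict."""
    attributes = {}
    for field in column9.strip().split(";"):
        if not field:
            continue  # skip empty fields, e.g. after a trailing ';'
        tag, _, value = field.partition("=")
        # GFF3 percent-encodes reserved characters, so decode them
        attributes[tag] = unquote(value)
    return attributes


def em_fields(gff_line):
    """Return only the attributes whose tag starts with 'em_'."""
    columns = gff_line.rstrip("\n").split("\t")
    if len(columns) != 9:
        raise ValueError("expected 9 tab-separated columns, got %d" % len(columns))
    attributes = parse_gff_attributes(columns[8])
    return {tag: value for tag, value in attributes.items() if tag.startswith("em_")}


# Hypothetical line: every value below is invented for this example.
example = (
    "contig_1\texample_source\tCDS\t100\t400\t.\t+\t0\t"
    "ID=gene_1;em_target=12345.ABC_0001;em_desc=hypothetical protein"
)
fields = em_fields(example)
for tag, value in fields.items():
    print(tag, value, sep="\t")
```

Applied to every non-comment line (those not starting with `#`), this turns the single attributes column into one entry per "em_" field, which is easier to compare against the ".emapper.annotations" columns.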
As for which input is better, in general I would say that if you have already done the Prokka analysis, you can continue using just those proteins.
I hope this is of help.
Best, Carlos
Hi @Cantalapiedra,
Thanks so much for your reply, it's really helpful! If I understand correctly, the STRING database is used for the protein IDs, so I can get gene ontology information directly from there?
Thanks again, Jade
Hi @jade-davies ,
To be honest, I have never worked with the STRING database, so I am afraid I cannot help you with this. Maybe some of the IDs are shared, but I am not sure to what extent. Hopefully someone with more experience with it can be of more help.
Best, Carlos
I will close this issue (since the original discussion seems finished).
Please, re-open or re-issue if needed.
Best, Carlos
QI0013_eggnog_prokkafaa.gff.xlsx QI0013_eggnog_genome.gff.xlsx
Hello,
I've annotated a bacterial genome using the web interface, both directly from the genome sequence (in FASTA file format), and also from a protein annotation file (generated by prokka, in FAA file format).
I have downloaded the out.emapper.decorated.gff files for each one, and I'm having a little trouble with interpreting the files. Please find the files attached - please could you explain what the different descriptions mean in the annotation, such as em_target, em_desc, em_max_annot_lvl, em_PFAMs? All this information seems to be placed within one tab.
Additionally, the annotation files seem quite different for the same genome, even though the only difference is the input file. Which input is better to use for eggNOG-mapper?
Thank you in advance,
Jade