eggnogdb / eggnog-mapper

Fast genome-wide functional annotation through orthology assignment
http://eggnog-mapper.embl.de
GNU Affero General Public License v3.0

number of queries scanned very different than number of proteins in input fasta #477

Open ecpierce opened 1 year ago

ecpierce commented 1 year ago

Hi!

I have an issue that is somewhat similar to this one

My input protein fasta has around 11,000 protein sequences.

The final output says: Total hits processed: 8949

And my annotation file says 8949 queries scanned. Around 8000 have actual annotations in the annotation file.

I understand that 949ish may not have seed orthologs and so don't get annotations, but what could be happening to the missing 2000 proteins?

I am running:

emapper.py --cpu 8 -i outputs/processedprots/Asp.fasta --output adentestegg --output_dir outputs/ -m diamond --tax_scope none --seed_ortholog_score 60 --override --temp_dir tmp/ --data_dir outputs/databases/eggnog_db/

emapper-2.1.10-e2c6d39

Cantalapiedra commented 1 year ago

Hi @ecpierce ,

I would need to check the contents of ".emapper.hits", ".emapper.seed_orthologs", and ".emapper.annotations", but it is possible that 8949 of your proteins had hits against eggNOG 5.0, and ~8,000 had valid annotations.

Best, Carlos
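
One way to check the three files mentioned above is to count how many input proteins survive at each stage. The following is a minimal sketch, not part of eggnog-mapper. It assumes that each output file is tab-separated with the query name in the first column, that lines starting with "#" are comments or headers, and that the query name equals the first whitespace-delimited token of the FASTA header. The function names (fasta_ids, first_column_ids, stage_report, missing_ids) are hypothetical helpers introduced here for illustration.

```python
import sys


def fasta_ids(path):
    """Return the set of sequence IDs in a FASTA file.

    Assumption: the ID is the first whitespace-delimited token after '>'.
    """
    ids = set()
    with open(path) as fh:
        for line in fh:
            if line.startswith(">"):
                parts = line[1:].split()
                if parts:
                    ids.add(parts[0])
    return ids


def first_column_ids(path):
    """Return the set of values in the first tab-separated column.

    Assumption: blank lines and lines starting with '#' carry no query rows.
    """
    ids = set()
    with open(path) as fh:
        for line in fh:
            if not line.strip() or line.startswith("#"):
                continue
            ids.add(line.split("\t", 1)[0].strip())
    return ids


def stage_report(fasta, hits, seed_orthologs, annotations):
    """Count how many input IDs appear in each output file."""
    input_ids = fasta_ids(fasta)
    report = {"input": len(input_ids)}
    for name, path in (
        ("hits", hits),
        ("seed_orthologs", seed_orthologs),
        ("annotations", annotations),
    ):
        # Only count IDs that are also present in the input FASTA.
        report[name] = len(first_column_ids(path) & input_ids)
    return report


def missing_ids(fasta, table):
    """Return the sorted input IDs that never appear in the given output file."""
    return sorted(fasta_ids(fasta) - first_column_ids(table))


if __name__ == "__main__":
    # Usage: python stage_report.py IN.fasta X.hits X.seed_orthologs X.annotations
    fasta, hits, seeds, annots = sys.argv[1:5]
    for stage, count in stage_report(fasta, hits, seeds, annots).items():
        print(f"{stage}\t{count}")
    print("no hit:", len(missing_ids(fasta, hits)))
```

Proteins listed by missing_ids for the ".emapper.hits" file would be the ones that never reached the annotation step, which is where the gap between the input count and "queries scanned" would show up under these assumptions.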