eggnogdb / eggnog-mapper

Fast genome-wide functional annotation through orthology assignment
http://eggnog-mapper.embl.de
GNU Affero General Public License v3.0

Only a small percentage of allocated CPU is used #379

Open EmilieBruun opened 2 years ago

EmilieBruun commented 2 years ago

Hi,

I have protein outputs from Prodigal that I would like to annotate with eggNOG. Each input file has about 1.5 M proteins. I have tried running 5 jobs on 1 node, requesting 38 GB of memory and 8 CPUs per job. However, this takes a long time to finish (~7 days), and only about 5% of the allocated CPU was used by each job. Do you have any idea why this is the case?

Thanks :) Best, Emilie

Cantalapiedra commented 2 years ago

Hi @EmilieBruun,

Could you please share the specific emapper version and the command you used?

Thank you.

Best, Carlos

EmilieBruun commented 2 years ago

The command I used was:

emapper.py --mp_start_method forkserver -o $out_file --output_dir $outDir --override -m diamond --dmnd_ignore_warnings --dmnd_algo ctg -i $input_file --evalue 0.001 --score 60 --pident 40 --query_cover 20 --subject_cover 20 --itype proteins --tax_scope auto --target_orthologs all --go_evidence non-electronic --pfam_realign none --cpu 8

The emapper version is 2.1.6:

emapper-2.1.6 / Expected eggNOG DB version: 5.0.2 / Installed eggNOG DB version: 5.0.2 / Local diamond version: diamond version 2.0.11

Cantalapiedra commented 2 years ago

Hi,

If you could reserve a bit more memory (maybe 44-48 GB, see https://github.com/eggnogdb/eggnog-mapper/wiki/eggNOG-mapper-v2.1.5-to-v2.1.7#other-requirements), you could use the --dbmem option, which loads the annotation database into memory and makes the annotation step much faster, especially for medium to large input data sets. Check https://github.com/eggnogdb/eggnog-mapper/wiki/eggNOG-mapper-v2.1.5-to-v2.1.7#annotation-options

Besides that, you could check the specific diamond options that are wrapped by eggnog-mapper, to try to make the search step faster. Check https://github.com/eggnogdb/eggnog-mapper/wiki/eggNOG-mapper-v2.1.5-to-v2.1.7#diamond-search-options.

However, I guess that the --dbmem option is going to have more impact on running time.
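As a rough sketch of the --dbmem suggestion, this is the command posted earlier in the thread with --dbmem appended and nothing else changed. The default values for input_file, out_file and outDir are hypothetical placeholders, and the RUN switch is only an illustration device so the command can be reviewed before it is launched. Note that each job run with --dbmem holds its own in-memory copy of the database, so the extra memory is needed per job.

```shell
#!/bin/sh
# Placeholder paths (assumptions, not from the thread); override via the environment.
input_file="${input_file:-proteins.faa}"
out_file="${out_file:-sample}"
outDir="${outDir:-emapper_out}"

# Same options as the original command, plus --dbmem as the last argument.
# --dbmem loads the annotation database into memory, so request the extra
# RAM mentioned above (roughly 44-48 GB per job) from the scheduler.
set -- emapper.py \
  --mp_start_method forkserver \
  -o "$out_file" --output_dir "$outDir" --override \
  -m diamond --dmnd_ignore_warnings --dmnd_algo ctg \
  -i "$input_file" --itype proteins \
  --evalue 0.001 --score 60 --pident 40 \
  --query_cover 20 --subject_cover 20 \
  --tax_scope auto --target_orthologs all \
  --go_evidence non-electronic --pfam_realign none \
  --cpu 8 \
  --dbmem

# Print the assembled command for review; set RUN=1 to actually execute it.
echo "$*"
if [ "${RUN:-0}" = "1" ]; then
  "$@"
fi
```

Running the script as-is only prints the command; with RUN=1 it executes emapper.py with the options shown.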

I hope this is of help.

Best, Carlos

EmilieBruun commented 2 years ago

Thank you so much, this sped up the process a lot, and the jobs finished in around 10 hours instead of 7 days.

Cantalapiedra commented 2 years ago

Glad to hear that! Thank you!