eggnogdb / eggnog-mapper

Fast genome-wide functional annotation through orthology assignment
http://eggnog-mapper.embl.de
GNU Affero General Public License v3.0

--cpu doesn't prevent phmmer's default behavior from saturating multicore machines #76

Closed alicedb2 closed 3 years ago

alicedb2 commented 6 years ago

During the refinement stage, the --cpu X argument is used to start X phmmer processes, but phmmer's own --cpu argument is left unspecified. By default phmmer will use all available cores, so X such instances will together saturate all CPUs!

I think a quick fix would be to use phmmer --cpu 1, but I'm unsure if it'd be consistent with all use cases.

jhcepas commented 6 years ago

Parallelization is per phmmer process. I remember running some tests and seeing that 10 parallel phmmer processes were faster than running phmmer --cpu 10 ten times.

alicedb2 commented 6 years ago

What I'm trying to say is that emapper starts phmmer without a --cpu argument, and when that argument is left unspecified phmmer by default tries to use all available cores. Therefore, when you call emapper --cpu 10, emapper creates 10 phmmer processes (expected), but each of these processes tries to use as many cores as possible (unexpected).

On our 64-core server, almost all 64 cores end up being utilized even though eggnog was called with emapper --cpu 10. Each of the 10 phmmer processes ends up using between 1 and 6 cores. Starting the 10 phmmer processes with phmmer --cpu 1 would fix this behavior.
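
To illustrate the idea, here is a minimal sketch of launching several phmmer processes with the thread count pinned to one each. This is not emapper's actual code: the helper names (`build_phmmer_cmd`, `run_all`), the file names, and the use of a thread pool to supervise the child processes are all assumptions made for the example. Only the phmmer options `--cpu` and `--tblout` and the `<seqfile> <seqdb>` argument order are taken from phmmer itself.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_phmmer_cmd(query, target_db, tblout, threads=1):
    """Build a phmmer command line with an explicit --cpu value.

    Hypothetical helper: passing --cpu explicitly keeps each phmmer
    process from picking its own default number of worker threads.
    """
    return ["phmmer", "--cpu", str(threads), "--tblout", tblout, query, target_db]


def _run(cmd):
    # Run one phmmer process, discard its stdout report, return the exit code.
    return subprocess.run(cmd, stdout=subprocess.DEVNULL, check=True).returncode


def run_all(jobs, workers, runner=_run):
    """Run one phmmer process per job, at most `workers` at a time.

    `jobs` is a list of (query, target_db, tblout) tuples. The pool
    threads only wait on child processes, so the CPU load comes from
    phmmer itself: `workers` processes times the --cpu value each.
    """
    cmds = [build_phmmer_cmd(*job) for job in jobs]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(runner, cmds))
```

With this layout, something like `run_all(jobs, workers=10)` would start at most 10 phmmer processes at once, each told to use `--cpu 1`, which is the behavior I'd expect from emapper --cpu 10.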

jhcepas commented 6 years ago

Sorry, I misunderstood the problem. In that case you are right, we should force --cpu 1 in phmmer. Thanks for reporting it.

Cantalapiedra commented 3 years ago

If I am not mistaken, this should be fixed in current versions.