amusecode / amuse

Astrophysical Multipurpose Software Environment. This is the main repository for AMUSE
http://www.amusecode.org
Apache License 2.0

Do parallel runs always need N+1 tasks? #945

Closed by Sbte 1 year ago

Sbte commented 1 year ago

If I want to run a code (in my case POP from OMUSE) on N cores (number_of_workers), say 128, it seems like I need to request N+1 tasks, so 129, in Slurm. AMUSE will spawn 128 worker processes through MPI.Spawn, but the original process already uses 1 task, which makes the total 129.

What's the proper way to solve this? Is it really not possible to reuse the original process for a worker?

ipelupessy commented 1 year ago

It depends. The N+1 option is safe if you can do it. Oversubscribing is an option, but as you say this could be bad for performance: you should make sure that the MPI is not using any CPU while idling. How to do that depends on the MPI implementation; for example, Open MPI has --mca mpi_yield_when_idle 1. With that set, it should be OK.
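To make the two options concrete, here is a minimal Python sketch of the bookkeeping. It is an illustration, not part of AMUSE: the helper names slurm_tasks_needed and launch_command and the script name run_pop.py are hypothetical, and it assumes Open MPI's mpiexec with its --oversubscribe option. Whether an MCA setting given to the launcher also reaches the spawned workers may depend on your setup, so check that the idle process really stays off the CPU.

```python
import shlex


def slurm_tasks_needed(number_of_workers, oversubscribe=False):
    """Number of Slurm tasks to request for one driver script plus its workers.

    Without oversubscription the driver process occupies one task and every
    spawned worker occupies one more, hence N + 1. With oversubscription the
    driver shares a slot with a worker, hence N.
    """
    if number_of_workers < 1:
        raise ValueError("number_of_workers must be at least 1")
    return number_of_workers if oversubscribe else number_of_workers + 1


def launch_command(script, oversubscribe=False):
    """Build an Open MPI launch line that starts the driver as a single rank."""
    cmd = ["mpiexec", "-n", "1"]
    if oversubscribe:
        # Assumption: Open MPI. --oversubscribe allows more processes than
        # slots; mpi_yield_when_idle is the setting mentioned in the thread.
        cmd += ["--oversubscribe", "--mca", "mpi_yield_when_idle", "1"]
    cmd += ["python", script]
    return " ".join(shlex.quote(part) for part in cmd)


if __name__ == "__main__":
    # run_pop.py is a placeholder name for the user's own AMUSE/OMUSE script.
    for oversubscribe in (False, True):
        print(slurm_tasks_needed(128, oversubscribe),
              launch_command("run_pop.py", oversubscribe))
```

For 128 workers this gives 129 tasks for the safe option and 128 tasks when oversubscribing. With Open MPI the same parameter can also be set through the environment variable OMPI_MCA_mpi_yield_when_idle=1 instead of on the command line.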