Closed dstndstn closed 2 months ago
Hey! I'm just getting back to work from parental leave, but I'll take a look at this within the next few weeks. Thanks for this!
Whoops, where did those months go? Thanks for the patience -- I haven't seen the issue you described, but I also don't know why this was using `ssend` to begin with (it probably traces back to ye olde MPIPool implementation in emcee, where some of this all started...). So I'm fine with changing it to the more standard `send`! Thanks for catching.
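For context, the fix being agreed to is a one-token change in the worker loop. Here is an illustrative sketch in the spirit of schwimmbad's MPIPool worker loop; the function name, the `(func, arg)` task format, and the `None` poison pill are assumptions for illustration, not the library's actual code:

```python
# Illustrative worker loop; a sketch, not schwimmbad's real implementation.
def worker_loop(comm, master_rank=0):
    """Receive tasks from the master, run them, send results back."""
    while True:
        task = comm.recv(source=master_rank)
        if task is None:          # poison pill: pool is shutting down
            break
        func, arg = task
        result = func(arg)
        # The reported hang was here with comm.ssend(...): a synchronous
        # send only completes once the master has posted a matching recv.
        # Plain send lets the worker move on as soon as MPI has taken the
        # message, so it can proceed to its next task.
        comm.send(result, dest=master_rank)
```

The master already matches every result with a `recv`, so nothing in the protocol relies on the extra rendezvous guarantee `ssend` provides.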
Hi,
I'm using Ubuntu 20.04, with the OS openmpi package, mpi4py 3.1.5, and schwimmbad 0.3.2. This is on the "symmetry" cluster at Perimeter Institute.
The behavior I'm seeing is that when creating an MPIPool(), I see each worker getting one task, it finishes the task and sends the result back, and the boss receives the result, but the workers never proceed to the next task.
Via some sophisticated printf debugging, I found that the workers were never returning from the `self.comm.ssend()` call. My wise colleague suggested changing that to `self.comm.send()`, and then it works perfectly!

I don't think you need any of the synchronization implied by `ssend`, so this should be fine?

My system details:
```
$ mpiexec --version
mpiexec (OpenRTE) 4.0.3
$ ls -l $(which mpiexec)
lrwxrwxrwx 1 root root 25 Aug 15 2023 /usr/bin/mpiexec -> /etc/alternatives/mpiexec
$ ls -l /etc/alternatives/mpiexec
lrwxrwxrwx 1 root root 24 Aug 15 2023 /etc/alternatives/mpiexec -> /usr/bin/mpiexec.openmpi
```