adrn / schwimmbad

A common interface to processing pools.
MIT License

Bug w/ mpi4py >=3.0? #32

Closed adrn closed 2 months ago

adrn commented 4 years ago

Via astrocrash on twitter:

there appears to be a bug in schwimmbad 0.3.1+ that makes it incompatible with mpi4py 3.0+. Are you aware?

    File "/site-packages/schwimmbad/mpi.py", line 122, in wait
        func, arg = task
    ValueError: not enough values to unpack (expected 2, got 1)

pkgw commented 2 months ago

Still seeing this in the conda-forge test suite as of today ...

adrn commented 2 months ago

I was never able to reproduce this error -- where are you seeing it? I just added some explicit mpi4py 3.x and 4.x tests to the CI workflow but these tests are all passing in #55.

pkgw commented 2 months ago

Hmm, maybe James's diagnosis of the origin of the problem wasn't quite correct. You can see the issue pop up in the conda-forge CI test runs though, e.g.:

https://dev.azure.com/conda-forge/feedstock-builds/_build/results?buildId=1002170&view=logs&j=bb1c2637-64c6-57bd-9ea6-93823b2df951&t=55f066f8-9d52-54c8-1ae3-a2d9002ff304&l=4041

If you scroll up from there you can see the precise set of packages that are being installed for the test run in which the issue surfaces.

adrn commented 2 months ago

OK, I think I understand what's going on here. It's not a bug in mpi4py, or a compatibility problem between schwimmbad and mpi4py. I think it's an issue with how MOSFiT is using schwimmbad's MPIPool. I didn't really intend for people to use the MPI comm directly, but MOSFiT is using it to send its own MPI messages and then calling pool.wait() to wait for those processes to return: https://github.com/guillochon/MOSFiT/blob/fd886ca4ece90bd98989afb355ee6ba60af0aeda/mosfit/fitter.py#L321

This should fail -- pool.wait() is meant to be used internally by the pool's .map() functionality, and the messages it receives are expected to contain two elements (the `func, arg = task` line in the traceback above). Given this, I think this is an issue for MOSFiT to resolve, so I'm going to close this. But feel free to open a new issue if you spot something else!
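
To make the failure mode concrete, here is a minimal sketch in plain Python, with no MPI involved. It is not schwimmbad's actual code: `worker_handle`, `square`, and `foreign_message` are hypothetical names used only for illustration. The one thing taken from the traceback is the `func, arg = task` unpacking step, which assumes every received message is a two-element pair.

```python
def worker_handle(task):
    """Mimic the unpacking step a pool worker applies to a received message.

    Assumption: each message is a (func, arg) pair, as the traceback's
    `func, arg = task` line suggests. This is a stand-in, not library code.
    """
    func, arg = task
    return func(arg)


def square(x):
    return x * x


# A message shaped the way map() is expected to send it: (callable, argument).
print(worker_handle((square, 3)))  # prints 9

# A hypothetical application-level message sent over the same communicator.
# It has one element, so the two-name unpacking cannot succeed.
foreign_message = ("done",)
try:
    worker_handle(foreign_message)
except ValueError as err:
    print(f"ValueError: {err}")
```

The second call raises the same `ValueError` as in the report. Any message that is not a two-element pair will trip the unpacking, regardless of which mpi4py version delivers it.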