GutenkunstLab / SloppyCell

BSD 3-Clause "New" or "Revised" License

Hang in parallel execution #2

Closed: bcdaniels closed this issue 2 years ago

bcdaniels commented 4 years ago

Hello!

We've recently been having issues with parallel execution in SloppyCell. Specifically, worker processes hang when receiving Network objects. The hang appears to occur in Network_mod.run_distutils, at the core.setup line, as the worker process tries to compile the network's C code. This only happens in the worker processes; the master process compiles the C code without trouble.

I can try to construct a minimal example if that would help, but I thought I would first ask whether you've run into this problem before. It seems to happen on multiple platforms (perhaps only with more recent versions of openmpi?). My current setup is macOS 10.14.6, python 2.7.17, and openmpi 4.0.2, installed using anaconda.

Thanks!

P.S. On a related note, pypar appears to be no longer supported. For the transition to Python 3, mpi4py (https://mpi4py.readthedocs.io/en/stable/) looks like a good alternative. If we can figure out this hanging issue, I already have a branch of SloppyCell that uses mpi4py and could potentially be merged in.

RyanGutenkunst commented 4 years ago

A minimal example would be helpful, but unfortunately I can't promise much insight. My group hasn't used SloppyCell heavily for a few years, so we haven't invested much in it.

I did attempt a port to Python 3 last year. The main obstacle is the code that takes symbolic derivatives of Python expressions: the module SloppyCell uses for that is deprecated. There is a potential substitute in the Python 3 standard library, but porting the code over would require some dedicated time, which unfortunately I don't have right now. If you or someone in your group is interested, I'm happy to help guide.
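To make the porting problem concrete, here is a minimal sketch of symbolic differentiation of a Python expression string. The comment above does not name either module; as an assumption, the sketch uses Python 3's standard-library `ast` module (`ast.parse`, plus `ast.unparse`, which needs Python 3.9+) as the substitute. The function `diff_expr` and its helpers are hypothetical illustrations, not SloppyCell's ExprManip API, and they cover only `+`, `-`, `*`, `/`, numeric powers, and single-argument `exp`/`log`, with no simplification.

```python
import ast

# Hypothetical helper, NOT SloppyCell's ExprManip API: differentiate a Python
# expression string with respect to one variable by walking the AST produced
# by the standard-library ast module (Python 3.9+ for ast.unparse).
# The result is correct but unsimplified.


def diff_expr(expr: str, var: str) -> str:
    """Return d(expr)/d(var) as an unsimplified Python expression string."""
    tree = ast.parse(expr, mode="eval")
    return ast.unparse(_diff(tree.body, var))


def _const(value):
    return ast.Constant(value=value)


def _binop(left, op, right):
    return ast.BinOp(left=left, op=op, right=right)


def _diff(node, var):
    # Numeric literals are constants.
    if isinstance(node, ast.Constant):
        return _const(0)
    # A bare name is either the variable itself or an independent symbol.
    if isinstance(node, ast.Name):
        return _const(1 if node.id == var else 0)
    # Unary plus/minus pass through: d(-u) = -du.
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, (ast.UAdd, ast.USub)):
        return ast.UnaryOp(op=node.op, operand=_diff(node.operand, var))
    if isinstance(node, ast.BinOp):
        a, b = node.left, node.right
        if isinstance(node.op, (ast.Add, ast.Sub)):
            return _binop(_diff(a, var), node.op, _diff(b, var))
        if isinstance(node.op, ast.Mult):
            # Product rule: (a*b)' = a'*b + a*b'
            return _binop(
                _binop(_diff(a, var), ast.Mult(), b),
                ast.Add(),
                _binop(a, ast.Mult(), _diff(b, var)),
            )
        if isinstance(node.op, ast.Div):
            # Quotient rule: (a/b)' = (a'*b - a*b') / b**2
            numerator = _binop(
                _binop(_diff(a, var), ast.Mult(), b),
                ast.Sub(),
                _binop(a, ast.Mult(), _diff(b, var)),
            )
            return _binop(numerator, ast.Div(), _binop(b, ast.Pow(), _const(2)))
        if isinstance(node.op, ast.Pow) and isinstance(b, ast.Constant):
            # Power rule for a numeric exponent: (a**n)' = n * a**(n-1) * a'
            n = b.value
            return _binop(
                _binop(_const(n), ast.Mult(), _binop(a, ast.Pow(), _const(n - 1))),
                ast.Mult(),
                _diff(a, var),
            )
    # Single-argument exp/log calls, handled with the chain rule.
    if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
            and len(node.args) == 1 and not node.keywords):
        u = node.args[0]
        if node.func.id == "exp":
            return _binop(node, ast.Mult(), _diff(u, var))  # exp(u)' = exp(u) * u'
        if node.func.id == "log":
            return _binop(_diff(u, var), ast.Div(), u)      # log(u)' = u' / u
    raise NotImplementedError(f"cannot differentiate: {ast.unparse(node)}")
```

For example, `diff_expr("log(x)/x", "x")` returns `(1 / x * x - log(x) * 1) / x ** 2`, which is correct but unsimplified. A real replacement would also need algebraic simplification and a broader set of supported functions.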

bcdaniels commented 4 years ago

Thanks for the quick response. I have an idea at least for a temporary workaround that we'll try to implement soon.

I agree that migrating all of this to Python 3 needs to be done eventually, but it will take a good chunk of work, so we'll leave it for later.

tjquinn1 commented 3 years ago

@RyanGutenkunst We are currently working on a Python 3 port and would be interested in hearing any insight you have into porting it.

RyanGutenkunst commented 3 years ago

@tjquinn1 As you'll see in the commit history, I made a bunch of the small changes earlier. As I mentioned, the big holdup is the symbolic derivative code in ExprManip. On the bright side, that code is pretty self-contained, so if you can find any way to replace it, it should be easy to drop in.

Let me know how it goes. I'm happy to offer advice if you run into headaches.

tjquinn1 commented 3 years ago

Great, thanks. I will reach out with questions if I have any.