This is more of an announcement for others who might encounter the same issue. I have already found a solution, but I thought it should be posted somewhere, and maybe added to the docs, in case others run into the same problem when using multiprocessing with emcee. I'm a bit of a novice with parallel processing, so please forgive me if this is obvious.
Multiprocessing has worked fine for most of my needs in emcee, but recently I ran into an issue where the sampler would stall indefinitely upon instantiation once my model used a complex external package (pyccl). The issue did not occur on my Mac but did occur on the Linux cluster. After some digging, I found that the only way around it was to change the context in which the processes for the multiprocessing Pool are created. My Mac was using the spawn context to create processes, whereas the Linux cluster was defaulting to fork. The documentation uses the fork context as well, but switching to spawn fixed the stalling once I increased the complexity of my model function code. I also read online that fork is being phased out as the default context in newer Python versions (as far as I know, Python 3.14 switches the default on Linux to forkserver rather than spawn).
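To see which start method a given machine is using, something like the following minimal sketch may help. It only uses the standard library; the helper name `describe_start_methods` is made up for illustration.

```python
import multiprocessing


def describe_start_methods():
    """Return the default start method and every method this platform supports."""
    return {
        # Note: calling get_start_method() without allow_none=True also fixes
        # the default for the rest of the process, so a later
        # set_start_method() call would need force=True.
        "default": multiprocessing.get_start_method(),
        "available": multiprocessing.get_all_start_methods(),
    }


info = describe_start_methods()
print(info)
```

Running this on both the Mac and the cluster should make the difference in defaults visible. The exact output depends on the operating system and the Python version.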
If anybody experiences this indefinite stalling when running their sampler with multiprocessing (cancelling the code after the stall starts gives the following traceback):
300 try: # restore state no matter what (e.g., KeyboardInterrupt)
301 if timeout is None:
--> 302 waiter.acquire()
303 gotit = True
304 else:
I'd recommend trying to change the Pool to use the spawn context manually:
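import multiprocessing

import emcee

# Create the worker processes with the spawn start method
# instead of the platform default.
with multiprocessing.get_context("spawn").Pool() as pool:
    sampler = emcee.EnsembleSampler(
        nwalkers,
        ndim,
        log_probability,
        args=(...),
        pool=pool,
        backend=backend,
    )
    # Run the sampler inside this block, while the pool is still open.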
This fixed the issue for me after I had spent many hours trying everything else. I didn't feel this required a pull request since I didn't need to modify any source code, but I hope it is useful for someone else.
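For anyone debugging a similar stall, a quick check of the pool on its own, before handing it to the sampler, can turn an indefinite hang into an error. The sketch below is only an illustration and uses just the standard library: `log_probability` here is a toy stand-in for a real model, and `smoke_test_pool` and its `timeout` parameter are names I made up.

```python
import multiprocessing


def log_probability(theta):
    """Toy stand-in for a real model: an isotropic unit Gaussian."""
    return -0.5 * sum(x * x for x in theta)


def smoke_test_pool(start_method="spawn", timeout=60.0):
    """Evaluate log_probability in worker processes, failing fast on a stall."""
    points = [[0.0, 0.0], [1.0, 2.0]]
    ctx = multiprocessing.get_context(start_method)
    with ctx.Pool(processes=2) as pool:
        # get(timeout=...) raises multiprocessing.TimeoutError instead of
        # blocking forever if the workers never answer.
        return pool.map_async(log_probability, points).get(timeout=timeout)


# In a script, call this under the usual guard, because spawned workers
# re-import the main module and must not start pools of their own:
#
#     if __name__ == "__main__":
#         print(smoke_test_pool("spawn"))
#         print(smoke_test_pool("fork"))  # fork is not available on Windows
```

With spawn, the function passed to the pool has to be defined at module level (as `log_probability` is here) so the workers can import it. Swapping in the real model function and comparing the two start methods should show whether the start method is what causes the hang.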
More info I found to help me get to this conclusion can be found here: https://pythonspeed.com/articles/python-multiprocessing/