neuronphysics closed this issue 7 years ago
Hi @neuronphysics , I've recently been naively building some code on top of emcee and ran into the exact same problem. The issue is that you have assigned the emcee sampler as an attribute of your class, which also holds the probability functions. When running emcee in multithreaded mode, it will attempt to pickle the probability function for transfer to the other process, which means also pickling the class instance it belongs to. But that instance holds the emcee sampler, which in turn holds the multiprocessing pool - big problem!
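A minimal sketch of that trap (hypothetical ModelFit class; a ThreadPool stands in for the pool the sampler holds):

```python
import pickle
from multiprocessing.pool import ThreadPool

class ModelFit:
    """Hypothetical stand-in for a class that stores the sampler on self."""
    def __init__(self):
        # In the real case the pool hides inside the sampler attribute.
        self.pool = ThreadPool(1)

    def lnprob(self, theta):
        return -0.5 * theta ** 2

mf = ModelFit()
try:
    pickle.dumps(mf)            # drags self.pool along with the instance...
    picklable = True
except NotImplementedError:     # "pool objects cannot be passed between
    picklable = False           #  processes or pickled"
mf.pool.close()
```

Pickling the instance fails as soon as pickle reaches the pool attribute, which is exactly the NotImplementedError reported below.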
The approach I took to solving this was to simply split out the emcee sampler from my class - you can still have class-method functions that wrap the sampler creation / calling routines, but you have to store the actual sampler in a separate variable.
You might then find you also run into problems pickling bound methods, but that's easier to solve, either by importing pathos or by including your own pickle functions, cf http://bytes.com/topic/python/answers/552476-why-cant-you-pickle-instancemethods#edit2155350 (referenced from e.g. http://stackoverflow.com/questions/22450160/using-multiprocessing-pool-on-bound-methods).
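The recipe from those links boils down to registering a reducer for bound methods (needed on Python 2, where this thread lives; Python 3 pickles bound methods out of the box). Roughly, with a toy Adder class for illustration:

```python
import copyreg   # named 'copy_reg' on Python 2
import pickle
import types

def _pickle_method(method):
    # Reduce a bound method to (instance, method name) so pickle can rebuild it.
    return _unpickle_method, (method.__self__, method.__func__.__name__)

def _unpickle_method(obj, name):
    # Look the method back up on the unpickled instance.
    return getattr(obj, name)

copyreg.pickle(types.MethodType, _pickle_method)

class Adder:
    def __init__(self, base):
        self.base = base
    def add(self, x):
        return self.base + x

# Bound method round-trips through pickle via the registered reducer.
restored = pickle.loads(pickle.dumps(Adder(10).add))
```

The versions in the linked posts also handle name-mangled private methods; this is the simplified core.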
Hi @timstaley Thanks for the reply. As far as I understood it, I assigned my sampler to a plain variable, copied the aforementioned _pickle_method and _unpickle_method functions into my code, ran it again, and got this error:
File "
Did I understand the solution you suggested correctly? Another question: how could I use pathos when the original multiprocessing call is wrapped inside the emcee code?
That sounds like an error in the pickling setup somewhere. I would suggest trying something like
import pickle
s = pickle.dumps(modelfit)
As a basic test - if you can get that working you should be most of the way there. Alternatively, I think when you import pathos.multiprocessing it does a little magic and overrides the built-in multiprocessing module, but I managed to get up and running using the alternative hack above so haven't explored pathos much.
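Spelled out with a toy stand-in class (data and probability functions only, no sampler attribute), the smoke test is just:

```python
import pickle

class ModelFit:
    """Toy stand-in: holds data and probability functions, no sampler."""
    def __init__(self, x, y):
        self.x, self.y = x, y

    def lnprob(self, theta):
        # Simple quadratic log-probability over the stored data.
        return -0.5 * sum((yi - theta * xi) ** 2 for xi, yi in zip(self.x, self.y))

modelfit = ModelFit([1.0, 2.0], [1.1, 1.9])
s = pickle.dumps(modelfit)     # the whole smoke test: does this succeed?
clone = pickle.loads(s)        # round-trip gives back an equivalent instance
```

If this round-trip works, the multiprocessing transfer should too.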
I'm the dill and pathos author. The pathos hack is primarily that it imports dill instead of cPickle. There's some other stuff too, but that's the big one-line change. With dill you can serialize most things in Python, so parallel computing (especially for statistics and Monte Carlo etc.) becomes much easier to do. There are some packages (including ipython and mpi4py) starting to provide a "drop-in" mechanism that allows for replacement of the serializer or map functions used in their code. I'd be interested in seeing that this works with emcee.
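A quick illustration of the gap dill fills: the stdlib pickle serializes functions by name lookup and so refuses a lambda, while dill (if installed) serializes the code object directly.

```python
import pickle

square = lambda x: x * x   # top-level name, but __qualname__ is '<lambda>'

try:
    pickle.dumps(square)
    pickled_ok = True
except (pickle.PicklingError, AttributeError, TypeError):
    pickled_ok = False     # stdlib pickle cannot look the lambda up by name

# With dill installed, the same object round-trips (untested sketch):
# import dill
# dill.loads(dill.dumps(square))(3)
```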
Well, I am trying different things to make my code work and I should confess I'm not very familiar with how to unpickle the method. I just used the following approach:
import numpy as np
import emcee
import scipy.optimize as op
m_true = -0.9594
b_true = 4.294
f_true = 0.534
N = 50
x = np.sort(10*np.random.rand(N))
yerr = 0.1+0.5*np.random.rand(N)
y = m_true*x+b_true
y += np.abs(f_true*y) * np.random.randn(N)
y += yerr * np.random.randn(N)
def unwrap_self_modelfit(arg, **kwarg):
    return modelfit.lnprob(*arg, **kwarg)

class modelfit():
    def __init__(self):
        self.x=x
        self.y=y
        self.yerr=yerr
        self.m=-0.6
        self.b=2.0
        self.f=0.9
    def get_results(self):
        def func(a):
            model=a[0]*self.x+a[1]
            inv_sigma2 = 1.0/(self.yerr**2 + model**2*np.exp(2*a[2]))
            return 0.5*(np.sum((self.y-model)**2*inv_sigma2 - np.log(inv_sigma2)))
        result = op.minimize(func, [self.m, self.b, np.log(self.f)], options={'gtol': 1e-6, 'disp': True})
        m_ml, b_ml, lnf_ml = result["x"]
        return result["x"]
    def lnprior(self,theta):
        m, b, lnf = theta
        if -5.0 < m < 0.5 and 0.0 < b < 10.0 and -10.0 < lnf < 1.0:
            return 0.0
        return -np.inf
    def lnprob(self,theta):
        lp = self.lnprior(theta)
        likelihood=self.lnlike(theta)
        if not np.isfinite(lp):
            return -np.inf
        return lp + likelihood
    def lnlike(self,theta):
        m, b, lnf = theta
        model = m * self.x + b
        inv_sigma2 = 1.0/(self.yerr**2 + model**2*np.exp(2*lnf))
        return -0.5*(np.sum((self.y-model)**2*inv_sigma2 - np.log(inv_sigma2)))
    def run_mcmc(self,nstep):
        ndim, nwalkers = 3, 100
        pos = [self.get_results() + 1e-4*np.random.randn(ndim) for i in range(nwalkers)]
        sampler = emcee.EnsembleSampler(nwalkers, ndim, unwrap_self_modelfit, threads=10)
        return sampler.run_mcmc(pos, nstep)

test=modelfit()
test.x=x
test.y=y
test.yerr=yerr
test.get_results()
test.run_mcmc(5000)
And got this error message:
args: []
kwargs: {}
exception:
Traceback (most recent call last):
  File "build/bdist.linux-x86_64/egg/emcee/ensemble.py", line 505, in __call__
    return self.f(x, *self.args, **self.kwargs)
  File "<stdin>", line 2, in unwrap_self_modelfit
TypeError: unbound method lnprob() must be called with modelfit instance as first argument (got float64 instance instead)
Traceback (most recent call last):
  File "<stdin>", line 37, in run_mcmc
  File "build/bdist.linux-x86_64/egg/emcee/sampler.py", line 157, in run_mcmc
  File "build/bdist.linux-x86_64/egg/emcee/ensemble.py", line 198, in sample
  File "build/bdist.linux-x86_64/egg/emcee/ensemble.py", line 382, in _get_lnprob
  File "build/bdist.linux-x86_64/egg/emcee/interruptible_pool.py", line 94, in map
  File "/vol/aibn84/data2/zahra/anaconda/lib/python2.7/multiprocessing/pool.py", line 558, in get
    raise self._value
TypeError: unbound method lnprob() must be called with modelfit instance as first argument (got float64 instance instead)
This traceback says that you are passing a float to lnprob instead of a modelfit instance. That sounds more like a coding error than a pickling error.
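Concretely: unwrap_self_modelfit unpacks theta straight into the unbound modelfit.lnprob, so the first element of theta lands in the self slot. One fix, sketched here without emcee (it assumes emcee's args mechanism, which forwards extra arguments to the probability function), is to hand the instance to a module-level wrapper:

```python
import numpy as np

class ModelFit:
    """Trimmed stand-in for the class posted above."""
    def __init__(self, x, y, yerr):
        self.x, self.y, self.yerr = x, y, yerr

    def lnprob(self, theta):
        m, b = theta
        model = m * self.x + b
        return -0.5 * np.sum(((self.y - model) / self.yerr) ** 2)

def unwrap_modelfit(theta, instance):
    # emcee calls f(theta, *args); the instance arrives via args=[test],
    # so the *bound* lnprob runs instead of the unbound modelfit.lnprob.
    return instance.lnprob(theta)

test = ModelFit(np.array([0.0, 1.0]), np.array([4.0, 3.5]), np.array([0.1, 0.1]))
# sampler = emcee.EnsembleSampler(nwalkers, ndim, unwrap_modelfit, args=[test], threads=10)
value = unwrap_modelfit(np.array([-0.5, 4.0]), test)   # slope -0.5, intercept 4.0 fit the toy data exactly
```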
Hi guys, I've run into that same problem lately. My solution was to run emcee N times in parallel and re-assemble the lnprob and chain arrays afterwards. Say I have 1000 walkers for an MCMC simulation, 10 parameters, 500 iterations. Then I build batches of 20 walkers (the minimum given 10 parameters) that I sling to 50 single-core emcee simulations (500 iterations each)... and go get a coffee :-p until I can re-assemble the 50 "lnprob" and "chain" arrays using np.concatenate. It seems to work perfectly well. Bit of a hack though. dfm, could you see anything wrong with that approach? i.e. is emcee treating each walker separately, or doing some complex global optimizing over groups of walkers? I did run into another pickling error later, but unrelated to emcee. I installed dill and pathos and it just worked out of the box. Quite incredible! Cheers
Uhh, wasn't the "same problem" above resolved as user error? Could you please post a minimal example showing the problem you are seeing.
Merging together the chains is more than a "hack". That's more like multiple restarts or something. It will definitely give you different results in detail. And you will have to be smart about burn-in (as always).
You may want to read the paper or page through some of DFM's talks on emcee (all his slides are online); the walkers do interact (that's what allows it to be affine-invariant and parallelizable in the first place).
I have tried to use emcee inside a class and also to get the multiprocessing module to work, but after running my test code:
I got this error message:
Traceback (most recent call last):
  File "", line 1, in
  File "", line 27, in run_mcmc
  File "build/bdist.linux-x86_64/egg/emcee/sampler.py", line 157, in run_mcmc
  File "build/bdist.linux-x86_64/egg/emcee/ensemble.py", line 198, in sample
  File "build/bdist.linux-x86_64/egg/emcee/ensemble.py", line 382, in _get_lnprob
  File "build/bdist.linux-x86_64/egg/emcee/interruptible_pool.py", line 94, in map
  File "/vol/aibn84/data2/zahra/anaconda/lib/python2.7/multiprocessing/pool.py", line 558, in get
    raise self._value
NotImplementedError: pool objects cannot be passed between processes or pickled
or a similar error:
    raise self._value
cPickle.PicklingError: Can't pickle <type 'instancemethod'>: attribute lookup __builtin__.instancemethod failed
I am wondering how I could solve this problem? One way suggested on Stack Overflow is to use pathos.multiprocessing, as mentioned here: http://stackoverflow.com/a/21345273/2379433
But I think that change would have to be made in the emcee source code itself. Thanks in advance.