dfm / emcee

The Python ensemble sampling toolkit for affine-invariant MCMC
https://emcee.readthedocs.io
MIT License

problem with multiprocessing #148

Closed: neuronphysics closed this issue 7 years ago

neuronphysics commented 9 years ago

I have tried to use emcee inside a class and to make the multiprocessing module work with it, but after running this test code:

import numpy as np
import emcee
import scipy.optimize as op
# Choose the "true" parameters.
m_true = -0.9594
b_true = 4.294
f_true = 0.534

# Generate some synthetic data from the model.
N = 50
x = np.sort(10*np.random.rand(N))
yerr = 0.1+0.5*np.random.rand(N)
y = m_true*x+b_true
y += np.abs(f_true*y) * np.random.randn(N)
y += yerr * np.random.randn(N)

class modelfit():
    def __init__(self):
        self.x = x
        self.y = y
        self.yerr = yerr
        self.m = -0.6
        self.b = 2.0
        self.f = 0.9
    def get_results(self):
        def func(a):
            model = a[0]*self.x + a[1]
            inv_sigma2 = 1.0/(self.yerr**2 + model**2*np.exp(2*a[2]))
            return 0.5*(np.sum((self.y-model)**2*inv_sigma2 + np.log(inv_sigma2)))
        result = op.minimize(func, [self.m, self.b, np.log(self.f)], options={'gtol': 1e-6, 'disp': True})
        m_ml, b_ml, lnf_ml = result["x"]
        return result["x"]
    def lnprior(self, theta):
        m, b, lnf = theta
        if -5.0 < m < 0.5 and 0.0 < b < 10.0 and -10.0 < lnf < 1.0:
            return 0.0
        return -np.inf
    def lnprob(self, theta):
        lp = self.lnprior(theta)
        likelihood = self.lnlike(theta)
        if not np.isfinite(lp):
            return -np.inf
        return lp + likelihood
    def lnlike(self, theta):
        m, b, lnf = theta
        model = m * self.x + b
        inv_sigma2 = 1.0/(self.yerr**2 + model**2*np.exp(2*lnf))
        return -0.5*(np.sum((self.y-model)**2*inv_sigma2 - np.log(inv_sigma2)))
    def run_mcmc(self, nstep):
        ndim, nwalkers = 3, 100
        pos = [self.get_results() + 1e-4*np.random.randn(ndim) for i in range(nwalkers)]
        self.sampler = emcee.EnsembleSampler(nwalkers, ndim, self.lnprob, threads=10)
        self.sampler.run_mcmc(pos, nstep)

test=modelfit()
test.x=x
test.y=y
test.yerr=yerr
test.get_results()
test.run_mcmc(5000)

I got this error message:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 27, in run_mcmc
  File "build/bdist.linux-x86_64/egg/emcee/sampler.py", line 157, in run_mcmc
  File "build/bdist.linux-x86_64/egg/emcee/ensemble.py", line 198, in sample
  File "build/bdist.linux-x86_64/egg/emcee/ensemble.py", line 382, in _get_lnprob
  File "build/bdist.linux-x86_64/egg/emcee/interruptible_pool.py", line 94, in map
  File "/vol/aibn84/data2/zahra/anaconda/lib/python2.7/multiprocessing/pool.py", line 558, in get
    raise self._value
NotImplementedError: pool objects cannot be passed between processes or pickled

or a similar error:

    raise self._value
cPickle.PicklingError: Can't pickle <type 'instancemethod'>: attribute lookup __builtin__.instancemethod failed

I am wondering how I can solve this problem. One way suggested on Stack Overflow is to use pathos.multiprocessing, as mentioned here: http://stackoverflow.com/a/21345273/2379433

But I think this should be handled in the emcee source code itself. Thanks in advance.

timstaley commented 9 years ago

Hi @neuronphysics, I've recently been (naively) building some code on top of emcee and ran into exactly the same problem. The issue is that you have assigned the emcee sampler as an attribute of your class, which also holds the probability functions. When running emcee with multiprocessing, it attempts to pickle the probability function for transfer to the worker processes, which means also pickling the instance it is bound to. But that instance in turn holds the emcee sampler, which holds the multiprocessing pool - big problem!

The approach I took to solving this was to simply split out the emcee sampler from my class - you can still have class-method functions that wrap the sampler creation / calling routines, but you have to store the actual sampler in a separate variable.
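
For concreteness, here is a minimal sketch of that split (the Model class name is illustrative, not from the original code, and it reuses the x, y, yerr arrays from the first snippet; the prior is omitted for brevity): the data and probability function live on the instance, while the sampler lives in a plain variable, so pickling the instance never drags the pool along.

import numpy as np
import emcee

class Model(object):  # new-style class; this matters for the pickling recipe below
    def __init__(self, x, y, yerr):
        self.x, self.y, self.yerr = x, y, yerr
    def lnprob(self, theta):
        m, b, lnf = theta
        model = m * self.x + b
        inv_sigma2 = 1.0/(self.yerr**2 + model**2*np.exp(2*lnf))
        return -0.5*np.sum((self.y - model)**2*inv_sigma2 - np.log(inv_sigma2))

model = Model(x, y, yerr)
ndim, nwalkers = 3, 100
p0 = [np.array([-0.6, 2.0, np.log(0.9)]) + 1e-4*np.random.randn(ndim)
      for _ in range(nwalkers)]

# The sampler is NOT stored on the instance, so pickling model.lnprob does not
# try to pickle the pool. On Python 2, pickling the bound method itself still
# needs the copy_reg workaround (or dill) discussed next.
sampler = emcee.EnsembleSampler(nwalkers, ndim, model.lnprob, threads=10)
sampler.run_mcmc(p0, 500)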

You might then find you also run into problems pickling bound methods, but that's easier to solve, either by importing pathos or by including your own pickle functions, cf http://bytes.com/topic/python/answers/552476-why-cant-you-pickle-instancemethods#edit2155350 (referenced from e.g. http://stackoverflow.com/questions/22450160/using-multiprocessing-pool-on-bound-methods).
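
The bound-method workaround in those links boils down to registering a custom reducer with copy_reg on Python 2; a sketch along those lines (lightly commented, names as in the linked recipes):

import copy_reg
import types

def _pickle_method(method):
    # Decompose a bound method into (function name, instance, class)
    func_name = method.im_func.__name__
    obj = method.im_self
    cls = method.im_class
    return _unpickle_method, (func_name, obj, cls)

def _unpickle_method(func_name, obj, cls):
    # Note: cls.mro() only exists on new-style classes, so the class that owns
    # the method must inherit from object for this lookup to work.
    for klass in cls.mro():
        try:
            func = klass.__dict__[func_name]
        except KeyError:
            pass
        else:
            break
    return func.__get__(obj, cls)

copy_reg.pickle(types.MethodType, _pickle_method, _unpickle_method)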

neuronphysics commented 9 years ago

Hi @timstaley, thanks for the reply. As far as I understood it, I assigned my sampler to a plain variable, copied the aforementioned _pickle_method and _unpickle_method functions into my code, ran it again, and got this error:

  File "<stdin>", line 6, in _unpickle_method
AttributeError: ("class modelfit has no attribute 'mro'", <function _unpickle_method at 0x7f693f1911b8>, ('log_posterior', <modelfit instance at 0x7f693e2f5638>, <class modelfit at 0x7f693f192050>))

Did I understand the solution you suggested correctly? Another question: how could I use pathos when the call to multiprocessing is wrapped inside the emcee code?

timstaley commented 9 years ago

That sounds like an error in the pickling setup somewhere. I would suggest trying something like

import pickle
s = pickle.dumps(modelfit)

That's a basic test: if you can get that working, you should be most of the way there. Alternatively, I think that when you import pathos.multiprocessing it does a little magic and overrides the built-in multiprocessing module, but I managed to get up and running with the alternative hack above, so I haven't explored pathos much.
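
A slightly more direct version of that test (a sketch, assuming the modelfit class defined earlier in this thread) is to pickle a bound method of an instance, since that is what the pool actually has to ship to the workers:

import pickle
import numpy as np

test = modelfit()
# Fails with the stock Python 2 pickler unless the copy_reg workaround
# (or dill) is in place; succeeds out of the box on Python 3.
s = pickle.dumps(test.lnprob)
fn = pickle.loads(s)
print(fn([-0.9, 4.3, np.log(0.5)]))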

mmckerns commented 9 years ago

I'm the dill and pathos author. The pathos hack is primarily that it imports dill instead of cPickle. There's some other stuff too, but that's the big one-line change. With dill you can serialize most things in Python, so parallel computing (especially for statistics, Monte Carlo, etc.) becomes much easier to do. Some packages (including IPython and mpi4py) are starting to provide a "drop-in" mechanism that lets you replace the serializer or map functions used in their code. I'd be interested to see whether this works with emcee.
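
As a quick check of that claim against the class from this thread (a sketch; dill must be installed):

import dill

test = modelfit()
s = dill.dumps(test.lnprob)   # cPickle raises PicklingError here on Python 2
fn = dill.loads(s)            # round-trips to a working bound method
print(fn([-0.9, 4.3, -0.6]))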

neuronphysics commented 9 years ago

Well, I am trying different things to make my code work, and I should confess I am not very familiar with how to unpickle a method. I just used the following approach:

import numpy as np
import emcee
import scipy.optimize as op

m_true = -0.9594
b_true = 4.294
f_true = 0.534

N = 50
x = np.sort(10*np.random.rand(N))
yerr = 0.1+0.5*np.random.rand(N)
y = m_true*x+b_true
y += np.abs(f_true*y) * np.random.randn(N)
y += yerr * np.random.randn(N)
def unwrap_self_modelfit(arg, **kwarg):
    return modelfit.lnprob(*arg, **kwarg)

class modelfit():
    def __init__(self):
        self.x = x
        self.y = y
        self.yerr = yerr
        self.m = -0.6
        self.b = 2.0
        self.f = 0.9
    def get_results(self):
        def func(a):
            model = a[0]*self.x + a[1]
            inv_sigma2 = 1.0/(self.yerr**2 + model**2*np.exp(2*a[2]))
            return 0.5*(np.sum((self.y-model)**2*inv_sigma2 + np.log(inv_sigma2)))
        result = op.minimize(func, [self.m, self.b, np.log(self.f)], options={'gtol': 1e-6, 'disp': True})
        m_ml, b_ml, lnf_ml = result["x"]
        return result["x"]
    def lnprior(self, theta):
        m, b, lnf = theta
        if -5.0 < m < 0.5 and 0.0 < b < 10.0 and -10.0 < lnf < 1.0:
            return 0.0
        return -np.inf
    def lnprob(self, theta):
        lp = self.lnprior(theta)
        likelihood = self.lnlike(theta)
        if not np.isfinite(lp):
            return -np.inf
        return lp + likelihood
    def lnlike(self, theta):
        m, b, lnf = theta
        model = m * self.x + b
        inv_sigma2 = 1.0/(self.yerr**2 + model**2*np.exp(2*lnf))
        return -0.5*(np.sum((self.y-model)**2*inv_sigma2 - np.log(inv_sigma2)))
    def run_mcmc(self, nstep):
        ndim, nwalkers = 3, 100
        pos = [self.get_results() + 1e-4*np.random.randn(ndim) for i in range(nwalkers)]
        sampler = emcee.EnsembleSampler(nwalkers, ndim, unwrap_self_modelfit, threads=10)
        return sampler.run_mcmc(pos, nstep)

test=modelfit()

test.x=x
test.y=y
test.yerr=yerr
test.get_results()
test.run_mcmc(5000)

And got this error message:

  args: []
  kwargs: {}
  exception:
Traceback (most recent call last):
  File "build/bdist.linux-x86_64/egg/emcee/ensemble.py", line 505, in __call__
    return self.f(x, *self.args, **self.kwargs)
  File "<stdin>", line 2, in unwrap_self_modelfit
TypeError: unbound method lnprob() must be called with modelfit instance as first argument (got float64 instance instead)

Traceback (most recent call last):
  File "<stdin>", line 37, in run_mcmc
  File "build/bdist.linux-x86_64/egg/emcee/sampler.py", line 157, in run_mcmc
  File "build/bdist.linux-x86_64/egg/emcee/ensemble.py", line 198, in sample
  File "build/bdist.linux-x86_64/egg/emcee/ensemble.py", line 382, in _get_lnprob
  File "build/bdist.linux-x86_64/egg/emcee/interruptible_pool.py", line 94, in map
  File "/vol/aibn84/data2/zahra/anaconda/lib/python2.7/multiprocessing/pool.py", line 558, in get
    raise self._value
TypeError: unbound method lnprob() must be called with modelfit instance as first argument (got float64 instance instead)

mmckerns commented 9 years ago

This traceback says that you are passing a float to lnprob instead of a modelfit instance. That sounds more like a coding error than a pickling error.
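
For what it's worth, one way to make that wrapper pattern work (a sketch; unwrap_lnprob is a hypothetical name) is to hand the instance to the wrapper explicitly through the sampler's args keyword instead of relying on *arg unpacking:

def unwrap_lnprob(theta, instance):
    # emcee calls this as f(theta, *args), so `instance` arrives via args below
    return instance.lnprob(theta)

# inside run_mcmc, keeping the sampler in a local variable as before;
# pickling `self` for the pool works because the sampler is not an attribute
sampler = emcee.EnsembleSampler(nwalkers, ndim, unwrap_lnprob,
                                args=[self], threads=10)
return sampler.run_mcmc(pos, nstep)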

ceyzeriat commented 9 years ago

Hi guys, I've run into that same problem lately. My solution was to run emcee N times in parallel and re-assemble the lnprob and chain arrays afterwards. Say I have 1000 walkers for an MCMC simulation, 10 parameters, and 500 iterations. Then I build batches of 20 walkers (the minimum given 10 parameters) and sling them to 50 single-core emcee simulations (500 iterations each)... and go get a coffee :-p until I can re-assemble the 50 "lnprob" and "chain" arrays using np.concatenate. It seems to work perfectly well, though it's a bit of a hack. dfm, can you see anything wrong with that approach? i.e. does emcee treat each walker separately, or does it do some complex global optimization over groups of walkers? I did run into another pickling error later, but it was unrelated to emcee: I installed dill and pathos and it just worked out of the box. Quite incredible! Cheers

dstndstn commented 9 years ago

Uhh, wasn't the "same problem" above resolved as user error? Could you please post a minimal example showing the problem you are seeing?

Merging together the chains is more than a "hack". That's more like multiple restarts or something. It will definitely give you different results in detail. And you will have to be smart about burn-in (as always).

You may want to read the paper or page through some of DFM's talks on emcee (all his slides are online); the walkers do interact (that's what allows it to be affine-invariant and parallelizable in the first place).