Hi, I'm running into an issue with george and emcee. My likelihood function involves a call to gp.predict. This is fast in serial, but when I try to run emcee with a multiprocessing Pool, each gp.predict call takes a longer time. (Not sure whether I should have opened an issue here or at emcee!)
Below is a minimum failing example. When I run this on my 24-core machine, this is the output:
Serial took 5.1 seconds
Parallel multiprocessing took 39.5 seconds
The parallel one is much slower in total. Individual gp.predict calls are ~10x slower in parallel (serial: ~0.002s, parallel: ~0.02s).
I tried adding:
import os
os.environ["OMP_NUM_THREADS"] = "1"
Then I get this output:
Serial took 5.1 seconds
Parallel multiprocessing took 1.6 seconds
This makes the parallel job faster than the serial job, by about 3x. However, the individual predict calls still take 2-3x longer (serial: ~0.002s, parallel: ~0.004-0.006s), so I'm not getting the speedup I expect.
This holds up for varying N, nwalkers, and ndim. I am having this issue in my full-scale code with a large training set and high-dimensional statistics. There the predict calls take up to 6x longer in parallel (with OMP_NUM_THREADS="1").
Could this have to do with the same gp object not being able to be called by multiple threads at the same time? Is there a way around this / am I missing something with the parallelization? Thank you!
Minimum failing example:
import time
import numpy as np
import emcee
import george
from george import kernels
def log_prob(theta):
xval = np.random.random()*10 # to be in same xrange
s = time.time()
pred, pred_var = gp.predict(y, xval, return_var=True)
e = time.time()
print("GP predict time:", e-s) #This will print many lines, may want to comment out
return pred
## Set up MCMC parameters
nwalkers = 24
ndim = 2
np.random.seed(42)
initial = np.random.randn(nwalkers, ndim)
nsteps = 100
## Build GP (from george tutorial)
N = 1500
np.random.seed(1234)
x = 10 * np.sort(np.random.rand(N))
yerr = 0.2 * np.ones_like(x)
y = np.sin(x) + yerr * np.random.randn(len(x))
kernel = np.var(y) * kernels.ExpSquaredKernel(0.5)
gp = george.GP(kernel)
gp.compute(x, yerr)
## Serial
from multiprocessing import cpu_count
ncpu = cpu_count()
print("{0} CPUs".format(ncpu))
sampler = emcee.EnsembleSampler(nwalkers, ndim, log_prob)
start = time.time()
sampler.run_mcmc(initial, nsteps)
end = time.time()
serial_time = end - start
print("Serial took {0:.1f} seconds".format(serial_time))
## Parallel
from multiprocessing import Pool
with Pool() as pool:
sampler = emcee.EnsembleSampler(nwalkers, ndim, log_prob, pool=pool)
start = time.time()
sampler.run_mcmc(initial, nsteps)
end = time.time()
multi_time = end - start
print("Parallel multiprocessing took {0:.1f} seconds".format(multi_time))
Hi, I'm running into an issue with george and emcee. My likelihood function involves a call to gp.predict. This is fast in serial, but when I try to run emcee with a multiprocessing Pool, each gp.predict call takes a longer time. (Not sure whether I should have opened an issue here or at emcee!)
Below is a minimum failing example. When I run this on my 24-core machine, this is the output:
The parallel one is much slower in total. Individual gp.predict calls are ~10x slower in parallel (serial: ~0.002s, parallel: ~0.02s).
I tried adding:
Then I get this output:
This makes the parallel job faster than the serial job, by about 3x. However, the individual predict calls still take 2-3x longer (serial: ~0.002s, parallel: ~0.004-0.006s), so I'm not getting the speedup I expect.
This holds up for varying N, nwalkers, and ndim. I am having this issue in my full-scale code with a large training set and high-dimensional statistics. There the predict calls take up to 6x longer in parallel (with OMP_NUM_THREADS="1").
Could this have to do with the same gp object not being able to be called by multiple threads at the same time? Is there a way around this / am I missing something with the parallelization? Thank you!
Minimum failing example: