This happens because there is a random step in the HODLR solver's compute step. In the most recent version, your example can be written to get deterministic results as follows:
import numpy as np
import george
from george.kernels import ExpSquaredKernel

# Synthetic data: 10,000 points with constant white noise and constant mean.
a = 0.1
s = 0.5
x = np.linspace(0, 1000, 10000)
y = np.random.random(10000)*0.1 + 0.5
yerr = 0.1*np.ones(10000)
m = 0.5*np.ones(10000)

kernel = a * ExpSquaredKernel(s)
gp = george.GP(kernel, solver=george.HODLRSolver)
for i in range(10):
    # Fixing the seed makes the random step in the HODLR factorization
    # reproducible, so every iteration prints the same log-likelihood.
    gp.compute(x, yerr, seed=1234)
    print(gp.lnlikelihood(y - m))
I've also found that on more reasonable problems (where the GP model is actually a decent fit), this effect is much smaller.
This problem came up again in an astrophysically relevant situation (a transit fit combined with a rotation-period model).
I am able to minimize its effects with judicious use of nleaf. Specifically, nleaf=200 gives sufficient precision in my case without slowing things down too much; nleaf=2000 gives nearly machine precision, but is far too slow to be useful (see the sketch below).
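To make that trade-off concrete, here is a minimal sketch of the comparison I mean, assuming the pre-1.0 george interface in which nleaf is forwarded from george.GP to the HODLR solver; the toy data here stand in for the real light curve:

import numpy as np
import george
from george.kernels import ExpSquaredKernel

# Toy data standing in for the real light curve (illustrative only).
x = np.linspace(0, 1000, 5000)
y = np.random.random(len(x))*0.1 + 0.5
yerr = 0.1*np.ones(len(x))
m = 0.5*np.ones(len(x))
kernel = 0.1 * ExpSquaredKernel(0.5)

# Direct solver: O(N^3), but exact, so it serves as the reference value.
exact = george.GP(kernel)
exact.compute(x, yerr)
reference = exact.lnlikelihood(y - m)

# Larger nleaf buys precision at the cost of speed; the nleaf keyword
# here assumes the older HODLRSolver interface.
for nleaf in (50, 100, 200, 2000):
    gp = george.GP(kernel, solver=george.HODLRSolver, nleaf=nleaf)
    gp.compute(x, yerr, seed=1234)
    print(nleaf, gp.lnlikelihood(y - m) - reference)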
Can you give a specific code + data snippet and describe how the failure propagates to bad results? As we discussed before, this tends to matter more when the kernel is really wrong (poorly conditioned)… also, using the "seed" parameter can often help to get reasonable results in practice.
Sure. My data comes from two telescopes, with different levels of photon noise, but the signal beyond the white noise should be dominated by stellar activity. I have the following code:
def GPlike(theta, x, y1, y2):
    theta = np.exp(theta)
    k = theta[0] * ExpSquaredKernel(theta[1]) * ExpSine2Kernel(theta[2], theta[3])
    gp = george.GaussianProcess(k, nleaf=200)
    # One constant white-noise level per telescope.
    err1 = theta[4]*np.ones(len(y1))
    err2 = theta[5]*np.ones(len(y2))
    err = np.append(err1, err2)
    y = np.append(y1, y2)
    try:
        gp.compute(x, err)
    except (ValueError, np.linalg.LinAlgError):
        return -np.inf
    return gp.lnlikelihood(y)
(We don't have any direct estimate of what the photon noise should be, other than that it should be roughly constant for each telescope, so I'm letting both white-noise levels vary freely under that one assumption.)
My values for the various theta parameters are roughly [-7.5, 2.3, -4.0, 3.0, -9.4, -7.1], and the model appears to fit the data well when I use gp.predict.
When I use emcee with this, initializing all my walkers at the same place, the returned log-likelihood (added to the RV log-likelihood) walks around between 3580 and 3581; after changing nleaf it only steps around by about 0.01, which is acceptable to me.
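For context, the sampler setup is essentially the standard emcee pattern around GPlike; in this sketch the walker count, step count, and the 1e-4 initial scatter are illustrative choices, not values from my fit:

import emcee
import numpy as np

theta0 = np.array([-7.5, 2.3, -4.0, 3.0, -9.4, -7.1])
ndim, nwalkers = len(theta0), 32
# Small Gaussian ball around the initial guess.
p0 = theta0 + 1e-4*np.random.randn(nwalkers, ndim)

sampler = emcee.EnsembleSampler(nwalkers, ndim, GPlike, args=(x, y1, y2))
sampler.run_mcmc(p0, 1000)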
I can send you my data if you want to take a look at it specifically.
Seed will not be useful in an MCMC, correct? Small errors will still occur; they'll just occur deterministically, which won't help when I try to marginalize over the noise parameters.
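To illustrate the point: with a fixed seed the result at any given theta is reproducible, but the approximation error still changes as theta changes, so it biases the likelihood surface rather than averaging out. A minimal sketch against the exact solver, on toy data:

import numpy as np
import george
from george.kernels import ExpSquaredKernel

# Toy data (illustrative only).
x = np.linspace(0, 1000, 5000)
y = np.random.random(len(x))*0.1 + 0.5
yerr = 0.1*np.ones(len(x))
m = 0.5*np.ones(len(x))

# The seeded HODLR result is deterministic at each parameter value,
# but its error relative to the exact solver varies across parameters.
for s in (0.3, 0.5, 0.7):
    kernel = 0.1 * ExpSquaredKernel(s)
    exact = george.GP(kernel)
    exact.compute(x, yerr)
    approx = george.GP(kernel, solver=george.HODLRSolver)
    approx.compute(x, yerr, seed=1234)
    print(s, approx.lnlikelihood(y - m) - exact.lnlikelihood(y - m))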
I'm closing this as "stale". If anyone runs into an issue like this, please feel free to re-open.
Officially opening an issue, per earlier conversation:
The following code:
returns
It's interesting that the same result is returned multiple times, which makes me think some small random number is propagating through the matrix operations (although that's just a guess...). Changing tol or nleaf doesn't seem to affect the results.