dfm / george

Fast and flexible Gaussian Process regression in Python
http://george.readthedocs.io
MIT License

Same kernel returns different results with same data #7

Closed · benmontet closed this issue 7 years ago

benmontet commented 10 years ago

Officially opening an issue, per earlier conversation:

The following code:

import numpy as np
import george
from george.kernels import ExpSquaredKernel

a = 0.1
s = 0.5

x = np.linspace(0, 1000, 10000)
y = np.random.random(10000)*0.1 + 0.5

yerr = 0.1*np.ones(10000)
m = 0.5*np.ones(10000)

for i in range(10):
    kernel = a * ExpSquaredKernel(s)
    gp = george.GaussianProcess(kernel, tol=1e-12, nleaf=100)
    gp.compute(x, yerr)
    print(gp.lnlikelihood(y - m))

returns

10124.3121797
10125.0028425
10125.1186467
10125.332759
10125.332759
10125.332759
10125.2206463
10124.5410274
10124.5410274
10123.6107036

It's interesting that the same result is returned multiple times, which makes me think that some small random number is propagating through the matrix operations (although that's just a guess...). Changing tol or nleaf doesn't seem to affect the results.
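
For reference, the run-to-run scatter can be quantified directly by repeating the compute step (a minimal sketch with the same toy setup as above):

import numpy as np
import george
from george.kernels import ExpSquaredKernel

# Same toy setup as in the snippet above.
x = np.linspace(0, 1000, 10000)
y = np.random.random(10000) * 0.1 + 0.5
yerr = 0.1 * np.ones(10000)
m = 0.5 * np.ones(10000)

lls = []
for i in range(10):
    gp = george.GaussianProcess(0.1 * ExpSquaredKernel(0.5), tol=1e-12, nleaf=100)
    gp.compute(x, yerr)  # the randomness enters in this factorization step
    lls.append(gp.lnlikelihood(y - m))

# The standard deviation across runs quantifies the jitter.
print(np.std(lls))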

dfm commented 10 years ago

This happens because there is a random step in the HODLR solver's compute step. In the most recent version, your example can be written to give deterministic results as follows:

import numpy as np
import george
from george.kernels import ExpSquaredKernel

a = 0.1
s = 0.5

x = np.linspace(0, 1000, 10000)
y = np.random.random(10000)*0.1 + 0.5

yerr = 0.1*np.ones(10000)
m = 0.5*np.ones(10000)

kernel = a * ExpSquaredKernel(s)
gp = george.GP(kernel, solver=george.HODLRSolver)

for i in range(10):
    gp.compute(x, yerr, seed=1234)
    print(gp.lnlikelihood(y - m))

I've also found that this is less of an issue on more reasonable problems (where the GP model is actually a decent fit).

benmontet commented 9 years ago

This problem came up again in an astrophysically relevant situation (modeling a transit fit plus a rotation-period signal).

I am able to minimize its effects with judicious use of nleaf. Specifically, nleaf=200 gives sufficient precision in my case without slowing things down too much; nleaf=2000 gives nearly machine precision, but is far too slow to be useful.
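
A minimal sketch of that trade-off, assuming the same GaussianProcess(kernel, nleaf=...) signature as in the original snippet (the toy data here are illustrative):

import time
import numpy as np
import george
from george.kernels import ExpSquaredKernel

x = np.linspace(0, 1000, 10000)
y = np.random.random(10000) * 0.1 + 0.5
yerr = 0.1 * np.ones(10000)
m = 0.5 * np.ones(10000)
kernel = 0.1 * ExpSquaredKernel(0.5)

for nleaf in (100, 200, 2000):
    lls = []
    start = time.time()
    for i in range(5):
        gp = george.GaussianProcess(kernel, nleaf=nleaf)
        gp.compute(x, yerr)
        lls.append(gp.lnlikelihood(y - m))
    # Larger nleaf shrinks the scatter in lnlikelihood but slows compute.
    print(nleaf, np.std(lls), time.time() - start)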

dfm commented 9 years ago

Can you give a specific code + data snippet and describe how the failure propagates into bad results? As we discussed before, this tends to matter most when the kernel is badly wrong (poorly conditioned)… also, using the "seed" parameter can often help to get reasonable results in practice.
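
To make "poorly conditioned" concrete: with a long correlation length and tiny white noise, the exact covariance matrix is nearly singular, and small perturbations in the factorization get amplified. An illustrative numpy-only check (the parameter values here are made up for the demonstration):

import numpy as np

x = np.linspace(0, 10, 500)
yerr = 1e-6 * np.ones_like(x)  # tiny white noise
a, s = 0.1, 100.0              # long length scale: strongly correlated rows

# Dense squared-exponential covariance plus the white-noise diagonal,
# following the k(r) = a * exp(-0.5 * r^2 / s) convention used above.
r2 = (x[:, None] - x[None, :]) ** 2
K = a * np.exp(-0.5 * r2 / s) + np.diag(yerr ** 2)

# A condition number approaching 1/eps for doubles (~1e16) means any
# solver is operating at the edge of machine precision.
print(np.linalg.cond(K))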

benmontet commented 9 years ago

Sure. My data comes from two telescopes, with different levels of photon noise, but the signal beyond the white noise should be dominated by stellar activity. I have the following code:

def GPlike(theta, x, y1, y2):
    theta = np.exp(theta)
    k = theta[0] * ExpSquaredKernel(theta[1]) * ExpSine2Kernel(theta[2], theta[3])
    gp = george.GaussianProcess(k, nleaf=200)

    # One roughly constant white-noise level per telescope.
    err1 = theta[4] * np.ones(len(y1))
    err2 = theta[5] * np.ones(len(y2))
    err = np.append(err1, err2)

    y = np.append(y1, y2)

    try:
        gp.compute(x, err)
    except (ValueError, np.linalg.LinAlgError):
        return -np.inf
    return gp.lnlikelihood(y)

(We don't have any direct estimate of the photon noise, except that it should be roughly constant, so I'm letting it vary with no assumptions beyond that one.)

And my values for the various thetas are roughly [-7.5, 2.3, -4.0, 3.0, -9.4, -7.1]. It appears to model the data well when I use gp.predict.

When I use emcee with this, if I initialize my walkers all at the same place, the returned log likelihood (added to the RV log likelihood) wanders between 3580 and 3581; changing nleaf makes it move around by about 0.01, which is acceptable to me.
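
For reference, this is roughly how GPlike gets wired into emcee (a minimal sketch; the data arrays, walker count, and step count here are placeholders rather than my actual setup):

import emcee
import numpy as np

# Placeholder data standing in for the two telescopes' photometry.
x = np.sort(np.random.uniform(0, 100, 200))
y1 = np.random.randn(120)
y2 = np.random.randn(80)

theta0 = np.array([-7.5, 2.3, -4.0, 3.0, -9.4, -7.1])
ndim, nwalkers = len(theta0), 32

# All walkers start in a tight ball around the rough solution.
p0 = theta0 + 1e-4 * np.random.randn(nwalkers, ndim)

sampler = emcee.EnsembleSampler(nwalkers, ndim, GPlike, args=(x, y1, y2))
sampler.run_mcmc(p0, 1000)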

I can send you my data if you want to take a look at it specifically.

The seed will not be useful in an MCMC, correct? Small errors will still occur; they'll just occur deterministically, which won't help when I try to marginalize over the noise parameters.

dfm commented 7 years ago

I'm closing this as "stale". If anyone runs into an issue like this, please feel free to re-open.