SheffieldML / GPy

Gaussian processes framework in python
BSD 3-Clause "New" or "Revised" License
2.02k stars · 560 forks

LinAlgError("not positive definite, even with jitter.") #660

Open glorwlin opened 6 years ago

glorwlin commented 6 years ago

Hi,

I have a weird problem when using GPy on an HPC system (Sharc). When I tried to run a deep GP with a very small training dataset (500 points), GPy kept giving me the following error during optimization:

    raise linalg.LinAlgError("not positive definite, even with jitter.")

However, the exact same code worked fine on my local computer. The error seemed to appear after the 9th of 30 optimization restarts in my "train level 2" step:

    ''' Train level 2 '''
    XX = np.hstack((X2, mu1))

    k2 = GPy.kern.RBF(1, active_dims = [dim])*GPy.kern.RBF(dim, active_dims = active_dimensions, ARD = True) \
        + GPy.kern.RBF(dim, active_dims = active_dimensions, ARD = True)

    # k_rho = GPy.kern.RBF(dim, active_dims = active_dimensions, ARD = True)
    # k_f_t-1 = GPy.kern.RBF(1, active_dims = [dim])
    # k_delta = GPy.kern.RBF(dim, active_dims = active_dimensions, ARD = True)

    m2 = GPy.models.GPRegression(X=XX, Y=Y2, kernel=k2,normalizer=True)

    m2[".*Gaussian_noise"] = m2.Y.var()*0.01
    m2[".*Gaussian_noise"].fix()

    m2.optimize(max_iters = 500)

    m2[".*Gaussian_noise"].unfix()
    m2[".*Gaussian_noise"].constrain_positive()

    m2.optimize_restarts(30, optimizer = "bfgs",  max_iters = 1000)
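For context on where this error comes from: GPy's jitchol (in GPy/util/linalg.py) retries the Cholesky factorization of the kernel matrix with growing diagonal jitter and raises once its retry budget is exhausted. A minimal sketch of that loop (the helper name and constants are illustrative, not GPy's verbatim code):

```python
import numpy as np

def jittered_cholesky(A, maxtries=5):
    """Cholesky with escalating diagonal jitter, in the style of GPy's jitchol."""
    try:
        return np.linalg.cholesky(A)
    except np.linalg.LinAlgError:
        pass
    # start the jitter at a small fraction of the mean diagonal magnitude
    jitter = np.abs(np.diag(A)).mean() * 1e-6
    for _ in range(maxtries):
        try:
            return np.linalg.cholesky(A + np.eye(A.shape[0]) * jitter)
        except np.linalg.LinAlgError:
            jitter *= 10  # escalate and retry
    raise np.linalg.LinAlgError("not positive definite, even with jitter.")
```

When the kernel matrix is numerically singular (e.g. near-duplicate training inputs or a collapsed lengthscale), even the largest jitter tried can fail, which is exactly the error above.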

I have checked the version of GPy on both my computer and the HPC environment; both are the latest version. What could be the potential cause of such a problem?

Here I attach the entire code that I use:


    dim = 4
    active_dimensions = np.arange(0, dim)

    ''' Train level 1 '''
    k1 = GPy.kern.RBF(dim, ARD=True)
    m1 = GPy.models.GPRegression(X=X1, Y=Y1, kernel=k1, normalizer=True)

    m1[".*Gaussian_noise"] = m1.Y.var()*0.01
    m1[".*Gaussian_noise"].fix()

    m1.optimize(max_iters=500)

    m1[".*Gaussian_noise"].unfix()
    m1[".*Gaussian_noise"].constrain_positive()

    m1.optimize_restarts(30, optimizer="bfgs", max_iters=1000)

    mu1, v1 = m1.predict(X2)

    ''' Train level 2 '''
    XX = np.hstack((X2, mu1))

    k2 = GPy.kern.RBF(1, active_dims=[dim])*GPy.kern.RBF(dim, active_dims=active_dimensions, ARD=True) \
        + GPy.kern.RBF(dim, active_dims=active_dimensions, ARD=True)

    # k_rho = GPy.kern.RBF(dim, active_dims=active_dimensions, ARD=True)
    # k_f_t-1 = GPy.kern.RBF(1, active_dims=[dim])
    # k_delta = GPy.kern.RBF(dim, active_dims=active_dimensions, ARD=True)

    m2 = GPy.models.GPRegression(X=XX, Y=Y2, kernel=k2, normalizer=True)

    m2[".*Gaussian_noise"] = m2.Y.var()*0.01
    m2[".*Gaussian_noise"].fix()

    m2.optimize(max_iters=500)

    m2[".*Gaussian_noise"].unfix()
    m2[".*Gaussian_noise"].constrain_positive()

    m2.optimize_restarts(30, optimizer="bfgs", max_iters=1000)

    # Predict at test points
    nsamples = N1
    ntest = X_test.shape[0]
    mu0, C0 = m1.predict(X_test, full_cov=True)  # memory error
    Z = np.random.multivariate_normal(mu0.flatten(), C0, nsamples)
    tmp_m = np.zeros((nsamples, ntest))
    tmp_v = np.zeros((nsamples, ntest))

    # push samples through f_2
    for i in range(0, nsamples):
        mu, v = m2.predict(np.hstack((X_test, Z[i, :][:, None])))  # memory error
        tmp_m[i, :] = mu.flatten()
        tmp_v[i, :] = v.flatten()

    # get mean and variance at X3
    mu2 = np.mean(tmp_m, axis=0)
    v2 = np.mean(tmp_v, axis=0) + np.var(tmp_m, axis=0)
    mu2 = mu2[:, None]
    v2 = np.abs(v2[:, None])
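One more place where the same non-positive-definite issue can bite is the np.random.multivariate_normal call above, since the dense predictive covariance C0 is often only barely positive semi-definite. A hedged alternative is to factor a jittered C0 once and reuse the factor; the function name and the jitter scale below are assumptions, not GPy API:

```python
import numpy as np

def sample_mvn(mu, C, nsamples, jitter=1e-8, seed=None):
    """Draw nsamples from N(mu, C) via one jittered Cholesky factorization."""
    rng = np.random.default_rng(seed)
    # one O(n^3) factorization, then cheap matrix-vector sampling
    L = np.linalg.cholesky(C + jitter * np.eye(C.shape[0]))
    eps = rng.standard_normal((nsamples, C.shape[0]))
    return mu.reshape(1, -1) + eps @ L.T
```

Replacing the sampling line with `Z = sample_mvn(mu0, C0, nsamples)` would keep the rest of the loop unchanged.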
zhengq01 commented 6 years ago

I am also using this code to build a multi-fidelity GP and have encountered the same problem. My data has 30 dimensions and I build a GP for each dimension independently. Unfortunately, the LinAlgError occurs at a random dimension even though the data has not changed. I am quite confused by this. Have you fixed the problem?

glorwlin commented 6 years ago

> I am also using this code to build a multi-fidelity GP and have encountered the same problem. My data has 30 dimensions and I build a GP for each dimension independently. Unfortunately, the LinAlgError occurs at a random dimension even though the data has not changed. I am quite confused by this. Have you fixed the problem?

Unfortunately, no. I switched to another package in the end. Sorry that I could not help.

zhengq01 commented 6 years ago

> I am also using this code to build a multi-fidelity GP and have encountered the same problem. My data has 30 dimensions and I build a GP for each dimension independently. Unfortunately, the LinAlgError occurs at a random dimension even though the data has not changed. I am quite confused by this. Have you fixed the problem?

> Unfortunately, no. I switched to another package in the end. Sorry that I could not help.

Do you mind telling me which package you switched to? Maybe I should give it a try. Thanks a lot!

glorwlin commented 6 years ago

Hi, I ended up translating the code into GPflow and continued everything from there.

zhengq01 commented 6 years ago

> Hi, I ended up translating the code into GPflow and continued everything from there.

Thank you!

zhengq01 commented 6 years ago

> Hi, I ended up translating the code into GPflow and continued everything from there.

Do you mind sharing the tutorial code for multi-fidelity GP (or deep GP) using GPflow with me? My email is zheng992328@163.com. Thank you very much!

wmmxk commented 5 years ago

I ran into the same issue. Based on the discussion above, switching to GPflow seems to solve the problem. Could anyone point me to a tutorial? Thanks, Xiaokang

huibinshen commented 5 years ago

One workaround is to set maxtries to a higher number, such as 10, in the jitchol method (GPy/util/linalg.py). If the error still occurs, catching the exception and returning a random candidate can also work.
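A sketch of that catch-and-fall-back idea (the helper and its defaults are made up for illustration; scipy.linalg.LinAlgError is the same class as numpy's, so one except clause covers GPy's raise). If your GPy version supports it, optimize_restarts(..., robust=True) similarly skips restarts that fail:

```python
import numpy as np

def optimize_with_fallback(optimize_fn, fallback_fn, max_attempts=3):
    """Run optimize_fn, retrying on LinAlgError; use fallback_fn if all attempts fail."""
    for _ in range(max_attempts):
        try:
            return optimize_fn()
        except np.linalg.LinAlgError:
            continue  # e.g. re-randomize hyperparameters and try again
    return fallback_fn()  # e.g. return a random candidate, as suggested above
```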

samlaf commented 5 years ago

This is also happening in my code, too often for me to overlook it. By returning a random candidate, do you mean returning an entire random matrix L? And where exactly would you catch this exception? Example code would be helpful!

1kastner commented 3 years ago

Initially things were working fine for me; I only ran into this trouble when I started prohibiting certain points via the ignore_x parameter. I used some randomness there for testing, and for mathematical reasons I now see that this was not a good idea.

@icdishb thank you very much for your hint. In my case, however, it did not solve the problem. Still, could the maxtries parameter be exposed further up so that I don't need to monkey-patch? By the way, this was my temporary solution, which worked according to the stack trace:

    # see https://github.com/SheffieldML/GPy/issues/660
    import GPy
    previous_jitchol = GPy.util.linalg.jitchol
    # accept and drop extra kwargs so internal callers don't break
    GPy.util.linalg.jitchol = lambda A, **kwargs: previous_jitchol(A, maxtries=100)
mpvanderschelling commented 3 years ago

I experienced the same LinAlg error. Changing maxtries to 100 didn't help much. I changed the line L = linalg.cholesky(A + np.eye(A.shape[0]) * jitter, lower=True), which uses scipy.linalg.cholesky(), to its NumPy equivalent, np.linalg.cholesky(A + np.eye(A.shape[0]) * jitter). This resolved the issue for me.

yiminzme commented 3 years ago

> I experienced the same LinAlg error. Changing maxtries to 100 didn't help much. I changed the line L = linalg.cholesky(A + np.eye(A.shape[0]) * jitter, lower=True), which uses scipy.linalg.cholesky(), to its NumPy equivalent, np.linalg.cholesky(A + np.eye(A.shape[0]) * jitter). This resolved the issue for me.

This works for me. Specifically, I did the following steps:

1. Download this GitHub repository and unzip it.
2. In the root folder, run pip install -e .
3. Open the file GPy/util/linalg.py and, in the function jitchol(), change L = linalg.cholesky(A + np.eye(A.shape[0]) * jitter, lower=True) to L = np.linalg.cholesky(A + np.eye(A.shape[0]) * jitter).
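For anyone applying that patch, here is a standalone sketch of the replacement line; the symmetrization guard and the helper name are extra assumptions on top of the original suggestion:

```python
import numpy as np

def numpy_cholesky_with_jitter(A, jitter=1e-8):
    """NumPy version of the patched jitchol line, with a symmetry guard."""
    A = 0.5 * (A + A.T)  # guard against tiny floating-point asymmetry
    return np.linalg.cholesky(A + np.eye(A.shape[0]) * jitter)

# nearly singular kernel matrix of the kind that triggers the error
K = np.array([[1.0, 0.999999],
              [0.999999, 1.0]])
L = numpy_cholesky_with_jitter(K, jitter=1e-6)
```

Note that np.linalg.cholesky returns the lower-triangular factor directly, matching the lower=True behaviour of the SciPy call it replaces; any robustness difference between the two on near-singular inputs is platform and LAPACK dependent.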

ballaln commented 1 year ago

Normalizing outputs or adding Gaussian noise could help.
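A minimal sketch of the first suggestion, standardizing the targets by hand (GPy's normalizer=True does something similar internally); the helper names are illustrative:

```python
import numpy as np

def standardize(Y):
    """Return zero-mean, unit-variance targets plus the stats to undo it."""
    mean, std = Y.mean(axis=0), Y.std(axis=0)
    return (Y - mean) / std, mean, std

def destandardize(Yn, mean, std):
    """Map standardized predictions back to the original scale."""
    return Yn * std + mean

Y = np.array([[10.0], [12.0], [14.0]])
Yn, mean, std = standardize(Y)
```

For the noise suggestion, bounding the likelihood variance away from zero, e.g. m['.*Gaussian_noise.variance'].constrain_bounded(1e-6, 1.0), is another common way to keep the kernel matrix well conditioned (assuming your GPy version exposes constrain_bounded).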