SheffieldML / GPy

Gaussian processes framework in python
BSD 3-Clause "New" or "Revised" License
2.04k stars 562 forks source link

Uniform prior broken #639

Open cwlgadd opened 6 years ago

cwlgadd commented 6 years ago

Calling HMC with a Uniform prior gives the wrong results. A simple example showing the behaviour I experienced:

import GPy
import numpy as np
import matplotlib.pyplot as plt

if __name__ == '__main__':
    X = np.reshape(np.linspace(0,2*np.pi,100),(-1,1))
    Y = np.reshape(np.cos(X[:, 0]), (-1,1)) + 0.05*np.random.normal(size=(100,1))

    GPmodel = GPy.models.GPRegression(X,Y, kernel=GPy.kern.RBF(X.shape[1], ARD=True))
    prior = GPy.priors.Uniform(0,1)     #GPy.priors.Gamma(1, 1)

    for p in range(X.shape[1]):
        GPmodel.kern.lengthscale[[p]].set_prior(prior, warning=False)
        GPmodel.kern.lengthscale[[p]] = 0.5             # start within the prior's support
        print(GPmodel.kern.lengthscale)

    hmc = GPy.inference.mcmc.HMC(GPmodel, stepsize=5e-2)
    shmc = hmc.sample(num_samples=1000)
    print(shmc.shape)

    for i in range(shmc.shape[1]):
        plt.scatter(np.linspace(1,shmc.shape[0]-1, shmc.shape[0]), shmc[:,i])
        plt.show()

    plt.hist(prior.rvs(10000), bins=100, normed=True)
    plt.hist(shmc[:,1], bins=50, normed=True)
    plt.show()

This gives posterior samples outside the support of the prior, where there should be no density. In the plot, prior samples are in blue whilst posterior samples are in orange. I tried it with and without ARD and got the same results.

gpy_issue

I was using Python 3.6.1 and GPy version 1.9.2

cwlgadd commented 6 years ago

Also another bug, which is very likely the cause, is evident when plotting the prior:

import GPy
import matplotlib.pyplot as plt

if __name__ == '__main__':
    prior = GPy.priors.Uniform(0,1) 
    prior.plot()
    plt.show()

gpy_issue2

cwlgadd commented 6 years ago

The cause of the bug is:

    def lnpdf(self, x):
        region = (x >= self.lower) * (x <= self.upper)
        return region

where you assign log density within the bounds as 1 and outside as 0. This gives density e^1 within the bounds and and e^0 outside.