JuliaGaussianProcesses / AbstractGPs.jl

Abstract types and methods for Gaussian Processes.
https://juliagaussianprocesses.github.io/AbstractGPs.jl/dev

Log-likelihood is lower after fitting the process? #378

Closed: filipporemonato closed this issue 7 months ago

filipporemonato commented 9 months ago

I obtain a much lower (several orders of magnitude) log-likelihood after fitting the process to the data, even though the plots show that the process fits the data well. I expected the log-likelihood of the training data to increase after the GP is fitted to it.

This is exactly the same behaviour as in your doc's example: https://juliagaussianprocesses.github.io/AbstractGPs.jl/stable/examples/0-intro-1d/

Note that your log-likelihood is -25 for the unfitted GP, and then becomes -232 after fitting.

Since the log-likelihood is, by its name, the log of a probability, and probabilities are in [0,1], the log-likelihood after fitting should be closer to 0 = log(1), instead of becoming even more negative.

willtebbutt commented 9 months ago

Thanks for opening this. This is partly (kind of) a typo in the script, and partly that these two quantities are slightly different things. To replicate:

using AbstractGPs, Random

x = [
    0.8658165855998895,
    0.6661700880180962,
    0.8049218148148531,
    0.7714303440386239,
    0.14790478354654835,
    0.8666105548197428,
    0.007044577166530286,
    0.026331737288148638,
    0.17188596617099916,
    0.8897812990554013,
    0.24323574561119998,
    0.028590102134105955,
]
y = [
    1.5255314337144372,
    3.6434202968230003,
    3.010885733911661,
    3.774442382979625,
    3.3687639483798324,
    1.5506452040608503,
    3.790447985799683,
    3.8689707574953,
    3.4933565751758713,
    1.4284538820635841,
    3.8715350915692364,
    3.7045949061144983,
]

x_train = x[1:8]
y_train = y[1:8]
x_test = x[9:end]
y_test = y[9:end]

# Prior GP with zero mean and a Matérn-5/2 kernel.
f = GP(Matern52Kernel())

# Finite-dimensional projection of the prior at the training inputs,
# with observation-noise variance 0.1.
fx = f(x_train, 0.1)
logpdf(fx, y_train)

# Posterior GP conditioned on the noisy training observations.
p_fx = posterior(fx, y_train)
logpdf(p_fx(x_test), y_test)

Firstly, the last statement here should really have the noise term in it:

logpdf(p_fx(x_test, 0.1), y_test)
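Without the noise term, p_fx(x_test) is the (essentially) noiseless finite-dimensional projection of the posterior, so logpdf treats y_test as exact function values and heavily penalises the observation noise in them. As a quick sketch of the comparison, using the same variables as above:

# Noiseless projection: y_test is treated as exact function values,
# so the observation noise in y_test is heavily penalised.
logpdf(p_fx(x_test), y_test)

# Projection with the observation-noise variance (0.1) included,
# matching the noise used when conditioning on the training data.
logpdf(p_fx(x_test, 0.1), y_test)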

Secondly, the log joint probabilities of all the training data and all of the test data aren't comparable quantities: the two sets contain different numbers of points, and they're quite different subsets of the data. We're better off comparing the logpdf of the training set under the prior / posterior, and of the test set under the prior / posterior:

julia> logpdf(fx, y_train)
-25.53057444906228

julia> logpdf(p_fx(x_train, 0.1), y_train)
-13.443991647513347

julia> logpdf(f(x_test, 0.1), y_test)
-9.910311949957668

julia> logpdf(p_fx(x_test, 0.1), y_test)
-2.878533694097382

Note that the log marginal probability of the test data under the prior is larger than the log marginal probability of the training data under the posterior -- this is completely fine.

The marginal distributions ought to be more comparable, although there is still some variation due to the small sample sizes:

julia> mean(logpdf.(marginals(fx), y_train))
-5.6258610983253305

julia> mean(logpdf.(marginals(p_fx(x_train, 0.1)), y_train))
-1.472024445615529

julia> mean(logpdf.(marginals(f(x_test, 0.1)), y_test))
-5.848051354716894

julia> mean(logpdf.(marginals(p_fx(x_test, 0.1)), y_test))
-0.7081250592546751

Since the log-likelihood is, by its name, the log of a probability, and probabilities are in [0,1], the log-likelihood after fitting should be closer to 0 = log(1), instead of becoming even more negative.

Note that this isn't quite correct. We're dealing with probability densities here, which take values in [0, Inf), so the logpdf can take any real value.
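For example (a quick check using Distributions.jl, which isn't needed for anything above, just to make the point):

using Distributions

# A narrow Gaussian has density greater than 1 near its mean:
# pdf(Normal(0, 0.01), 0) = 1 / (0.01 * sqrt(2π)) ≈ 39.9,
# so the logpdf there is positive, ≈ 3.69.
logpdf(Normal(0.0, 0.01), 0.0)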

willtebbutt commented 9 months ago

I've opened a PR to fix the noise-variance problem: https://github.com/JuliaGaussianProcesses/AbstractGPs.jl/pull/379

filipporemonato commented 9 months ago

Hey, thanks a lot @willtebbutt for the quick and very detailed answer! It's really useful.

Firstly, the last statement here should really have the noise term in it:

logpdf(p_fx(x_test, 0.1), y_test)

Thank you, it hadn't crossed my mind that I could (and should) include the noise in the posterior term. It makes sense, since the GP was conditioned with that noise.

We're better off comparing the logpdf of the training set under the prior / posterior, and of the test set under the prior / posterior

Yes, sorry if I was somewhat unclear in my first comment, but this is what I was doing, as I agree the train and test data are not comparable. I attached a screenshot of what my little test looked like [screenshot omitted]. There you can see how the logpdf was -284 on the (prior) training data, and became almost -14k on the posterior for the same data. Including the noise as you suggested gave a logpdf of -232, which is larger than the starting -284, so that is what I expected.
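In code, the comparison was essentially the following (a sketch with placeholder names; f, p_fx, and the noise variance 0.1 stand in for the ones in my own test):

# Prior logpdf of my training data (this gave roughly -284).
logpdf(f(x_train, 0.1), y_train)

# Posterior logpdf of the same data without the noise term (roughly -14k).
logpdf(p_fx(x_train), y_train)

# Posterior logpdf with the noise term included, as suggested (roughly -232).
logpdf(p_fx(x_train, 0.1), y_train)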

Since the log-likelihood is, by its name, the log of a probability, and probabilities are in [0,1], the log-likelihood after fitting should be closer to 0 = log(1), instead of becoming even more negative.

Note that this isn't quite correct. We're dealing with probability densities here, which take values in [0, Inf), so the logpdf can take any real value.

You are of course completely right here. I'll chalk this error up to opening the issue early in the morning without having had breakfast first, otherwise I'd have to burn my degree, eheh.

I think the tip on inserting the noise in the posterior when calculating the logpdf is the step I missed here, and it resolves my doubt. Thank you!

willtebbutt commented 9 months ago

You are of course completely right here. I'll chalk this error up to opening the issue early in the morning without having had breakfast first, otherwise I'd have to burn my degree, eheh.

haha we've all been there!

Glad that this helped. Please don't hesitate to open issues with any more questions you may have!

edit: note that I'll close this issue once my PR has been merged.

simsurace commented 7 months ago

Closed by #379