cscherrer / BayesianLinearRegression.jl

A Julia implementation of Bayesian linear regression using marginal likelihood for hyperparameter optimization, as presented in Chris Bishop's book.

Positive logEvidence? #4

Closed: mmattocks closed this issue 3 years ago

mmattocks commented 3 years ago

Hello! I am confused by the output of the logEvidence function. I have the following simple line fit, which gives a positive value for its log evidence:

using BayesianLinearRegression
x=[10.5,9.5,8.5,7.5,5.5,4.5,3.5,2.5,1.5,0.5]
X=hcat(ones(10),x)
y=[0.6981259697805094,
   0.6798627414557512,
   0.7731633062603211,
   0.6459030047497794,
   0.4453716081531055,
   0.4392835256953898,
   0.3579255833452043,
   0.31474369225639115,
   0.2577451424537498,
   0.01592532634392024]
m=fit!(BayesianLinReg(X,y))

returns

BayesianLinReg model

Log evidence: 7.56
Prior scale:  0.08
Noise scale:  0.08

Coefficients:
  1:  0.09 ± 0.04
  2:  0.07 ± 0.01

This looks correct to me apart from the positive value for logZ. If I understand correctly, this is log p(D) and should therefore be a negative value. The example given in the README gives a reasonable-looking negative value. Looking at the evidence calculation function, it appears to implement the correct Bishop formula. Am I misunderstanding something, or is this a bug?
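For concreteness, this is the formula I have in mind, PRML eq. (3.86), transcribed into a rough standalone Julia sketch of my own (not the package's code; alpha and beta here are the prior and noise precisions):

using LinearAlgebra

# Rough transcription of Bishop's log evidence, PRML eq. (3.86).
# alpha = prior precision, beta = noise precision; not the package's implementation.
function bishop_log_evidence(X, y, alpha, beta)
    N, M = size(X)
    A  = alpha * I(M) + beta * (X' * X)     # posterior precision of the weights
    mN = beta * (A \ (X' * y))              # posterior mean of the weights
    E  = beta / 2 * sum(abs2, y - X * mN) + alpha / 2 * dot(mN, mN)
    return M / 2 * log(alpha) + N / 2 * log(beta) - E - logdet(A) / 2 - N / 2 * log(2pi)
end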

mmattocks commented 3 years ago

I'm not very familiar with linear algebra, but I've spent a couple of nights going slowly over this code now and I can't figure this out at all. Every part of the evidence calculation looks numerically correct to me and equivalent to the other formulations Bishop offers in the text. What I can't see is what in the textbook would actually guarantee that the log evidence approximation evaluates to <= 0. If anyone has any clues I'd greatly appreciate it.
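For example, one equivalence I did check is the ln|A| term: Bishop writes it both as a determinant and as a sum over the eigenvalues of the data term. A quick sanity check of my own (placeholder precisions, not the package's code):

using LinearAlgebra

X = hcat(ones(10), [10.5, 9.5, 8.5, 7.5, 5.5, 4.5, 3.5, 2.5, 1.5, 0.5])
alpha, beta = 1.0, 1.0                         # placeholder precisions, just for the check
A = alpha * I(2) + beta * (X' * X)
lambdas = eigvals(Symmetric(beta * (X' * X)))  # eigenvalues of the data term
logdet(A) ≈ sum(log, alpha .+ lambdas)         # true: |A| = prod(alpha .+ lambdas)

But that only confirms the pieces agree with each other, not why the total should (or shouldn't) be negative.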

cscherrer commented 3 years ago

Hi @mmattocks ,

Discrete probabilities have to sum to one, so no single probability can exceed one and a log probability is always non-positive. But for a continuous distribution it's the density that must integrate to one; the density itself can be much larger than one, so its log (and hence a log evidence built from it) can be positive.

Here's a little example:

julia> using Distributions, MonteCarloMeasurements

julia> z = Particles(Normal())     # so σ*z ~ Normal(0,σ)
Particles{Float64,2000}
 1.11022e-17 ± 1.0

julia> [σ => logpdf(Normal(0,σ), σ*z) for σ in 10.0 .^ [-3:3;]]
7-element Array{Pair{Float64,Particles{Float64,2000}},1}:
  0.001 => 5.49 ± 0.7
   0.01 => 3.19 ± 0.7
    0.1 => 0.884 ± 0.7
    1.0 => -1.42 ± 0.7
   10.0 => -3.72 ± 0.7
  100.0 => -6.02 ± 0.7
 1000.0 => -8.33 ± 0.7
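To tie this back to your fit (just a back-of-the-envelope check of mine, not output from the package): if the reported noise scale of about 0.08 is the standard deviation of the observation noise, a Gaussian that narrow has a peak density of roughly 1/(0.08·√(2π)) ≈ 5, so each well-fit observation can contribute around +1.6 to the log evidence before the prior and complexity penalties are accounted for. Ten such points can easily make the total positive, which is consistent with the 7.56 you're seeing:

using Distributions

# Peak log-density of a Gaussian with the fitted noise scale:
logpdf(Normal(0.0, 0.08), 0.0)    # ≈ 1.61, i.e. the density at the peak is about 5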

mmattocks commented 3 years ago

Thank you very much for the explanation, I greatly appreciate you taking the time to teach me here! I thought I understood the distinction you illustrate, but I still made the error of trying to understand the evidence approximation by analogy to nested sampling of models whose likelihoods are determined by categorical distributions :sweat: Apologies for the erroneous issue!

cscherrer commented 3 years ago

No worries, little things like this trip us all up from time to time :)