JuliaGaussianProcesses / AbstractGPs.jl

Abstract types and methods for Gaussian Processes.
https://juliagaussianprocesses.github.io/AbstractGPs.jl/dev

Test VFE with the naive implementation #308

Closed. sharanry closed this 1 year ago.

sharanry commented 2 years ago

Summary

We currently lack any test confirming that the predictive distribution matches what is prescribed by the original papers.

This is an attempt at fixing that.

The predictive distribution for both DTC and VFE should be the same -- the projected process (PP) predictive? This PR currently checks it against the DTC predictive distribution defined in Quiñonero-Candela & Rasmussen's Eq. 20b.
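
For reference, here is a minimal sketch of how the naive helpers used in the snippets below could be set up. The toy data, the noise variance σ², and the definitions of Σ and q are my assumptions, chosen so the formulas line up with how these names appear in the REPL output; this is not the PR's actual test code:

    using AbstractGPs, KernelFunctions, LinearAlgebra

    # Toy setup mirroring the snippets below (all names are assumptions).
    k = SEKernel()
    x = rand(10)             # training inputs
    u = rand(3)              # pseudo-/inducing inputs
    x_test = rand(5)         # test inputs
    y = sin.(x) .+ sqrt(1e-12) .* randn(length(x))
    σ² = 1e-12               # observation noise variance

    # Naive DTC/PP quantities (cf. Quiñonero-Candela & Rasmussen 2005, Eq. 20b):
    Σ(x, u) = inv(kernelmatrix(k, u, u) + inv(σ²) * kernelmatrix(k, u, x) * kernelmatrix(k, x, u))
    q(a, b) = kernelmatrix(k, a, u) * (kernelmatrix(k, u, u) \ kernelmatrix(k, u, b))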

sharanry commented 2 years ago

With these tests, we can identify a small but consistent discrepancy between my computationally naive test implementation and the package's implementation. I'm not sure whether it is a bug in the test.

julia> inv(LinearAlgebra.Diagonal(1e-12 * ones(5))) *
                kernelmatrix(k, x_test, u) *
                Σ(x, u) *
                kernelmatrix(k, u, x) *
                y
5-element Vector{Float64}:
  2.2822711905989306
  1.619419315942234
  2.4886061448883043
  0.13716866592873223
 -1.4776904851124237

julia> mean(f_approx_post, x_test)
5-element Vector{Float64}:
  2.2632581110918366
  1.620565927921467
  2.540609342124296
  0.16377806765055059
 -1.6899264788845625

julia> kernelmatrix(k, x_test, x_test) - q(x_test, x_test) +
                 kernelmatrix(k, x_test, u) * Σ(x, u) * transpose(kernelmatrix(k, x_test, u))          
5×5 Matrix{Float64}:
  0.148276     0.0509091   -0.00913296  -0.045995    0.0149131
  0.0506096    0.0490361   -0.00308661  -0.0382136   0.0114484
 -0.00913856  -0.00307898   0.0389262    0.0023531  -0.000731468
 -0.046091    -0.038105     0.00236565   0.131079   -0.058772
  0.0148782    0.0114382   -0.00071384  -0.0587383   0.308568

julia> cov(f_approx_post, x_test)
5×5 Matrix{Float64}:
  0.150214     0.0511708   -0.00932485   -0.0470627    0.0133652
  0.0511708    0.0493259   -0.00309447   -0.0388128    0.0106505
 -0.00932485  -0.00309447   0.0389497     0.00241128  -0.000594857
 -0.0470627   -0.0388128    0.00241128    0.132985    -0.0552457
  0.0133652    0.0106505   -0.000594857  -0.0552457    0.300714
willtebbutt commented 2 years ago

With these tests, we can identify a small but consistent discrepancy between my computationally naive test implementation and the package's implementation. I'm not sure whether it is a bug in the test.

I'm pretty sure that your mean calculation doesn't take into account the fact that the prior has a non-zero mean. I think you need something like

    @test map(sin, x_test) + inv(LinearAlgebra.Diagonal(1e-12 * ones(5))) *
          kernelmatrix(k, x_test, u) *
          Σ(x, u) *
          kernelmatrix(k, u, x) *
          (y - map(sin, x)) ≈ mean(f_approx_post, x_test)

instead.

As regards the covariance, note that the predictive covariances at locations other than the pseudo-inputs differ between VFE and DTC. Could you check that your expressions agree with what the package currently does when x_test = z? I think they should also agree if you swap out VFE for DTC in the tests.
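
If it helps, a sketch of that check could look something like the following, reusing the naive Σ and q expressions from the snippets above and comparing at the pseudo-inputs (purely illustrative, assuming `using Test` and that the pseudo-inputs z coincide with u here):

    # Naive DTC/PP covariance as a function of the evaluation locations,
    # matching the expression used in the REPL snippet above.
    naive_cov(a) = kernelmatrix(k, a, a) - q(a, a) +
                   kernelmatrix(k, a, u) * Σ(x, u) * transpose(kernelmatrix(k, a, u))

    # At the pseudo-inputs, the package's posterior and the naive
    # expression should agree up to numerical error.
    @test naive_cov(u) ≈ cov(f_approx_post, u)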

I would advise checking that what you've implemented for the covariance in your tests lines up with equation 6 in Titsias' 2009 paper -- a quick glance on my part suggests that they're not quite the same, but I've not checked it in detail.

sharanry commented 2 years ago

@willtebbutt Also, are derivations for the current VFE/DTC predictive-distribution internals available somewhere? I still have the write-up you gave me a couple of years ago while implementing these, but that document doesn't seem to contain any derivations.

willtebbutt commented 2 years ago

Hmmm I actually don't think that we do have that lying around. Could you open an issue about it so that we don't forget it?

sharanry commented 2 years ago

Hmmm I actually don't think that we do have that lying around. Could you open an issue about it so that we don't forget it?

Oh okay. Please let me know if you come across them. I was hoping to use them as a reference for implementing other sparse techniques such as FITC.

st-- commented 2 years ago

In the new tests, could you separate out the terms and name them, e.g. dtc_posterior_mean = ... # see (ref) (eq. X)? That would make them much easier to follow :)
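
Something along these lines, perhaps (purely illustrative; σ², Σ_dtc, and the sin prior mean are assumptions carried over from the earlier snippets, and the references still need filling in):

    # Quiñonero-Candela & Rasmussen (2005), Eq. 20b (reference to be confirmed).
    Σ_dtc = inv(kernelmatrix(k, u, u) + inv(σ²) * kernelmatrix(k, u, x) * kernelmatrix(k, x, u))
    dtc_posterior_mean = map(sin, x_test) +
        inv(σ²) * kernelmatrix(k, x_test, u) * Σ_dtc * kernelmatrix(k, u, x) * (y - map(sin, x))
    @test dtc_posterior_mean ≈ mean(f_approx_post, x_test)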

willtebbutt commented 1 year ago

This appears to have gone stale. @sharanry please feel free to re-open if you wish to finish off.