willtebbutt opened this issue 4 years ago
Hi @willtebbutt , the examples work well, thanks ;)
have I covered all of the functionality of interest to GPFlux?
Not all, I'm still extending it. The next main feature is to make GPs in GPFlux work as a normal neural network layer (I'm currently working on it), which I think may support e.g. Deep Gaussian Processes (Neil Lawrence) and Variational Gaussian Processes (David Blei). (I hope I understand the idea of these papers correctly.)
does this miss any important interface requirements?
For these two functionalities, I think you have provided enough interfaces; it looks good.
For Flux integration, I personally like the second implementation you provided:
dσ², dg = Zygote.gradient(
    function(σ², g)
        # Manually transform data
        gx = ColVecs(g(x.X))
        # Construct GP and compute marginal likelihood using transformed data
        f = σ² * GP(eq(), GPC())
        fx = f(gx, 0.1)
        return logpdf(fx, y)
    end,
    σ², g,
)
I think it's clearer and more straightforward; this is also the approach used in GPFlux.
The next main feature is to make GPs in GPFlux work as a normal neural network layer
Could you elaborate a little on what this will look like?
I found there are related issues here; this can be realized with an NKN. Also, the sparse GP mentioned here is being considered for addition to GPFlux in the future.
I actually don't think that the linear combination of kernels thing is the right way to go about implementing the NKN. As I showed on the branch linked above, I think the right way is probably to have a custom NKN kernel that accepts a collection of primitive kernels and a Chain (or some other Flux construct) that contains them.
I think it's clearer and more straightforward; this is also the approach used in GPFlux.
Good to know -- this approach just works out of the box, so there's literally no need to provide explicit integration in Flux to make this work -- I just need to implement worked examples :)
I think the right way is probably to have a custom NKN kernel that accepts a collection of primitive kernels and a Chain (or some other Flux construct) that contains them.
I agree with that. In fact, GPFlux implements a NeuralKernelNetwork type to support the NKN, which slightly modifies Flux's Chain.
struct NeuralKernelNetwork{T<:Tuple} <: AbstractKernel
    layers::T
    NeuralKernelNetwork{T}(ls...) where {T} = new{T}(ls)
end
I actually don't think that the linear combination of kernels thing is the right way to go about implementing the NKN
Maybe I didn't make myself clear. Here I mean that most composite kernels can be viewed as special cases of an NKN (a linear combination of kernels can also be viewed as an NKN that has only a linear layer, whose weights are the same as the coefficients in front of the kernels). In GPFlux, I use the NKN as the backend for the addition kernel and the product kernel.
const ProductCompositeKernel = NeuralKernelNetwork{Tuple{Primitive, typeof(allProduct)}}
const AddCompositeKernel = NeuralKernelNetwork{Tuple{Primitive, typeof(allSum)}}
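To make that point concrete, here is a minimal, self-contained sketch in plain Julia (not the GPFlux API; the names k1, k2, primitive and nkn are made up for illustration) showing that a weighted sum of kernels is exactly a one-layer "linear" NKN whose weights are the coefficients:
using LinearAlgebra

# Two primitive kernels: an EQ (RBF) kernel and an exponential kernel.
k1(x, y) = exp(-0.5 * sum(abs2, x .- y))
k2(x, y) = exp(-sum(abs.(x .- y)))

# "Primitive layer": evaluate every primitive kernel at (x, y).
primitive(x, y) = [k1(x, y), k2(x, y)]

# "Linear layer": its weights play the role of the coefficients in front of the kernels.
w = [0.7, 0.3]
nkn(x, y) = dot(w, primitive(x, y))

x, y = randn(3), randn(3)
nkn(x, y) ≈ 0.7 * k1(x, y) + 0.3 * k2(x, y)   # true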
Could you elaborate a little on what this will look like?
This idea comes from the fact that a GP is equivalent to a neural network layer with infinitely many neurons (some constraints are needed on the weights of this layer); this was shown by Neal in 1994, and a proof can be found here in section 2. Give me some time, I will try to provide you with an example this week :)
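For intuition, here is a rough sketch of what a forward pass through such a "GP layer" could look like: given a batch of inputs, it returns one draw from the GP prior at those inputs. The names gp_layer and eq_kernel are hypothetical, not GPFlux's API.
using LinearAlgebra

eq_kernel(x, y) = exp(-0.5 * sum(abs2, x .- y))

function gp_layer(X::AbstractMatrix; jitter=1e-6)
    n = size(X, 2)                                           # columns are data points
    K = [eq_kernel(X[:, i], X[:, j]) for i in 1:n, j in 1:n]
    L = cholesky(Symmetric(K + jitter * I)).L                # jitter for numerical stability
    return L * randn(n)                                      # one sample from N(0, K)
end

X = randn(2, 5)
f = gp_layer(X)    # latent function values at the 5 inputs, one prior draw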
Maybe I didn't make myself clear. Here I mean that most composite kernels can be viewed as special cases of an NKN (a linear combination of kernels can also be viewed as an NKN that has only a linear layer, whose weights are the same as the coefficients in front of the kernels). In GPFlux, I use the NKN as the backend for the addition kernel and the product kernel.
Good points. I wonder whether there are any performance implications associated with this though... hmmm.
This idea comes from the fact that a GP is equivalent to a neural network layer with infinitely many neurons (some constraints are needed on the weights of this layer); this was shown by Neal in 1994, and a proof can be found here in section 2. Give me some time, I will try to provide you with an example this week :)
Ah, I see -- this line of work is slightly different from the Deep GP or variational GP stuff, so I'll be interested to see what you come up with :) It's not clear to me how this will play with the usual Flux way of doing things, in particular how it plays with distributions over functions rather than the deterministic objects that Flux works with.
Hi @willtebbutt , I have just finished some initial work on the Gaussian process layer we discussed last week; the implementation can be found in this notebook on this branch. I strongly suggest you run this notebook by:
git clone git@github.com:HamletWantToCode/GPFlux.jl.git
git checkout develop
add
with the location of the GPFlux folder.
Ah I see. Looks like a nice API to aim for -- definitely needs some way to perform inference though, as you're currently just sampling from the prior each time the function is evaluated.
As regards testing with finite differencing -- you just have to be really careful to use exactly the same seed each time you evaluate the function. In particular, you should deterministically set the seed inside the function.
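For concreteness, a tiny sketch of that seeding trick (the loss function here is a stand-in, not code from the notebook): fixing the RNG seed inside the function makes every evaluation see identical random draws, so finite differences are well defined.
using Random

function stochastic_loss(θ)
    Random.seed!(1234)            # deterministically set the seed inside the function
    z = randn(length(θ))          # the same "random" draw on every call
    return sum(abs2, θ .- z)
end

# Both evaluations below use identical randomness, so the central difference is consistent.
θ, δ = [0.3, -1.2], 1e-6
fd = (stochastic_loss(θ .+ [δ, 0.0]) - stochastic_loss(θ .- [δ, 0.0])) / (2δ)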
I've added a Stheno.jl version of this proposal here for reference - the point being that Stheno.jl can do all of this stuff with minimal modification.
Excellent, it's great to know that Stheno has built-in support for this :)
Based on our previous discussion, I think we both agree that this Flux integration should include:
AFAIK, these functionalities aren't yet available in the Julia community. In Python, GPyTorch supports the first one, and TensorFlow recently added support for the last one in its extension (Pyro may support both). I'd like to have these APIs integrated into Stheno.
Excellent. Let's tackle point number 1 first then, as I think it's likely the most straightforward in the sense that there's no real integration to be done, we just need example code and good documentation. Do you agree / what kinds of resources do you think would be helpful to address this?
Do you agree / what kinds of resources do you think would be helpful to address this?
I think reproducing some experiments from the deep kernel learning paper is a good starting point; GPyTorch also uses it as a demo. I will be working on this over the next few days.
Sounds good. Probably best to start with a small toy dataset where exact inference is tractable. You could also do something with pseudo-points quite straightforwardly -- see Stheno's elbo function :)
Hi @willtebbutt , I have implemented a simple step-function fitting example here; it uses a feedforward neural network plus a GP with an ARD kernel. I have also written a binary classification example that uses Stheno and Turing. The examples are written in a Jupyter notebook and packaged in a project, so it should be straightforward to get everything up and running.
I noticed Stheno has a model zoo, but it seems outdated now, so where should these examples go?
I have run into a problem when trying to make Stheno's output data type Float32 (in Flux, the default data type for neural network parameters is Float32, which can greatly reduce the computational cost when we have a large dataset). Below is an example that reproduces my problem:
using Stheno
X = rand(Float32, 4, 10);
y = rand(Float32, 10);
l = rand(Float32, 4);
σ² = rand(Float32);
kernel = σ²*stretch(EQ(), 1.0f0 ./ l);
K_matrix = pw(kernel, ColVecs(X)) |> eltype # Float32
gp = GP(0.0f0, kernel, GPC());
noisy_prior = gp(ColVecs(X), 0.01f0);
rand(noisy_prior) |> eltype # Float64 !!!
logpdf(noisy_prior, y) |> eltype # Float64 !!!
Though I have converted all the input parameters to Float32, the last two lines still give me the Float64 type.
The example looks really good @HamletWantToCode . Will take a proper look once we're satisfied that the type stability issues have been resolved.
I'm creating a new examples folder at the minute. I've got a branch with them all on that I'll aim to get onto master in the next day or so. It'll be best if you just add a new sub-directory in there once it's available.
Right, I would say that the first item has been more-or-less completed.
Are you up for getting on with adding the Neural-Kernel-Network Kernel @HamletWantToCode ?
Yeah, here are my ideas about the implementation of the Neural-Kernel-Network kernel:
- it should be a subtype of Stheno's Kernel type, and should have ew & pw interfaces
- its internal structure is similar to Flux's Chain; three types of layer can be added to it: a primitive layer (contains basic kernels), a linear layer (superposition of kernels) and a product layer (kernel product); we also have to include activation functions (currently the exp function)
- all its parameters should be able to be extracted (e.g. via params) and redistributed (this is because the Optim.jl package only allows us to write all parameters into a 1D array)
This is an example implementation of the parameter redistribution function; I hope I have made it clear:
function dispatch!(model, xs::AbstractVector)
    loc = 1
    for p in params(model)
        lp = length(p)
        x = reshape(xs[loc:loc+lp-1], size(p))
        copyto!(p, x)
        loc += lp
    end
    model
end
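For context, here is a sketch of the counterpart operation (a hypothetical helper, not part of the plan above): flattening all parameters into one 1D vector, which is the form Optim.jl expects. dispatch!(model, flatten(model)) should then leave the model unchanged.
using Flux: params

function flatten(model)
    # Concatenate every parameter array into one flat vector, in the same
    # order that dispatch! walks them.
    return reduce(vcat, [vec(copy(p)) for p in params(model)])
end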
What do you think? @willtebbutt
I think this plan sounds very reasonable.
Maybe open a PR with your plan from above and we can discuss further on that?
Again, didn't mean to close...
I wonder whether we could just directly use Flux's Chain type here?
I also considered using it, but there may be some performance-related problems. I'll write something first and then we can discuss it concretely.
Maybe open a PR with your plan from above and we can discuss further on that?
I'll open a PR once I finish it :)
Great job with the Neural Kernel Network implementation @HamletWantToCode . I noticed that there aren't currently any of the activation function layers implemented. Would you be up for adding the basics quickly before we move on to the third item on the list?
The reasons I didn't include activation functions in the PR are:
- currently the only one used is the exp function, which can be easily implemented by users.
Personally, I'd like to include the activation functions once more of them are found and when they are proved to be useful.
Fair enough. I'm happy not to worry about them for the time being -- as you say, we can always add them later if the need arises.
Relates to this issue.
A MWE can be found in the examples directory on this branch. It contains a basic demo of composing a Flux model with a Stheno model (in the literal sense of composition). The examples are all packaged nicely in a project, so it should be straightforward to get everything up and running.
The questions now are
@HamletWantToCode what do you think?