JuliaGaussianProcesses / KernelFunctions.jl

Julia package for kernel functions for machine learning
https://juliagaussianprocesses.github.io/KernelFunctions.jl/stable/
MIT License

Should I make the kernels in GraphKernels.jl subtypes of KernelFunctions.Kernel? #308

Open simonschoelly opened 3 years ago

simonschoelly commented 3 years ago

I have started working on a package called GraphKernels.jl that implements kernel functions between graphs. So far I have deliberately used similar function names to this package, for example kernelmatrix. Now I wanted to discuss whether it would make sense to make all these graph kernels subtypes of Kernel from this package.

There are a few things that might be special for graph kernels:

- Calculating the kernel function can be quite slow, therefore it is important that I can calculate the kernel matrix with multiple threads and also rely on the symmetry, so I do not have to calculate both K(g1, g2) and K(g2, g1).
- I am not an expert in automatic differentiation, but so far it looks quite difficult to make these kernels differentiable, except in a few parameters.
- Some kernels rely on randomized walks, so I somehow have to figure out how to achieve consistent results there when providing a seed.
- A lot of graph kernels are of the form K(f(g1), f(g2)), where f is a function that transforms a single graph into something else (for example a vertex embedding) before these transformed values are passed to the kernel. By caching the output of f, one can reduce the amount of calculations significantly when constructing a kernel matrix.

On the other hand, it would be nice to have a consistent interface for kernels, and to be able to use the functions from this package. Maybe there will also be some general kernel interface in MLJ at some point.

And maybe the multi-thread support could be moved to this package at some point; there might be data types that could benefit from faster kernel matrix construction (images and strings, for example).

So what is your opinion on that?

willtebbutt commented 3 years ago

Hi @simonschoelly.

Please do consider making your graph kernels subtype KernelFunctions.jl's Kernel! I would love to be able to just plug your kernels into a GP from AbstractGPs.
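
Roughly, a minimal sketch of what that could look like (the VertexCountKernel type and its definition below are made up purely for illustration; I'm assuming graphs come from Graphs.jl):

using KernelFunctions
using Graphs  # assumed graph representation

# Purely illustrative toy kernel that compares graphs by their number of vertices.
struct VertexCountKernel <: KernelFunctions.Kernel end

# The core requirement: the kernel must be callable on a pair of inputs.
(k::VertexCountKernel)(g1::AbstractGraph, g2::AbstractGraph) = exp(-abs(nv(g1) - nv(g2)))

# Generic functionality such as kernelmatrix then works on vectors of graphs:
# kernelmatrix(VertexCountKernel(), [g1, g2, g3])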

Calculating the kernel function can be quite slow, therefore it is important that I can calculate the kernel matrix with multiple threads and also rely on the symmetry, so I do not have to calculate both K(g1, g2) and K(g2, g1).

And maybe the multi-thread support could be moved to this package at some point; there might be data types that could benefit from faster kernel matrix construction (images and strings, for example).

As the kernel implementer, you have complete control over how kernelmatrix etc. are implemented, so I don't foresee that being an issue, unless I'm missing something?
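
For instance, nothing stops you from defining a specialised threaded method that exploits symmetry; a rough sketch with a placeholder kernel type:

using KernelFunctions

# Placeholder kernel type, purely for illustration.
struct MyGraphKernel <: KernelFunctions.Kernel end
(k::MyGraphKernel)(g1, g2) = 1.0  # stand-in for an expensive graph comparison

# Specialised kernelmatrix: compute only the upper triangle, in parallel.
function KernelFunctions.kernelmatrix(k::MyGraphKernel, xs::AbstractVector)
    n = length(xs)
    K = Matrix{Float64}(undef, n, n)
    Threads.@threads for i in 1:n
        for j in i:n
            K[i, j] = k(xs[i], xs[j])
            K[j, i] = K[i, j]  # fill the lower triangle by symmetry
        end
    end
    return K
end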

I am not an expert in automatic differentiation, but so far it looks quite difficult to make these kernels differentiable, except in a few parameters.

That's also fine -- we don't stipulate that your kernels must be AD friendly; it's just a bonus if they can be.

In particular, our standard test suite (which you can utilise in your tests) doesn't assume that a given kernel is differentiable: https://github.com/JuliaGaussianProcesses/KernelFunctions.jl/blob/efca34f298277cf464a841a1ee8acf4e256871c1/src/test_utils.jl#L32
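
Usage is roughly as follows (a sketch; check the docstring for the exact requirements on the input vectors):

using KernelFunctions
using KernelFunctions.TestUtils: test_interface

# Run the generic consistency checks on a kernel. As far as I recall, x0 and x1
# should have equal length and x2 a different length.
k = SqExponentialKernel()
x0, x1, x2 = randn(5), randn(5), randn(4)
test_interface(k, x0, x1, x2)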

Some kernels rely on randomized walks, so I somehow have to figure out how to achieve consistent results there when providing a seed.

Oh, interesting. Yeah, you'd probably need to ensure consistent results by manually seeding your kernels if you want to use our test suite.
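
One purely illustrative option would be to store a seed in the kernel and build a fresh RNG from it on every evaluation, so repeated calls give the same value:

using KernelFunctions
using Random

# Illustrative random-walk-style kernel that carries its own seed.
struct SeededWalkKernel <: KernelFunctions.Kernel
    seed::UInt
    n_walks::Int
end

function (k::SeededWalkKernel)(g1, g2)
    rng = Random.MersenneTwister(k.seed)  # same seed => same walks => same value
    # ... sample k.n_walks random walks on g1 and g2 using rng and compare them ...
    return rand(rng)  # stand-in for the actual walk-based similarity
end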

A lot of graph kernels are of the form K(f(g1), f(g2)), where f is a function that transforms a single graph into something else (for example a vertex embedding) before these transformed values are passed to the kernel. By caching the output of f, one can reduce the amount of calculations significantly when constructing a kernel matrix.

This is an interesting one. I guess my immediate thought would be that, if the embedding is fixed, could you not just apply it to the graphs before passing them into anything involving a kernel? i.e. do you even need a mechanism for caching built into the kernel framework when you could just pass in the embeddings?

Or do the embeddings change, and it's that you want to cache computations inside a single kernelmatrix evaluation? If it's this, then you should get that for free with our input transformations: https://github.com/JuliaGaussianProcesses/KernelFunctions.jl/blob/master/src/transform/functiontransform.jl
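
Concretely, something like the following (embed_graph is just a placeholder for whatever transformation you have in mind):

using KernelFunctions

# Placeholder embedding: maps a graph to a feature vector (stand-in for a real,
# deterministic embedding).
embed_graph(g) = randn(10)

# Compose a base kernel with the embedding. Inside kernelmatrix the transform
# should be applied to each input once before the base kernel is evaluated.
k = SqExponentialKernel() ∘ FunctionTransform(embed_graph)

# kernelmatrix(k, [g1, g2, g3]) then embeds each graph once and compares the embeddings.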

On the other hand, it would be nice to have a consistent interface for kernels, and to be able to use the functions from this package. Maybe there will also be some general kernel interface in MLJ at some point.

We're definitely aiming to have a generic kernel interface -- it's not clear to me that an additional one in MLJ should be necessary. Are there any things that you've encountered in KernelFunctions.jl that have caused you problems in your package?

simonschoelly commented 3 years ago

Thanks for your answers. I have now added the dependency on KernelFunctions; test_interface seems to work very well.

As the kernel implementer, you have complete control over how kernelmatrix etc. are implemented, so I don't foresee that being an issue, unless I'm missing something?

This is indeed true. There is a bit of an issue if one combines different kernels, for example by adding them, but as long as the outer kernel still calls kernelmatrix on the inner kernels, this might not make the code much slower.
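
For example (placeholder kernel types, just to make the point):

using KernelFunctions

# Hypothetical graph kernels, purely for illustration.
struct WalkKernel <: KernelFunctions.Kernel end
struct SubtreeKernel <: KernelFunctions.Kernel end
(k::WalkKernel)(g1, g2) = 1.0
(k::SubtreeKernel)(g1, g2) = 2.0

# `+` builds a KernelSum; as long as its kernelmatrix delegates to the components,
# any specialised kernelmatrix defined for WalkKernel/SubtreeKernel is still used.
k_sum = WalkKernel() + SubtreeKernel()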

This is an interesting one. I guess my immediate thought would be that, if the embedding is fixed, could you not just apply it to the graphs before passing them into anything involving a kernel? i.e. do you even need a mechanism for caching built into the kernel framework when you could just pass in the embeddings?

Or do the embeddings change, and it's that you want to cache computations inside a single kernelmatrix evaluation? If it's this, then you should get that for free with our input transformations: https://github.com/JuliaGaussianProcesses/KernelFunctions.jl/blob/master/src/transform/functiontransform.jl

The embedding was more of an example; usually there is more involved in this step. I would also like to not bother the user with having to think about that, so I don't think transformations are the solution here. What I have now is two functions (maybe not with the best names) called preprocessed_form(kernel, graph) and apply_preprocessed(kernel, x1, x2), and a fallback evaluation method for abstract graph kernels:

function (kernel::AbstractGraphKernel)(g1::AbstractGraph, g2::AbstractGraph)
    return apply_preprocessed(kernel, preprocessed_form(kernel, g1), preprocessed_form(kernel, g2))
end

The idea here is that someone implementing a graph kernel then just implements these two functions. Other functions such as kernelmatrix also work through these two functions.
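
In kernelmatrix, for example, the preprocessing can then be done once per graph; roughly (a sketch of the idea rather than the actual implementation):

using KernelFunctions

# Sketch: preprocessed_form / apply_preprocessed are the two GraphKernels.jl
# functions described above; each graph is preprocessed exactly once.
function KernelFunctions.kernelmatrix(k::AbstractGraphKernel, gs::AbstractVector)
    pre = [preprocessed_form(k, g) for g in gs]
    n = length(gs)
    K = Matrix{Float64}(undef, n, n)
    for i in 1:n, j in i:n
        K[i, j] = apply_preprocessed(k, pre[i], pre[j])
        K[j, i] = K[i, j]  # symmetry
    end
    return K
end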

We're definitely aiming to have a generic kernel interface -- it's not clear to me that an additional one in MLJ should be necessary. Are there any things that you've encountered in KernelFunctions.jl that have caused you problems in your package?

I did not have any problems, but as far as I know, if someone wants to use some machine learning algorithm such as a support vector machine or kernel PCA, one is expected to calculate the kernel matrix manually. Maybe it would be nice if one could just specify a kernel there, and the calculation of the kernel matrix would somehow be added to the MLJ pipeline automatically.

willtebbutt commented 3 years ago

The idea here is that someone implementing a graph kernel then just implements these two functions. Other functions such as kernelmatrix also work through these two functions.

I see -- could you point me towards a concrete example in your code? I'm interested to see how this looks in practice.

I did not have any problems, but as far as I know, if someone wants to use some machine learning algorithm such as a support vector machine or kernel PCA, one is expected to calculate the kernel matrix manually. Maybe it would be nice if one could just specify a kernel there, and the calculation of the kernel matrix would somehow be added to the MLJ pipeline automatically.

Hmm, do you have an example? I'm a bit confused by this. For example, in AbstractGPs.jl we do just require that the user specify the kernel, and then AbstractGPs handles evaluating it when necessary -- it's definitely our intention that packages that use kernels (such as something doing kernel PCA or building an SVM) would just ask the user to provide a kernel, and then those packages would use the kernel internally.