JuliaGaussianProcesses / KernelFunctions.jl

Julia package for kernel functions for machine learning
https://juliagaussianprocesses.github.io/KernelFunctions.jl/stable/
MIT License
267 stars 32 forks source link

Perhaps avoid validating input dims on AbstractVector{<:AbstractVector{<:Real}} #512

Open grero opened 1 year ago

grero commented 1 year ago

I would suggest either changing this, or perhaps better, skipping it all together. https://github.com/JuliaGaussianProcesses/KernelFunctions.jl/blob/ef6d4591b36194fca069d8bc7ae8c1e2ee288080/src/utils.jl#L194

I'm developing kernels that work on pairs of Vector{Float64}, where the kernel essentially sums over all pairwise distances in x and y. For this kernel, the check above does not really make sense, and indeed passes if I construct the kernel matrix like this:

K = kernelmatrix(kernel, x,x)

However it fails if I instead do

K = kernelmatrix(kernel, x,y)

unless length(first(x))==length(first(y)). I realise that I can just create my own type essentially wrapping Vector{Vector{Float64}}, but that seems like unnecessary work, especially since the kernel machinery works beautifully as it is, as long as the inputs pass the validation.

willtebbutt commented 1 year ago

Ah, interesting. So you've got a case where the dimensionality of the input changes from element to element?

willtebbutt commented 1 year ago

Are you hitting the checks in the various methods in https://github.com/JuliaGaussianProcesses/KernelFunctions.jl/blob/master/src/matrix/kernelmatrix.jl ? I agree that they're making overly strong assumptions about input dimensionality, but I would like to check that they're actually the source of your problem.

grero commented 1 year ago

Yes, that's right. For context, I'm working on using kernel methods for analysing spike trains, which are basically sets of event timestamps. The obvious way to represent this is as a vector of vector, where each inner vector represents a spike train for one trial. Since the number of events changes from trial to trial, the dimension of the inner vectors also change.

willtebbutt commented 1 year ago

Excellent. This makes complete sense. Would you mind opening a PR which removes the checks, and adds tests which fail without the removal of the checks? I imagine we can have this sorted by the end of the day.

grero commented 1 year ago

Are you hitting the checks in the various methods in https://github.com/JuliaGaussianProcesses/KernelFunctions.jl/blob/master/src/matrix/kernelmatrix.jl ? I agree that they're making overly strong assumptions about input dimensionality, but I would like to check that they're actually the source of your problem.

Yes, as I indicated at the top, the validate_inputs method fails on my inputs, unless the first element of x and y happens to have the same number of items.

willtebbutt commented 1 year ago

Great -- I just wasn't sure if that was the only place where we use validate_inputs erroneously, or if there was something else.

grero commented 1 year ago

PR submitted.