JuliaGaussianProcesses / KernelFunctions.jl

Julia package for kernel functions for machine learning
https://juliagaussianprocesses.github.io/KernelFunctions.jl/stable/
MIT License

Split `MOInput` into two types #306

Closed: thomasgudjonwright closed this issue 3 years ago

thomasgudjonwright commented 3 years ago

The idea would be to introduce two new types (names open for discussion); let's call them BlockByInputs and BlockByOutputs for the purposes of this issue.

BlockByOutputs would be equivalent to the current MOInput, i.e., given an input x = [4.5, 5.5], we have:

```julia
julia> BlockByOutputs(x, 2)
4-element MOInput{Vector{Float64}}:
 (4.5, 1)
 (5.5, 1)
 (4.5, 2)
 (5.5, 2)
```

Conversely, BlockByInputs would look something like this for the same input:

```julia
julia> BlockByInputs(x, 2)
4-element MOInput{Vector{Float64}}:
 (4.5, 1)
 (4.5, 2)
 (5.5, 1)
 (5.5, 2)
```

The reasoning for this is that each ordering formats the kernelmatrix differently. The LMM expects a kernel matrix that would be constructed properly using the latter (this also works by using collect(MOInput(x, 2)) as the input to kernelmatrix).
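To make the difference concrete, here is a minimal sketch using KernelFunctions' existing IndependentMOKernel; the hand-written permutation stands in for the proposed BlockByInputs:

```julia
using KernelFunctions

x = [4.5, 5.5]
k = IndependentMOKernel(GaussianKernel())

# Current convention: MOInput orders inputs as P blocks of N (all locations for
# output 1, then all for output 2, ...). For an independent multi-output kernel
# this yields a block-diagonal kernel matrix.
K_by_outputs = kernelmatrix(k, MOInput(x, 2))

# The proposed BlockByInputs ordering (N blocks of P) can be emulated today by
# permuting the collected inputs by hand:
xs = collect(MOInput(x, 2))   # [(4.5, 1), (5.5, 1), (4.5, 2), (5.5, 2)]
perm = [1, 3, 2, 4]           # interleave: (4.5, 1), (4.5, 2), (5.5, 1), (5.5, 2)
K_by_inputs = kernelmatrix(k, xs[perm])
```

K_by_inputs is the same matrix as K_by_outputs with its rows and columns permuted, which is the structure the LMM expects.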

willtebbutt commented 3 years ago

(spoken to Tom offline about this)

To expand on what Tom is saying: there are two obvious conventions you could adopt for the ordering of your inputs to a multi-output GP. Suppose that you have P outputs and N "inputs", meaning that your complete set of inputs has length NP. You can either

  1. have P blocks of N (the current MOInput), or
  2. have N blocks of P (the one we're not currently doing).

Invenia's code currently utilises the latter, while JuliaGaussianProcesses presently only has support for the former.

We should support both conventions because

  1. different situations call for different conventions, in the same way as ColVecs and RowVecs are both necessary.
  2. Invenia has lots of optimisations for the convention we don't currently support, which we really ought to be able to hook into.

I don't think this really blocks the current PR, but it'll be blocking for the next one.

willtebbutt commented 3 years ago

I quite like BlockByOutputs and BlockByInputs, but since these are both collections of inputs (in our vocab), I wonder whether now is the right time to also solve the "inputs" vs "inputs" naming issue.

i.e. for

```julia
x = randn(2)
y = MOInput(x, 3)
```

both are valid "inputs", but it would be helpful to be able to distinguish between x and y when referring to inputs.
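For concreteness (fixing x = [0.1, 0.2] rather than using randn), y enumerates every (location, output index) pair:

```julia
julia> MOInput([0.1, 0.2], 3)   # 2 locations, 3 outputs -> 6 "inputs"
6-element MOInput{Vector{Float64}}:
 (0.1, 1)
 (0.2, 1)
 (0.1, 2)
 (0.2, 2)
 (0.1, 3)
 (0.2, 3)
```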

I wonder whether we should call y "inputs" and x "locations". We could then say things like: "An input to a multi-output GP comprises a location x and an output index p",

as opposed to having to say

"An input to a multi-output GP comprises an input x and a output index p".

Under this naming convention and Tom's proposal, BlockByOutputs and BlockByLocations would be the obvious names.

theogf commented 3 years ago

Instead of location, we could use more classical terms like feature vector or input sample.

theogf commented 3 years ago

About the whole ordering shenanigans, this would also be easier if we considered 4-tensors instead :trollface:

willtebbutt commented 3 years ago

Well it might make some specific things easier (I'm actually not sure what things it would make easier), but it would make lots of things harder, including

  1. We would need to pick an ordering of the dimensions. The current approach avoids this, just doing whatever the inputs tell it to do. As a user, you can predict precisely the ordering of the elements of kernelmatrix from the ordering of the inputs.
  2. We would be adding additional complexity to the core API of the package, i.e. kernelmatrix would no longer output an AbstractMatrix. The beauty of the current approach is that one can write stuff on top of KernelFunctions and assume that, regardless of the kernel you pass in, you'll always get the same kind of thing back, and it's completely predictable.
  3. Additional multi-output-specific fallbacks would need to be added.
  4. We would have to write + maintain separate testing suites + docs.
  5. Users would need to know about this different convention. Currently we can genuinely promise to always follow a very small set of principles: literally everything we do in this package is just ensuring that everything is consistent with kernelmatrix(k, x, y)[p, q] == k(x[p], y[q]) (see the sketch below).
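That invariant is cheap to check directly; a minimal sketch (SqExponentialKernel stands in for an arbitrary kernel):

```julia
using KernelFunctions, Test

k = SqExponentialKernel()
x, y = randn(4), randn(3)

K = kernelmatrix(k, x, y)

# The package-wide promise: every entry of the kernel matrix is the kernel
# evaluated at the corresponding pair of inputs.
@test all(K[p, q] ≈ k(x[p], y[q]) for p in eachindex(x), q in eachindex(y))
```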

It really just comes down to the fact that we would be privileging multi-output GPs, which I think is fundamentally the wrong way to go about this. IMHO it's really important for our ability to easily build stuff on top of KernelFunctions / AbstractGPs that we don't have to think too hard about whether a particular GP is single-output or multi-output, unless we're doing something multi-output-specific.

edit: to be fair, we should probably lead with that in the docs. I've added this to my todo list.

willtebbutt commented 3 years ago

> Instead of location, we could use more classical terms like feature vector or input sample.

These also seem like reasonable proposals -- I quite like feature vector, although it's perhaps a bit verbose.

devmotion commented 3 years ago

Perhaps just feature and BlockByFeatures? It would be shorter, and features don't necessarily have to be vectors.

thomasgudjonwright commented 3 years ago

I can't seem to assign myself to this. Can I only do so on my fork?

theogf commented 3 years ago

I think you need to be a member of the organization, which can of course be sorted out.

willtebbutt commented 3 years ago

How about IsotopicByFeatures and IsotopicByOutputs? [1] (section 3.1) suggests that "isotopic" is the term used in the geostatistics literature for situations in which all of the outputs share the same set of inputs. This is perhaps an improvement on Blocked / Block because people won't know what it means and will have to look it up; conversely, Block could mean any number of things.

[1] - https://arxiv.org/pdf/1106.6251.pdf

edit: a supertype for the two could then be something like AbstractIsotopicVector.
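For concreteness, here is a hypothetical sketch of what the proposed pair could look like (type names as floated above; none of this is actual KernelFunctions API, and a real implementation would also want show methods, equality checks, etc.):

```julia
# Hypothetical sketch only -- names as proposed in this thread, not actual
# package API. Both wrappers hold the same data (features plus an output
# dimension) and differ only in iteration order.
abstract type AbstractIsotopicVector{T} <: AbstractVector{T} end

struct IsotopicByOutputs{S,T<:AbstractVector{S}} <: AbstractIsotopicVector{Tuple{S,Int}}
    x::T          # the features / locations
    out_dim::Int  # the number of outputs P
end

struct IsotopicByFeatures{S,T<:AbstractVector{S}} <: AbstractIsotopicVector{Tuple{S,Int}}
    x::T
    out_dim::Int
end

Base.size(v::AbstractIsotopicVector) = (length(v.x) * v.out_dim,)

# "By outputs": P blocks of N -- all features for output 1, then output 2, ...
function Base.getindex(v::IsotopicByOutputs, i::Int)
    p, n = divrem(i - 1, length(v.x))
    return (v.x[n + 1], p + 1)
end

# "By features": N blocks of P -- all outputs for feature 1, then feature 2, ...
function Base.getindex(v::IsotopicByFeatures, i::Int)
    n, p = divrem(i - 1, v.out_dim)
    return (v.x[n + 1], p + 1)
end
```

With x = [4.5, 5.5], collect(IsotopicByOutputs(x, 2)) reproduces the current MOInput ordering at the top of this issue, and collect(IsotopicByFeatures(x, 2)) reproduces the interleaved one.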