JuliaGaussianProcesses / Stheno.jl

Probabilistic Programming with Gaussian processes in Julia

Example with custom kernel #45

Closed · MichielStock closed this 4 years ago

MichielStock commented 5 years ago

The documentation is a bit hard to follow, IMO. Could you make an example where a custom kernel is created and used?

In my specific case, I would like to make a Jaccard kernel to learn over sets. This does not seem to work.

using Stheno: Kernel

# Jaccard similarity |x ∩ y| / |x ∪ y| between two collections; the unary
# method encodes k(x, x) = 1.
jaccard(x, y) = length(intersect(x, y)) / length(union(x, y))
jaccard(x) = 1.0

struct JaccardKernel <: Kernel end

# Binary methods
function ew(k::JaccardKernel, x::AbstractVector, x′::AbstractVector)
    @assert length(x) == length(x′)
    p = length(x)
    r = zeros(p)
    for i in 1:p
        r[i] = jaccard(x[i], x′[i])
    end
    return r
end

function pw(k::JaccardKernel, x::AbstractVector, x′::AbstractVector)
    p, q = length(x), length(x′)
    R = zeros(p, q)
    for j in 1:q
        for i in 1:p
            R[i, j] = jaccard(x[i], x′[j])
        end
    end
    return R
end

# Unary methods
ew(k::JaccardKernel, x::AbstractVector) = ones(length(x))
function pw(k::JaccardKernel, x::AbstractVector)
    # Symmetric case: fill the lower triangle and mirror it.
    p = length(x)
    R = zeros(p, p)
    for i in 1:p
        for j in 1:i
            R[i, j] = jaccard(x[i], x[j])
            R[j, i] = R[i, j]
        end
    end
    return R
end

using Stheno
using Stheno: @model

@model function model()

    # Define a latent process with the Jaccard kernel.
    f = GP(0.0, JaccardKernel())

    # Define a second latent process (here with an EQ kernel) to act as noise.
    ε = GP(eq())

    # Sum them to get the process of which we shall make observations.
    y = f + ε

    # Return all three processes so that we can inspect all of them.
    return f, ε, y
end

f, ε, y = model();

rand(f([[1,1], [2,1]]))

(don't mind the implementation specifics)

Am I doing something fundamentally wrong here?

willtebbutt commented 5 years ago

Hi @MichielStock thanks for raising the issue. Currently working towards the AISTATS deadline, will get back to you about this tomorrow :)

willtebbutt commented 5 years ago

Just tried to run your example. The first thing I realised is that you haven't imported ew or pw in your script, so you'll need to do that. You can then test that the JaccardKernel is working as expected with something like

using Stheno: GPC
f = GP(JaccardKernel(), GPC())
x = [[1,1], [2,1]]
rand(f(x)) # noise-free
rand(f(x, 0.1)) # noisy
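
Concretely, the top of your kernel script would look something like this (a minimal sketch; using import for ew and pw lets your method definitions extend Stheno's functions rather than define new ones that shadow them):

using Stheno
using Stheno: Kernel, GPC
import Stheno: ew, pw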

I couldn't immediately get your code to work with the EQ kernel, as it doesn't know how to handle inputs of type typeof(x) (a vector of vectors), but we could definitely make this work if that's something you're interested in doing?

edit: if you're interested in contributing the Jaccard kernel, I would be very much open to a PR, btw. It might also be a good candidate for an implement-your-own-kernel tutorial in the docs...

MichielStock commented 5 years ago

This seems to work. I will put together a page on implementing your own kernel. Then we can see about contributing more kernels for discrete structures.

willtebbutt commented 5 years ago

Glad this works for you. I've got a draft of a how-to-implement-your-own-kernel page in the works at the minute and will open a PR involving it shortly for your feedback :)

MichielStock commented 5 years ago

I think the example in the docs is an improvement.

Am I correct that the base case always assumes univariate inputs, i.e. pw(k::EQ, xl::AbstractVector{<:Real}, xr::AbstractVector{<:Real}), so x is a vector of scalars and is not high-dimensional by default? So if you want to regress y on X, where X is n x p, this is not supported?

(you can do it by making x a vector of feature vectors, e.g. [[1,2,3], [1,3,3], [3,4,2]], but this does not seem very elegant)

willtebbutt commented 5 years ago

Ish -- in Stheno, all inputs are subtypes of AbstractVector. The example in the docs only considers AbstractVector{<:Real}, but purely because it's the most straightforward case.

For high-dimensional inputs stored in a p x N design matrix we have the ColsAreObs struct. This ColsAreObs(X) is an AbstractVector{Vector{<:Real}} where each column of X corresponds to an input. You'll see that most of the kernels are specialised to this type.
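
For example, something along these lines (a minimal sketch, assuming ColsAreObs is accessible via using Stheno: ColsAreObs):

using Stheno
using Stheno: ColsAreObs, GPC

X = randn(3, 10)     # p = 3 features, N = 10 observations
x = ColsAreObs(X)    # length-10 vector; x[n] is the n-th column of X
f = GP(eq(), GPC())
rand(f(x, 0.1))      # sample at the 10 high-dimensional inputs, with noise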

The reason for doing this is to make it completely unambiguous how Stheno will interpret user data. It's also to avoid people complaining that the package adopts the "wrong" convention regarding whether a design matrix should be N x p or p x N -- ColsAreObs says it's p x N and we could add a RowsAreObs type with the opposite convention.

What do you think would be the most helpful way to explain this? Perhaps we should just add a section to the docs entitled Input Types.

p.s. ColsAreObs possibly isn't the best name -- if you can think of an improved name I would greatly appreciate it. Perhaps ColVectors(X) and RowVectors(X) would be better?

MichielStock commented 5 years ago

> Ish -- in Stheno, all inputs are subtypes of AbstractVector. The example in the docs only considers AbstractVector{<:Real}, but purely because it's the most straightforward case.

OK

> For high-dimensional inputs stored in a p x N design matrix we have the ColsAreObs struct. This ColsAreObs(X) is an AbstractVector{Vector{<:Real}} where each column of X corresponds to an input. You'll see that most of the kernels are specialised to this type.

Ah, I thought so. This is indeed a good approach because, ideally, you would like to make predictions about arbitrary abstract objects such as graphs, strings, or whatever.

> The reason for doing this is to make it completely unambiguous how Stheno will interpret user data. It's also to avoid people complaining that the package adopts the "wrong" convention regarding whether a design matrix should be N x p or p x N -- ColsAreObs says it's p x N and we could add a RowsAreObs type with the opposite convention.

Well, RowsAreObs is the correct one of course ;-)

> What do you think would be the most helpful way to explain this? Perhaps we should just add a section to the docs entitled Input Types.

Best add an example with high-dimensional prediction.

> p.s. ColsAreObs possibly isn't the best name -- if you can think of an improved name I would greatly appreciate it. Perhaps ColVectors(X) and RowVectors(X) would be better?

ColsAreObs is good in the sense that it is unambiguous. Maybe the latter is a bit more correct in nomenclature?

willtebbutt commented 5 years ago

> Best add an example with high-dimensional prediction.

Sounds good. Will give this a go either tonight or tomorrow.

> ColsAreObs is good in the sense that it is unambiguous. Maybe the latter is a bit more correct in nomenclature?

Glad you agree that it's unambiguous. I think I'm going to refactor and go with ColVecs and RowVecs since they're a) a little bit more concise and b) stand alone from the GP / ML context a bit better, i.e. ColVecs(X) just says that X is a matrix of column vectors, which doesn't have to have anything to do with statistics or ML.
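
Roughly, the intended semantics would be (a sketch of the proposed names, not what's currently implemented):

X = randn(2, 4)            # 2-dimensional inputs, 4 observations
x = ColVecs(X)             # an AbstractVector whose elements are the columns of X
x[3] == X[:, 3]            # true
length(x) == size(X, 2)    # true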

willtebbutt commented 5 years ago

@MichielStock when you get a minute, could you take a look at the new dev docs and let me know whether they cover this issue sufficiently? More than happy to add extra material, or for you to raise a PR and add some yourself.

willtebbutt commented 4 years ago

@MichielStock is there anything left to address in this issue?

MichielStock commented 4 years ago

I think the documentation has greatly improved! This can be closed.

The only thing left (though it might belong in a different issue) is using precomputed kernels. I have some example code showing how I did this, but it might also deserve some framework support.
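
Roughly, the shape of what I mean is something like the following sketch (hypothetical, and not necessarily how I actually did it): precompute the Gram matrix up front and use integer indices as the GP's inputs.

using Stheno: Kernel
import Stheno: ew, pw

# Hypothetical sketch: wrap a precomputed Gram matrix; the "inputs" to the
# GP are integer indices into that matrix.
struct PrecomputedKernel <: Kernel
    K::Matrix{Float64}
end

# Unary methods: diagonal and full Gram matrix over the selected indices.
ew(k::PrecomputedKernel, x::AbstractVector{<:Integer}) = [k.K[i, i] for i in x]
pw(k::PrecomputedKernel, x::AbstractVector{<:Integer}) = k.K[x, x]

# Binary methods: elementwise and pairwise lookups.
ew(k::PrecomputedKernel, x::AbstractVector{<:Integer}, x′::AbstractVector{<:Integer}) =
    [k.K[i, j] for (i, j) in zip(x, x′)]
pw(k::PrecomputedKernel, x::AbstractVector{<:Integer}, x′::AbstractVector{<:Integer}) =
    k.K[x, x′]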

willtebbutt commented 4 years ago

Cool. Would you mind raising a separate issue about that? It's not something I've really thought about, so it would be interesting to figure out whether or not it's something that fits naturally into Stheno.