getkeops / keops

KErnel OPerationS, on CPUs and GPUs, with autodiff and without memory overflows
https://www.kernel-operations.io
MIT License

[Question] Vector/MultiOutput Kernels #156

Open arthus701 opened 3 years ago

arthus701 commented 3 years ago

Hi everyone and thank you for your great package!

I'm trying to implement a multi-output kernel, to be used in a GPyTorch setting, but I fail at constructing the matrix with LazyTensors. I'm dealing with vector-valued observations modeled by a GP, so that N inputs correspond to D*N outputs and a D*N x D*N covariance matrix, where D is the vector dimension (3 in my case).

I managed to implement this in PyTorch using broadcasting (i.e. constructing tensors of shape [D, N, D, N] and reshaping them to [D*N, D*N]), but I have not found directions on how to achieve this in KeOps, or whether it is even possible. I considered using batches, but the fact that the different dimensions are correlated seems to rule this out.
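To make this concrete, here is a toy version of the broadcasting approach I mean (a hypothetical sketch with a simple isotropic kernel; my real kernel is more involved):

import torch

N, D = 100, 3
x = torch.randn(N, 2)

# Scalar kernel on the inputs
k = torch.exp(-torch.cdist(x, x) ** 2 / 2)          # (N, N)

# Identity blocks broadcast against the scalar kernel
K = torch.eye(D)[:, :, None, None] * k[None, None]  # (D, D, N, N)
K = K.permute(0, 2, 1, 3).reshape(D * N, D * N)     # (D, N, D, N) -> (D*N, D*N)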

I would be glad if you could point me to some related issue or suggest a solution. Some related (g)pytorch code can be found here.

Thank you! Cheers, Arthus

jeanfeydy commented 3 years ago

Hi @arthus701 ,

Thanks for your interest in the library!

As far as I can tell, your computation is a perfect fit for KeOps: the operation you are looking for is .tensorprod(...). Creating a LazyTensor that corresponds to your computation should be straightforward, up to the small caveat that KeOps currently represents (D, D) tensor variables as vectors of size D*D. You should start from input arrays of shape (N, 1, D) and (1, N, D) (and maybe also (1, 1, D*D) to represent the 3x3 identity matrix as an explicit vector of ones and zeros) to create a LazyTensor of shape (N, N, D*D). Depending on what you intend to do with your kernel matrix, operations such as .matvecmult(...) may come in handy.
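For instance, a minimal sketch along these lines (with a toy Gaussian kernel standing in for yours; adapt at will):

import torch
from pykeops.torch import LazyTensor

N, D = 1000, 3
x = torch.randn(N, D)

x_i = LazyTensor(x.view(N, 1, D))  # (N, 1, D)
x_j = LazyTensor(x.view(1, N, D))  # (1, N, D)

# Scalar kernel coefficient for every pair (i, j)
k_ij = (-((x_i - x_j) ** 2).sum(-1) / 2).exp()  # (N, N, 1)

# Matrix-valued kernel: a rank-one (D, D) block per pair, stored as a D*D vector
K_ij = k_ij * x_i.tensorprod(x_j)  # (N, N, D*D)

# Block-wise "matrix-vector product" with a field b of shape (N, D)
b = torch.randn(N, D)
b_j = LazyTensor(b.view(1, N, D))
out = K_ij.matvecmult(b_j).sum(dim=1)  # (N, D)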

Going further, if you only need e.g. the sum along the lines of your kernel matrix, you could re-arrange computations to minimize the complexity of the KeOps formula and reach optimal run times. This is detailed in the local_covariance(...) example in the appendices of our NeurIPS 2020 paper, Section D, page 25. This code is very closely related to your multivariate kernel, so you could probably modify it to suit your needs.

Finally, I don't really know what is currently the best way of integrating a custom KeOps kernel within GPyTorch, but @gpleiss and his colleagues will certainly have a good answer for you. What do you think?

Best regards, Jean

arthus701 commented 3 years ago

Hi Jean, thank you for the quick reply! I'm sorry, but I was a bit sloppy when formulating the problem. Actually, the inputs are also multidimensional, but their dimension differs from the output one. I have observations in space and time that correspond to a vector field, so the mapping is [N, 4] -> [N, 3], and the "vector" dimensions (4 and 3) have no correspondence at all. Is this still possible with your suggested approach? I'm afraid my inputs are already of shape (N, 1, K) / (1, N, K), but with K covering location and time, and different from D. Looking at the code in the paper, it seems to be possible if I can manage to find a tensorprod formulation for my case, but I will definitely have to spend some time on the problem to test this.

I managed to integrate a custom single-output KeOps kernel into GPyTorch using their examples and, with some help from the GPyTorch folks, also the PyTorch version of the multi-output one. So the issue for me is more how to translate the working PyTorch implementation to KeOps. Maybe the fact that the multi-output kernel is stored as a vector will cause some problems. I already asked for general advice in the GPyTorch discussions before opening the issue here, but haven't heard back yet.

Thank you so much for your support! You are doing amazing work in answering everyone here!

Cheers, Arthus

Btw., in line 21 of the code block on page 25, two x_j's appear; is this correct?

jeanfeydy commented 3 years ago

Hi @arthus701 ,

You're very welcome!

I am not familiar with the standard names of inputs/outputs in GPyTorch, and am not 100% sure that I follow what you mean by [N, 4] -> [N, 3], etc. If I understand correctly, would you be happy with an efficient matrix-vector multiplication function that takes [M, 4], [N, 4], [N, 3] arrays as input and returns an [M, 3] array? Note that some of the links in the GPyTorch discussion that you are referring to seem to be broken. If you can re-write your computation here as a math equation, this will allow us to communicate without ambiguities :-)

Best regards, Jean

P.S.: Line 21, page 25 is indeed a typo that we introduced when re-formatting the code for the paper... The correct formula is:

 C_ij = (K_ij * x_i).tensorprod(x_j)  # (B, N, N, (D+1)*(D+1))
arthus701 commented 3 years ago

Hi Jean, I'm sorry for being unclear. The GPyTorch kernels have a forward method that takes inputs x and y of shape (N, K) and (M, K) (ignoring batches) and, in the multi-output case, returns a tensor/lazy tensor of shape (D*N, D*M), at least for PyTorch. Looking at the implementation of the KeOps kernels (e.g. the RBF kernel), the forward method returns what appears to be a wrapper around your KeOps LazyTensor. Maybe in the background this corresponds to the matrix-vector multiplication that you are suggesting, but I'm afraid I do not have the insight to answer this confidently.

From a mathematical point of view, I want to evaluate a kernel of the form K: R^k x R^k -> R^(d x d) on inputs x = {x_i \in R^k | i = 1, ..., N} and y accordingly. I'm not sure, though, whether this is what you mean by formulating the computation as a math equation. Do you want me to give the kernel explicitly?

I figured that once the forward method returns the correct tensor, all the matrix-vector and matrix-matrix multiplications are inherited from the wrapper, and the kernel can be integrated into the GPyTorch framework, as is the case for the non-KeOps implementation. But maybe I was wrong and the KeOps wrapper cannot yet handle the multi-output case...

Thanks again & Cheers, Arthus

jeanfeydy commented 3 years ago

Hi Arthus,

No problem of course! Indeed, an explicit formula for the kernel would be helpful. As far as KeOps is concerned, an (N, N) array of (D, D) matrices is not exactly the same thing as a (N*D, N*D) array: the former case allows us to let the D coordinates interact with each other at will, which seems to be necessary here. We may have to slightly tweak the KeOpsLazyTensor wrapper to take your use case into account, and an explicit formula would allow me to ensure that I am fully understanding your computation. I am pretty sure that we will make this work, but having an explicit example to think about is always better!
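To illustrate the difference in plain PyTorch (a quick sketch):

import torch

N, D = 2, 3
blocks = torch.randn(N, N, D, D)  # one (D, D) block per pair (i, j)
full = blocks.permute(0, 2, 1, 3).reshape(N * D, N * D)  # interleaved (N*D, N*D) layout
assert torch.equal(full[:D, :D], blocks[0, 0])  # each (D, D) slice is one block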

Best regards, Jean

arthus701 commented 3 years ago

Alright, thanks for clarifying. The kernel is rather complicated, unfortunately, but I will do my best to break it down for you. A reference to look at is this paper, see eq. (53) on page 7/3149 and eq. (62) on page 8/3150.

The kernel is related to the generating function of the Legendre polynomials. From inputs x and y, an absolute part a = |x| * |y| / R^2 and a directional part t = dot(x, y) / R^2 are calculated, with R being a lengthscale. The scalar kernel F is then given by F(a, t) = 1 / sqrt(1 - 2*t + a**2). This form arises when summing Legendre polynomials of all degrees. There are other forms of F that I'm interested in, e.g. only the contribution from l=1 (the dipole), in which case F(a, t) = t/a^3, which is why I keep F general below.

The multioutput kernel is then given by the gradient of this kernel. Calculating the gradients grad_x(a) and grad_x(t) (accordingly for y), the kernel can be expressed as follows:

K(x, y) = [_x*_y^T] * (F_a + a*F_aa) + [_y*_x^T] * (a*F_tt) + [_x*_x^T + _y*_y^T] * (a*F_ta) + [unitmatrix] * (F_t)

The vectors _x and _y are unit vectors of the coordinate system in which the vector field is expressed; in my case this also involves some transformations between spherical and Cartesian coordinates. The subscripts _a, _aa, etc. denote partial derivatives.
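For reference, here is roughly what F.grads computes for the full generating function (a simplified sketch; since it only uses arithmetic and powers, the same code runs on torch tensors and KeOps LazyTensors):

class LegendreSum:
    # F(a, t) = 1 / sqrt(1 - 2*t + a**2); returns the derivative
    # combinations in the order used below: Fa, Ft, a*Faa, a*Fat, a*Ftt.
    def grads(self, a, t):
        g = 1 - 2 * t + a ** 2
        F3 = g ** (-1.5)  # F / g
        F5 = g ** (-2.5)  # F / g**2
        Fa = -a * F3
        Ft = F3
        Faa = -F3 + 3 * a ** 2 * F5
        Fat = -3 * a * F5
        Ftt = 3 * F5
        return Fa, Ft, a * Faa, a * Fat, a * Ftt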

Let me also give my PyTorch implementation and, below it, my crude attempt at a KeOps solution, which may help in understanding things. In my attempt at translating this to KeOps, I managed to calculate a and t via LazyTensors, and all the F-terms as well, but as you can see, only by computing the transformed quantities in PyTorch first and then creating LazyTensors from them.

import torch

from gpytorch.kernels import Kernel
from gpytorch.kernels.keops.keops_kernel import KeOpsKernel
from gpytorch.lazy import KeOpsLazyTensor

from pykeops.torch import LazyTensor as KEOLazyTensor

# Degrees-to-radians factor (assumed here; not shown in my original snippet)
deg2rad = torch.pi / 180.0

def Car2NEC(clr):
    # This function returns the transformation matrix from Cartesian to
    # spherical (NEC) coordinates, along with the transformed input vector.
    t = clr[:, 0].mul(deg2rad)
    p = clr[:, 1].mul(deg2rad)

    ct = torch.cos(t)
    st = torch.sin(t)
    cp = torch.cos(p)
    sp = torch.sin(p)

    size = t.size()

    mat = torch.zeros(size[:-1] + torch.Size([3, 3]) + size[-1:],
                      dtype=clr.dtype,
                      device=clr.device)

    vec = torch.empty(size+torch.Size([3]),
                      dtype=clr.dtype,
                      device=clr.device)

    mat[0, 0] = -ct.mul(cp)
    mat[1, 0] = -ct.mul(sp)
    mat[2, 0] = st

    mat[0, 1] = -sp
    mat[1, 1] = cp

    mat[0, 2] = -st.mul(cp)
    mat[1, 2] = -st.mul(sp)
    mat[2, 2] = -ct

    vec[:, 0] = st*cp
    vec[:, 1] = st*sp
    vec[:, 2] = ct

    return mat.T, vec

class MFKernelField(Kernel):
    def __init__(self, F, R, **kwargs):
        super().__init__(**kwargs)
        self.F = F
        self.R = R

    def num_outputs_per_input(self, x1, x2):
        return 3

    def forward(self, x, y, diag=False, **kwargs):
        n = x.size(-2)
        m = y.size(-2)

        nec_x, _x = Car2NEC(x)
        nec_y, _y = Car2NEC(y)
        a = x[..., :, None, 2].div(self.R) \
            .mul(y[..., None, :, 2].div(self.R))
        t = a*_x.mm(_y.T)

        Fa, Ft, aFaa, aFat, aFtt = self.F.grads(a, t)

        _x_trafo = torch.tensordot(_x, nec_y, dims=([1], [2]))
        _y_trafo = torch.tensordot(nec_x, _y, dims=([2], [1]))

        # 1st term:
        _xyT = torch.zeros((3, 3), dtype=x.dtype, device=x.device)
        _xyT[2, 2] = 1

        res = (Fa + aFaa)[..., :, None, :, None] \
            .mul(_xyT[..., None, :, None, :])
        # 2nd term:
        res += _x_trafo.unsqueeze(-3) \
            .mul(_y_trafo.unsqueeze(-1)) \
            .mul(aFtt[..., :, None, :, None])
        # 3rd term:
        _vT = -_xyT[:, 2]
        res += (_vT[..., None, :, None, None]
                .mul(_x_trafo[..., : ,None, :, :])
                + _y_trafo[..., :, :, :, None]
                .mul(_vT[..., None, None, None, :])) \
            .mul(aFat[:, None, :, None])

        # 4th term:
        res += nec_x[..., :, None, :, :] \
            .matmul(nec_y.transpose(-1, -2).contiguous()) \
            .transpose(-2, -3).contiguous().mul(Ft[..., :, None, :, None])
        return res.reshape(3*n, 3*m)

class MFKernelFieldKeOps(KeOpsKernel):
    def __init__(self, F, R, **kwargs):
        super().__init__(**kwargs)
        self.F = F
        self.R = R

    def num_outputs_per_input(self, x1, x2):
        return 3

    def forward(self, x, y, **kwargs):
        return KeOpsLazyTensor(x, y, self.covar_func)

    def covar_func(self, x, y, **kwargs):
        n = x.size(0)
        m = y.size(0)
        nec_x, _x = Car2NEC(x)
        nec_y, _y = Car2NEC(y)

        __x = KEOLazyTensor(_x[..., :, None, :])
        x_ = KEOLazyTensor(x[..., :, None, :])

        __y = KEOLazyTensor(_y[..., None, :, :])
        y_ = KEOLazyTensor(y[..., None, :, :])

        a = (x_[2] / self.R) * (y_[2] / self.R)
        t = a * (__x * __y).sum(-1)

        Fa, Ft, aFaa, aFat, aFtt = self.F.grads(a, t)
        # I didn't get further than this...
arthus701 commented 3 years ago

Hey @jeanfeydy, I don't mean to be rude, but did you have time to look at the kernel? Is there anything else I can do to clarify things or support you?

Do you maybe have some direction which I could investigate myself to get further? Maybe an example that goes in a similar direction?

Thank you, Arthus

jeanfeydy commented 3 years ago

Hi @arthus701 ,

All my apologies for the late answer! Over the last two weeks, I have had to focus on optimal-transport-related work and did not have the time to look at your problem. Your description is very clear though and I'm getting a bit of free time now: just like #148, I will try to write a good solution on Monday/Tuesday and update the thread accordingly. This is a very nice use case, and I hope that we'll be able to make it work quickly :-)

Best regards, Jean

arthus701 commented 3 years ago

No need to apologize!

Thanks for getting back to the issue, I look forward to reading your solution!

Cheers, Arthus

jeanfeydy commented 3 years ago

Hi @arthus701 ,

I have finally had some time to read your description in detail :-)

I am not sure that I fully understand your code (in the definitions of a and t in the PyTorch implementation, I wonder whether some underscores are missing), but what do you think about the code below?

# Assumes that N, M, R are defined, and that nec_x, _x and nec_y, _y
# come from Car2NEC(x) and Car2NEC(y), as in your snippet above.
import torch
from pykeops.torch import LazyTensor

# Input data
x = torch.randn(N, 3)
y = torch.randn(M, 3)

# Norms, divided by the scaling coefficient
nx = (x ** 2).sum(-1).sqrt() / R  # (N,)
ny = (y ** 2).sum(-1).sqrt() / R # (M,)

# Encoding of the vectors as KeOps LazyTensors
x_i = LazyTensor(x.view(N, 1, 3))
y_j = LazyTensor(y.view(1, M, 3))
nx_i = LazyTensor(nx.view(N, 1, 1))
ny_j = LazyTensor(ny.view(1, M, 1))

# Encoding of the reference unit vectors as KeOps LazyTensors
xt_i = LazyTensor(_x.view(N, 1, 3))
yt_j = LazyTensor(_y.view(1, M, 3))

# Encoding of the relevant unit matrices as KeOps LazyTensors
necx_i = LazyTensor(nec_x.reshape(N, 1, 3*3))
necyT_j = LazyTensor(nec_y.transpose(-1, -2).reshape(1, M, 3*3))

# Intermediate variables: a and t, as (N, M, 1) LazyTensors
a_ij = nx_i * ny_j  # Product of the norms
t_ij = (x_i | y_j)  # Dot product

# Relevant scalar quantities: we assume that self.F.grads
# is implemented using KeOps-compatible operations (e.g. polynomial formulas)
# and turns a pair of (N, M, 1) LazyTensors into 5 separate (N, M, 1) LazyTensors.
Fa_ij, Ft_ij, aFaa_ij, aFat_ij, aFtt_ij = self.F.grads(a_ij, t_ij)

# K(x, y) = [_x*_y^T] * (F_a + a*F_aa) 
K_ij = (Fa_ij + aFaa_ij) * xt_i.tensorprod(yt_j)  # (N, M, 3*3)

# + [_y*_x^T] * (a*F_tt) 
K_ij = K_ij + aFtt_ij * yt_j.tensorprod(xt_i)  # (N, M, 3*3)

# + [_x*_x^T + _y*_y^T] * (a*F_ta) 
K_ij = K_ij + aFat_ij * (xt_i.tensorprod(xt_i) + yt_j.tensorprod(yt_j))  # (N, M, 3*3)

# + [unitmatrix] * (F_t)
K_ij = K_ij + Ft_ij * necx_i.keops_tensordot(necyT_j, (3, 3), (3, 3), (1,), (0,))  # (N, M, 3*3)

At this stage, K_ij should be a LazyTensor of shape (N, M, 3*3), which is equivalent to a tensor of shape (N*3, M*3). To compute a "matrix-vector product" with a vector b of size (M*3,), you could do:

# Encoding as KeOps LazyTensor
b_j = LazyTensor(b.view(1, M, 3))

# (3*3) @ (3,) = (3,) matrix-vector product for every pair of points (i, j) in (N, M)
Kb_ij = K_ij.matvecmult(b_j)  # (N, M, 3)

out_i = Kb_ij.sum(dim=1)  # (N, 3)

What do you think?

Best regards, Jean

arthus701 commented 3 years ago

Hi @jeanfeydy , thank you so much for your response. It looks promising; however, there are some issues when trying to integrate it with GPyTorch. I think we will manage to sort them out, at least in a way that works, but to do so I have a follow-up question:

In order to use the kernel, I need not only matrix-vector but also matrix-matrix multiplication. In the KeOps API I found a tutorial using tensordot, but couldn't figure out how to apply it to this case. Is it possible to define matrix-matrix products similar to the matrix-vector product you suggested in your reply? In GPyTorch they use the @ notation, which throws an error, as it is apparently only allowed if the trailing dimension is 1 and not 9, as in this case...

And does KeOps offer a quick (but obviously memory-demanding) way to actually build the full kernel matrix? This would be great for testing.
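Naively, I imagine both could be done column by column with your matvecmult trick, recovering the dense matrix by multiplying with the identity; something like this crude, untested sketch (reusing K_ij and M from your snippet):

import torch
from pykeops.torch import LazyTensor

def block_matmat(K_ij, B):
    # K_ij: the (N, M, 3*3) LazyTensor from above, viewed as an (N*3, M*3) matrix.
    # B: a dense (M*3, P) tensor. One KeOps reduction per column of B.
    m = B.shape[0] // 3
    cols = []
    for p in range(B.shape[1]):
        b_j = LazyTensor(B[:, p].contiguous().view(1, m, 3))
        cols.append(K_ij.matvecmult(b_j).sum(dim=1).reshape(-1, 1))  # (N*3, 1)
    return torch.cat(cols, dim=1)  # (N*3, P)

# Dense kernel matrix, for testing only (memory-demanding on purpose):
K_dense = block_matmat(K_ij, torch.eye(3 * M))

Would something along these lines work, or is there a more idiomatic way?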

Thanks again for your great work and help, Arthus