getkeops / keops

KErnel OPerationS, on CPUs and GPUs, with autodiff and without memory overflows
https://www.kernel-operations.io
MIT License

Difficulty writing a convolution (matrix-vector multiplication) #278

Closed. parthe closed this issue 1 year ago.

parthe commented 1 year ago

I am trying to write a convolution with an addition built in, to avoid an extra $O(n)$ operation. I want the output to be $$a_{i} = \sum_{j=1}^m \|x_i - y_j\|^2 v_{j} + w_{i}$$ for $x_i, y_j \in \mathbb{R}^d$ and $a_i, w_i, v_j \in \mathbb{R}^k$, with $i \in \{1, 2, \ldots, n\}$ and $j \in \{1, 2, \ldots, m\}$.

I have 2 issues:

  1. The extra addition with $w_{i}$ seems doable, but the documentation for pykeops.torch.Genred is not clear on this point. Example code below.
  2. How can I pass the output as an argument? My intuition was to replace `"SqDist(x,y)*v+w"` with `"out=SqDist(x,y)*v+w"`, but that gave me an error.
from pykeops.torch import Genred
import torch

K = lambda x, y : (x[:,None,:] - y[None,:,:]).pow(2).sum(-1)

def explicit_kmv(X, Y, v, w):  # parameters renamed to match the names used in the body
    return K(X, Y) @ v + w

def implicit_kmv(X, Y, v, w):
    d, k = X.shape[-1], v.shape[-1]
    fun = Genred(
        "SqDist(x,y)*v+w",  # Formula 
        [f"x = Vi(0,{d})",  f"y = Vj(1,{d})",  f"v = Vj(2,{k})",  f"w = Vi(3,{k})"], 
        reduction_op='Sum', 
        axis=1, 
        dtype='float32'
    )
    return fun(X, Y, v, w, backend="auto")

if __name__ == "__main__":
    n, m, d, k = 2, 3, 4, 5
    x0, y0, v0, w0 = 1., 2., 3., 4.
    dev = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    X = torch.ones(n, d, device=dev)*x0
    Y = torch.ones(m, d, device=dev)*y0
    v = torch.ones(m, k, device=dev)*v0
    w = torch.ones(n, k, device=dev)*w0
    a1 = explicit_kmv(X, Y, v, w)
    a2 = implicit_kmv(X, Y, v, w)
    print('X:\n', X)
    print('Y:\n', Y)
    print('K:\n', K(X,Y))
    print('v:\n', v)
    print('w:\n', w)
    print('a1 - a2: ', torch.unique(a2-a1).item(), f'* torch.ones({n}, {k})')
    print('w0(m-1): ', w0*(m-1))
    ### OUTPUT
    # a1 - a2:  8.0 * torch.ones(2, 5)
    # w0(m-1):  8.0
NightWinkle commented 1 year ago

Your KeOps code is computing $a_{i} = \sum_{j=1}^{m} \left( \|x_i - y_j\|^2 v_{j} + w_{i} \right) = \left( \sum_{j=1}^{m} \|x_i - y_j\|^2 v_{j} \right) + m w_i$.

Replacing $w$ with $\frac{w}{m}$ would give the same result, although I'm not sure computing it this way saves many CPU cycles, since the addition still happens inside the loop anyway.
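The identity above can be checked without pykeops at all. A minimal NumPy sketch (not KeOps itself; shapes and names are illustrative) of why the Sum reduction over $j$ picks up $w_i$ a total of $m$ times, and why passing $w/m$ recovers the intended result:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, d, k = 2, 3, 4, 5
X, Y = rng.normal(size=(n, d)), rng.normal(size=(m, d))
v, w = rng.normal(size=(m, k)), rng.normal(size=(n, k))

# (n, m) matrix of squared distances, like SqDist(x, y)
K = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)

target = K @ v + w  # what the issue asks for

# What reducing "SqDist(x,y)*v+w" over j actually computes:
# w_i is added once per j, i.e. m times.
inside_loop = (K[:, :, None] * v[None, :, :] + w[:, None, :]).sum(axis=1)
assert np.allclose(inside_loop, K @ v + m * w)

# Fix: divide w by m before putting it into the per-j formula.
fixed = (K[:, :, None] * v[None, :, :] + (w / m)[:, None, :]).sum(axis=1)
assert np.allclose(fixed, target)
```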

joanglaunes commented 1 year ago

Hello @parthe,

As @NightWinkle said, you should replace $w$ by $w/m$ to get the same result. However, the idea of saving an extra $O(n)$ operation by putting the addition in the formula is wrong; in fact it is the opposite, it adds an extra $O(mn)$ operation, because the addition is performed $m$ times for each $i$ index instead of once. Of course, in terms of actual compute time it may be faster one way or the other, depending on several factors, but I would be surprised if it makes a big difference.

About your second question: you can pass the output tensor to perform in-place computation with the keyword argument `out=`. In your code above, this means you would write `fun(X, Y, v, w, backend="auto", out=a)` if `a` is your output tensor.

One more thing: you may save a few milliseconds at each call, in case it matters, by calling `Genred` only once and then reusing the resulting function `fun` for each required convolution.
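The two suggestions above (build the reduction function once, and write the result into a preallocated buffer) can be mimicked in plain NumPy; this is a sketch of the pattern, not pykeops itself, and `make_kmv` is a hypothetical helper name:

```python
import numpy as np

def make_kmv():
    """Built once, like a Genred call; returns a reusable convolution."""
    def fun(X, Y, v, w, out=None):
        # (n, m) squared-distance matrix
        K = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
        if out is None:
            out = np.empty((X.shape[0], v.shape[1]), dtype=v.dtype)
        np.matmul(K, v, out=out)  # writes K @ v into the output buffer
        out += w                  # single O(nk) addition, outside the j-loop
        return out
    return fun

fun = make_kmv()  # construct once, reuse for every call

n, m, d, k = 2, 3, 4, 5
X, Y = np.ones((n, d)), 2.0 * np.ones((m, d))
v, w = 3.0 * np.ones((m, k)), 4.0 * np.ones((n, k))
a = np.empty((n, k))
fun(X, Y, v, w, out=a)  # result lands in a; here every entry is 4*4*3 + 4 = 40
```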

jeanfeydy commented 1 year ago

I'm closing the issue since @joanglaunes seems to have answered the main question - but feel free to re-open if needed.