Closed parthe closed 1 year ago
Your KeOps code is computing $a_i = \sum_{j=1}^{m} \left((x_i - y_j)^2 v_j + w_i\right) = \left(\sum_{j=1}^{m} (x_i - y_j)^2 v_j\right) + m w_i$.
Replacing $w$ with $\frac{w}{m}$ would give the result you want, although I'm not sure computing it this way would save many CPU cycles, since the addition would happen inside the loop anyway.
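For what it's worth, the identity above can be checked with a small dense NumPy sketch. This is not KeOps code: the sizes, the random inputs, and the helper `reduce_with_w_inside` are made up for illustration, and the dense `(n, m)` matrix only stands in for what `SqDist(x,y)` evaluates lazily.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, d, k = 5, 7, 3, 2              # illustrative sizes only
x = rng.standard_normal((n, d))
y = rng.standard_normal((m, d))
v = rng.standard_normal((m, k))
w = rng.standard_normal((n, k))

# Dense (n, m) matrix of squared distances |x_i - y_j|^2.
sqdist = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)

# Intended result: a_i = sum_j |x_i - y_j|^2 v_j + w_i, with w added once.
target = sqdist @ v + w

def reduce_with_w_inside(w_term):
    # Hypothetical helper: sums (|x_i - y_j|^2 v_j + w_term_i) over j,
    # so the w term is added m times per row i.
    return (sqdist[:, :, None] * v[None, :, :] + w_term[:, None, :]).sum(axis=1)

inside_plain = reduce_with_w_inside(w)       # target + (m - 1) * w
inside_scaled = reduce_with_w_inside(w / m)  # matches target
```

Here `inside_scaled` agrees with `target` up to floating-point rounding, while `inside_plain` is off by `(m - 1) * w`.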
Hello @parthe,
As @NightWinkle said, you should replace $w$ with $w/m$ to get the result you want. However, putting the addition inside the formula does not save an extra $O(n)$ operation; it does the opposite. The addition is performed $m$ times for each index $i$ instead of once, which adds an extra $O(mn)$ operations. Of course, in terms of actual compute time, either version may turn out faster, depending on several factors. But I would be surprised if it made a big difference.
Now, about your second question: you can pass the output tensor to perform the computation in place with the keyword argument `out=`. In your code above, this means you would write `fun(X, Y, v, w, backend="auto", out=a)` if `a` is your output tensor.
Another thing: you may save a few milliseconds at each call, in case it matters, by calling `Genred` only once, then using the resulting function `fun` for each required convolution.
I'm closing the issue since @joanglaunes seems to have answered the main question - but feel free to re-open if needed.
I am trying to write a convolution with an addition, to avoid an extra $O(n)$ operation. I want the output to be $$a_i = \sum_{j=1}^m (x_i - y_j)^2 v_j + w_i$$ for $x_i, y_j \in \mathbb{R}^d$, and $a_i, w_i, v_j \in \mathbb{R}^k$, with $i \in \{1,2,\ldots,n\}$ and $j \in \{1,2,\ldots,m\}$.
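I have 2 issues:
1. How to do this with `pykeops.torch.Genred` is not clear. Example code below.
2. To write the result into an existing output tensor, I tried replacing `"SqDist(x,y)*v+w"` with `"out=SqDist(x,y)*v+w"`, but that gave me an error.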