getkeops / keops

KErnel OPerationS, on CPUs and GPUs, with autodiff and without memory overflows
https://www.kernel-operations.io
MIT License

Question about KMin and ArgKMin reduction in pykeops #310

Open · gdurif opened 1 year ago

gdurif commented 1 year ago

I've noticed the following behaviors regarding the output dimensions of the KMin and ArgKMin reductions, which I am not sure about (and which cause issues with rkeops).

MWE for KMin:

import numpy as np
from pykeops.numpy import Genred
# Simple formula
formula = "V0"
variables = ["V0=Vi(4)"]
# KMin on Vi
op = Genred(formula, variables, reduction_op="KMin", axis=0, opt_arg=3)
# data
arg = np.random.randn(10, 4)
# Perform reduction (adds a dimension)
res = op(arg, backend="CPU")
res.shape
#> (1, 3, 4)
# KMin on Vj (there is no Vj variable in the formula)
op = Genred(formula, variables, reduction_op="KMin", axis=1, opt_arg=3)
# data
arg = np.random.randn(10, 4)
# Perform reduction (should do nothing?)
res = op(arg, backend="CPU")
res.shape
#> (10, 3, 4)

MWE for ArgKMin:

import numpy as np
from pykeops.numpy import Genred
# Simple formula
formula = "V0"
variables = ["V0=Vi(4)"]
# ArgKMin on Vi
op = Genred(formula, variables, reduction_op="ArgKMin", axis=0, opt_arg=3)
# data
arg = np.random.randn(10, 4)
# Perform reduction (adds a dimension)
res = op(arg, backend="CPU")
res.shape
#> (1, 3, 4)
# ArgKMin on Vj (there is no Vj variable in the formula)
op = Genred(formula, variables, reduction_op="ArgKMin", axis=1, opt_arg=3)
# data
arg = np.random.randn(10, 4)
# Perform reduction (should do nothing?)
res = op(arg, backend="CPU")
res.shape
#> (10, 3, 4)
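
For reference, the values themselves can be cross-checked against a plain numpy partial sort (assuming, as the shapes suggest, that KMin acts coordinate-wise and returns the k smallest values in increasing order):

import numpy as np
from pykeops.numpy import Genred
formula = "V0"
variables = ["V0=Vi(4)"]
op = Genred(formula, variables, reduction_op="KMin", axis=0, opt_arg=3)
arg = np.random.randn(10, 4)
res = op(arg, backend="CPU")  # shape (1, 3, 4)
# KMin over i with the identity formula should be a column-wise partial sort
expected = np.sort(arg, axis=0)[:3]   # 3 smallest values in each column
print(np.allclose(res[0], expected))  # expected: True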
joanglaunes commented 1 year ago

Hello @gdurif , This is somewhat normal behavior, even if it is not very intuitive in the particular cases you test. We always assume that the reduction is performed over a symbolic tensor of size $(n_i, n_j, d)$. The output shape for a reduction over Vi is either $(n_j, d)$ for simple reductions like Min or Sum, or $(n_j, k, d)$ if the reduction produces several outputs, as is the case for the KMin and ArgKMin reductions. Likewise, for reductions over Vj, the output shape is $(n_i, d)$ or $(n_i, k, d)$. In your examples the formula contains no Vj variable, so $n_j = 1$, which is why a reduction over Vi returns shape $(1, 3, 4)$.
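
To illustrate the convention in a non-degenerate $(n_i, n_j)$ setting (the formula V0 - V1 and the sizes 5 and 10 below are made up for illustration):

import numpy as np
from pykeops.numpy import Genred
# Symbolic (n_i, n_j, d) tensor: the difference x_i - y_j, with d = 4
formula = "V0 - V1"
variables = ["V0=Vi(4)", "V1=Vj(4)"]
x = np.random.randn(5, 4)   # n_i = 5
y = np.random.randn(10, 4)  # n_j = 10
# Simple reduction over j: one output per i index -> (n_i, d)
op_min = Genred(formula, variables, reduction_op="Min", axis=1)
print(op_min(x, y, backend="CPU").shape)  # expected: (5, 4)
# k-output reduction over j: k values per i index -> (n_i, k, d)
op_kmin = Genred(formula, variables, reduction_op="KMin", axis=1, opt_arg=3)
print(op_kmin(x, y, backend="CPU").shape)  # expected: (5, 3, 4)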

gdurif commented 1 year ago

Hello @joanglaunes

For other reductions (like Sum), the "useless" dimension is not added when it is not necessary, e.g.:

import numpy as np
from pykeops.numpy import Genred
# Simple formula
formula = "V0"
variables = ["V0=Vi(4)"]
# Sum on Vi
op = Genred(formula, variables, reduction_op="Sum", axis=0)
# data
arg = np.random.randn(10, 4)
# Perform reduction (no extra dimension is added here)
res = op(arg, backend="CPU")
res.shape
#> (1, 4)
# Sum on Vj
op = Genred(formula, variables, reduction_op="Sum", axis=1)
# data
arg = np.random.randn(10, 4)
# Perform reduction (should do nothing?)
res = op(arg, backend="CPU")
res.shape
#> (10, 4)
sum(res - arg)
#> array([0., 0., 0., 0.])

The behavior is different depending on the reduction.

joanglaunes commented 1 year ago

In fact, yes, it is added as well; that is why you get shape $(1,4)$ and not just $(4)$. The dimension that is dropped in the case of the Sum reduction is the $k$ dimension, because a sum produces only one output per reduction. In other words, the output always has 2 dimensions for Sum or Min reductions, and always 3 dimensions for KMin or ArgKMin.
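
Concretely, side by side on the same kind of data (a sketch; Min follows the same shape rule as Sum):

import numpy as np
from pykeops.numpy import Genred
formula = "V0"
variables = ["V0=Vi(4)"]
arg = np.random.randn(10, 4)
# Single-output reduction: the k dimension is dropped -> 2-D output
op_min = Genred(formula, variables, reduction_op="Min", axis=0)
print(op_min(arg, backend="CPU").shape)   # expected: (1, 4)
# k-output reduction: the k dimension is kept -> 3-D output
op_kmin = Genred(formula, variables, reduction_op="KMin", axis=0, opt_arg=3)
print(op_kmin(arg, backend="CPU").shape)  # expected: (1, 3, 4)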

gdurif commented 1 year ago

Ok, I see, thanks. I'll see how we can manage this in R.
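
For a wrapper such as rkeops, one option is to normalize both cases to a single layout downstream. A minimal Python sketch (normalize_genred_output is a hypothetical helper, not part of pykeops):

import numpy as np

def normalize_genred_output(res):
    """Hypothetical helper: bring a Genred result to a uniform
    (n, k, d) layout by inserting a singleton k axis when the
    reduction (Sum, Min, ...) produced a single output."""
    if res.ndim == 2:
        # single-output reduction: (n, d) -> (n, 1, d)
        return res[:, None, :]
    # KMin / ArgKMin already return (n, k, d)
    return res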