getkeops / keops

KErnel OPerationS, on CPUs and GPUs, with autodiff and without memory overflows
https://www.kernel-operations.io
MIT License

GenredAutogradBackward returned an incorrect number of gradients (expected 14, got 13) #327

Closed: byphilipp closed this issue 5 months ago

byphilipp commented 1 year ago

Backpropagation is not working.

I've tried versions 2.1.1 + torch 1.13.1 and 2.1.2 + torch 2.0.1 with the same results.

from pykeops.torch import Vi, Vj, LazyTensor
import torch
xc = torch.randn(256,5)
xc.requires_grad_(True)

x_i = Vi (xc)
x_j = Vj (xc)
d = -LazyTensor.sqdist(x_i,x_j)
fun = d.logsumexp_reduction(dim=1,call=False)

fun(xc).sum().backward()

[KeOps] Generating code for formula Sum_Reduction(-((2(Extract(Var(3,2,0),1,1)Exp(-((Var(0,5,0)-Var(1,5,1))|(Var(0,5,0)-Var(1,5,1)))-Extract(Var(4,2,0),0,1))))*-(Var(0,5,0)-Var(1,5,1))),1) ... OK

RuntimeError                              Traceback (most recent call last)
Cell In[178], line 11
      8 d = -LazyTensor.sqdist(x_i,x_j)
      9 fun = d.logsumexp_reduction(dim=1,call=False)
---> 11 fun(xc).sum().backward()

File /miniconda/lib/python3.9/site-packages/torch/_tensor.py:488, in Tensor.backward(self, gradient, retain_graph, create_graph, inputs)
    478 if has_torch_function_unary(self):
    479     return handle_torch_function(
    480         Tensor.backward,
    481         (self,),
    (...)
    486         inputs=inputs,
    487     )
--> 488 torch.autograd.backward(
    489     self, gradient, retain_graph, create_graph, inputs=inputs
    490 )

File /miniconda/lib/python3.9/site-packages/torch/autograd/__init__.py:197, in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
    192     retain_graph = create_graph
    194 # The reason we repeat same the comment below is that
    195 # some Python versions print out the first line of a multi-line function
    196 # calls in the traceback and some print out the last line
--> 197 Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
    198     tensors, grad_tensors, retain_graph, create_graph, inputs,
    199     allow_unreachable=True, accumulate_grad=True)

RuntimeError: function GenredAutogradBackward returned an incorrect number of gradients (expected 14, got 13)

JacksonFurrier commented 6 months ago

Running your example gives me the same issue. However, the following works:

from pykeops.torch import Vi, Vj, LazyTensor
from torch.autograd import grad
import torch
xc = torch.randn(256,5)
xc.requires_grad_(True)

x_i = Vi(xc)
x_j = Vj(xc)
d = -LazyTensor.sqdist(x_i,x_j)
fun = d.logsumexp_reduction(axis=1)

grad(fun.sum(), (xc,))

My best guess, without looking at the KeOps implementation, is that the "aliased" variable is also involved in the gradient computation, so Vi and Vj contribute as well, which results in "one more" gradient... but this is a wild guess!
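For context on where the count comes from: this error message is raised by PyTorch's autograd engine itself, not by KeOps. Any custom torch.autograd.Function whose backward returns fewer gradients than forward takes inputs fails the same way. A minimal standalone reproduction, purely illustrative:

import torch

class BadSquare(torch.autograd.Function):
    # forward takes two inputs, so backward must return two gradients
    # (None is allowed for non-differentiable arguments like scale)
    @staticmethod
    def forward(ctx, x, scale):
        ctx.save_for_backward(x)
        ctx.scale = scale
        return scale * x ** 2

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        return grad_out * 2 * ctx.scale * x   # only one gradient returned
        # the fix would be: return grad_out * 2 * ctx.scale * x, None

x = torch.randn(3, requires_grad=True)
BadSquare.apply(x, 2.0).sum().backward()
# RuntimeError: function BadSquareBackward returned an incorrect number
# of gradients (expected 2, got 1)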

joanglaunes commented 6 months ago

Hello @byphilipp and @JacksonFurrier,
There is no bug here. In fact, the code in the initial post by @byphilipp is not valid: fun is a function which must be called with no arguments, since x_i and x_j are initialized from an actual tensor. So the valid code is

from pykeops.torch import Vi, Vj, LazyTensor
import torch
xc = torch.randn(256,5)
xc.requires_grad_(True)

x_i = Vi (xc)
x_j = Vj (xc)
d = -LazyTensor.sqdist(x_i,x_j)
fun = d.logsumexp_reduction(dim=1,call=False)

fun().sum().backward()
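With this first form, xc is captured inside x_i and x_j at construction time, so the reduction already knows all of its inputs and fun() takes no arguments; passing xc again, as in the original post, introduces a spurious extra variable, which is what produced the gradient-count mismatch.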

Alternatively, you can build the LazyTensor without any reference to actual tensors and then pass the tensors as arguments when calling, as follows:

from pykeops.torch import Vi, Vj, LazyTensor
import torch
xc = torch.randn(256,5)
xc.requires_grad_(True)

x_i = Vi (0,5)    # tensor for x_i will be given as first argument in the call, and dimension is 5
x_j = Vj (1,5)    # tensor for x_j will be given as second argument in the call, and dimension is 5
d = -LazyTensor.sqdist(x_i,x_j)
fun = d.logsumexp_reduction(dim=1)     # here call=False is implicit, since no computation can be done yet.

fun(xc,xc).sum().backward()
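If you want to double-check that the backward pass is correct, torch.autograd.gradcheck can compare the KeOps gradient against finite differences. A quick sketch, assuming your setup handles float64 inputs (which gradcheck requires):

import torch
from pykeops.torch import Vi, Vj, LazyTensor

# small float64 problem so that finite differences are accurate
xc64 = torch.randn(16, 5, dtype=torch.float64, requires_grad=True)

def f(x):
    x_i, x_j = Vi(0, 5), Vj(1, 5)
    fun = (-LazyTensor.sqdist(x_i, x_j)).logsumexp_reduction(dim=1)
    return fun(x, x).sum()

torch.autograd.gradcheck(f, (xc64,))  # returns True if gradients match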

joanglaunes commented 5 months ago

I have added a check for the number of arguments when calling a reduction, so that the error message becomes clearer when a KeOps reduction is called with an incorrect number of input tensors.
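Roughly, the new guard amounts to something like the following (a simplified sketch with illustrative names, not the exact code added to KeOps):

def check_nargs(expected_nargs, args):
    # fail early, before dispatching to the compiled reduction, instead of
    # letting autograd surface a confusing gradient-count mismatch later
    if len(args) != expected_nargs:
        raise ValueError(
            f"[KeOps] reduction expected {expected_nargs} input tensor(s), "
            f"got {len(args)}."
        )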