getkeops / keops

KErnel OPerationS, on CPUs and GPUs, with autodiff and without memory overflows
https://www.kernel-operations.io
MIT License

Invalid shapes during backprop with batch computations #103

Open Bawaw opened 4 years ago

Bawaw commented 4 years ago

Hi KeOps team!

Thanks for the awesome library, I've been using it for a while now and it has made my life a lot easier :).

However, I'm currently having trouble getting the following code to work:

import torch
from pykeops.torch import LazyTensor

def keops_hamiltonian_eqs(x, m):
    # Gaussian kernel K_ij = exp(-|x_i - x_j|^2 / sigma^2), built as a
    # symbolic KeOps LazyTensor with the batch dimension first.
    sigma = torch.tensor(0.1, device=x.device)
    rows, cols = LazyTensor(x.unsqueeze(2)), LazyTensor(x.unsqueeze(1))
    K = (-((rows - cols)**2).sum(3)/(sigma**2)).exp()

    # Hamiltonian H(x, m) = 0.5 * <m, K m>, one scalar per batch element.
    H = 0.5*torch.sum(m * (K @ m), (1, 2))

    # Hamilton's equations: dx/dt = dH/dm, dm/dt = -dH/dx.
    dx_t, dm_t = torch.autograd.grad(H.sum(), (m, x), create_graph=True)
    return dx_t, -dm_t

B,M,D = 4, 1000, 3
x = torch.randn(B, M, D, requires_grad=True)
b = torch.randn(B, M, D, requires_grad=True)

dx, dm = keops_hamiltonian_eqs(x, b)
(1.*dx).sum().backward(retain_graph=True)  # raises the RuntimeError below
(1.*dm).sum().backward(retain_graph=True)

It results in the error "RuntimeError: shape '[4, 1000, 3]' is invalid for input of size 3000" (the full stack trace is below), which looks like a bug to me. Or has this simply not been implemented yet?
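
In case it helps with triage: looping over the batch dimension and building the LazyTensors without the batch axes might work as a stopgap, though it obviously gives up the batched reduction. A rough sketch of what I mean (names are mine, not carefully tested):

def hamiltonian_eqs_single(x, m, sigma=0.1):
    # Same Hamiltonian as above, but for a single (M, D) point cloud,
    # so that KeOps never sees a batch dimension.
    x_i = LazyTensor(x.unsqueeze(1))  # (M, 1, D)
    x_j = LazyTensor(x.unsqueeze(0))  # (1, M, D)
    K = (-((x_i - x_j)**2).sum(-1)/(sigma**2)).exp()

    H = 0.5*(m * (K @ m)).sum()

    dH_dm, dH_dx = torch.autograd.grad(H, (m, x), create_graph=True)
    return dH_dm, -dH_dx

# Stack the per-sample results back into (B, M, D) tensors.
outs = [hamiltonian_eqs_single(xi, mi) for xi, mi in zip(x, b)]
dx_loop = torch.stack([o[0] for o in outs])
dm_loop = torch.stack([o[1] for o in outs])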

PS: congratulations on finishing your PhD, Jean; I found your thesis very insightful!

Kind regards, Balder

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
/media/alpha/Projects/regilib/keops_test3.py in <module>
     27 
     28 dx, dm = keops_hamiltonian(x, b)
---> 29 (1.*dx).sum().backward(retain_graph=True)
     30 (1.*dm).sum().backward(retain_graph=True)

~/.conda/envs/gdl/lib/python3.7/site-packages/torch/tensor.py in backward(self, gradient, retain_graph, create_graph)
    183                 products. Defaults to ``False``.
    184         """
--> 185         torch.autograd.backward(self, gradient, retain_graph, create_graph)
    186 
    187     def register_hook(self, hook):

~/.conda/envs/gdl/lib/python3.7/site-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables)
    125     Variable._execution_engine.run_backward(
    126         tensors, grad_tensors, retain_graph, create_graph,
--> 127         allow_unreachable=True)  # allow_unreachable flag
    128 
    129 

RuntimeError: shape '[4, 1000, 3]' is invalid for input of size 3000

jeanfeydy commented 4 years ago

Hi @Bawaw ,

Thanks a lot for your very kind comments! This is clearly a bug, caused by the way KeOps handles batch computations. I have been very busy over the last few days, but I will try to fix it soon.

Until then, you may be interested in this note on backpropagating through symmetric functions, as well as this short LDDMM implementation with PyTorch. Parts of it are a bit obsolete (the toolbox was written before the introduction of the LazyTensor syntax by @joanglaunes and before the development of GeomLoss), but it could help you to quickly write an efficient pipeline.
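
To give a rough idea of how your keops_hamiltonian_eqs function would slot into such a pipeline: geodesic shooting in LDDMM amounts to integrating Hamilton's equations in time. A sketch with a plain Euler step (the step count is arbitrary, and a higher-order scheme such as Ralston or RK4 is preferable in practice):

def shoot(x0, m0, n_steps=10):
    # Euler integration of Hamilton's equations,
    #   dx/dt = dH/dm,   dm/dt = -dH/dx,
    # using the keops_hamiltonian_eqs function from your snippet above.
    dt = 1.0 / n_steps
    x, m = x0, m0
    for _ in range(n_steps):
        dx_dt, dm_dt = keops_hamiltonian_eqs(x, m)
        x = x + dt * dx_dt
        m = m + dt * dm_dt
    return x, m

x1, m1 = shoot(x, b)  # end-point of the geodesic flow

Since keops_hamiltonian_eqs uses create_graph=True, the end-point x1 stays differentiable with respect to (x, b), which is what you need to fit a registration loss; backpropagating through this loop in batch mode will of course run into the same bug until it is fixed.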

Needless to say, even if we won't be able to meet again in the next few months, I'm always available for a discussion through Skype/Zoom: feel free to send me an e-mail if you have questions related to your project, or just for a chat :-)

Best regards,
Jean