[Open] MaxH1996 opened this issue 3 years ago
This error is happening while solving the adjoint dynamics for your net. The key lines are 47 onwards:

```python
xT, λT, μT = sol[-1], grad_output[-1][-1], torch.zeros_like(vf_params)
```

These are then concatenated and flattened, giving you the tensor of size `(1203142)`. Does that match `(1203142)` or `(1206497)` for your specific network architecture? It also appears to be happening at your init step (see line 39).
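For reference, here is roughly how that size comes about (a minimal sketch, not torchdyn's actual code; the parameter count 1155142 is inferred from 1203142 - 2 * 24000, assuming your reported state shape `[8000, 3]`):

```python
import torch

# simplified sketch of how the initial adjoint state is assembled
xT = torch.zeros(8000, 3)      # final state, 24000 elements
lamT = torch.zeros_like(xT)    # dL/dxT, another 24000 elements
muT = torch.zeros(1155142)     # stand-in for torch.zeros_like(vf_params)

A = torch.cat([xT.flatten(), lamT.flatten(), muT.flatten()])
print(A.numel())  # 24000 + 24000 + 1155142 = 1203142
```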
Could you share (at a high level) what your `f` is?
Thanks for your quick response! It is quite hard to share my `f`, actually, because there is a whole lot going on. But here is the actual class that I call:
```python
class Func(nn.Module):
    def __init__(self, nuc, up, down, neural_net=Net):
        super().__init__()
        self.net = neural_net(nuc, up, down)

    def forward(self, t, x, rn, batch_dim, n_elec):
        # x arrives flattened; restore (batch, electrons, 3) for the network
        x = x.reshape(batch_dim, n_elec, 3)
        _, _, x = self.net(x, rn)
        return x.reshape(batch_dim * n_elec, 3)
```
Not sure if that helps at all. I then construct the `NeuralODE`, and `Func` is called using `functools.partial` for the extra arguments. What I did see is that the mismatch is in `f0`, while `x0` has the correct shape at line 39 (`init_step`); correct in the sense that it matches the variable `scale`.
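For reference, the call looks roughly like this (a simplified sketch; my problem-specific objects `nuc`, `up`, `down`, `rn`, `batch_dim`, `n_elec`, `x0`, `t_span` are elided):

```python
import functools
from torchdyn.core import NeuralODE

f = Func(nuc, up, down)
# bind the extra arguments so only (t, x) remain for the solver
vf = functools.partial(f, rn=rn, batch_dim=batch_dim, n_elec=n_elec)
model = NeuralODE(vf, sensitivity='adjoint')
t_eval, sol = model(x0, t_span)
```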
I'd have to check exactly whether the flattening and concatenation match for my architecture, but I think those numbers would make sense.
Btw, if I use the normal `odeint` without the adjoint, I do not get this problem.
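Concretely, the two setups I compared look like this (schematic; assuming the `sensitivity` flag is the relevant switch):

```python
model_ok = NeuralODE(vf, sensitivity='autograd')   # backprop through the solver: no error
model_bad = NeuralODE(vf, sensitivity='adjoint')   # adjoint: shape mismatch on backward
```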
Identifying what the difference 1206497 - 1203142 = 3355 represents in terms of elements is key here. The shape `(1206497)` is determined during initialization of the adjoint as a concat of

```python
xT, λT, μT = sol[-1], grad_output[-1][-1], torch.zeros_like(vf_params)
```

whereas `(1203142)` is produced as the output of `f_` here. My guess is that this difference comes from a set of parameters that is registered with `vf` (and thus is counted here) but is not counted in this line (`self.vf_params`):
* Are all of your model's parameters actually picked up by `optimizable_parameters`?
* What happens if you use `partial` without `optimizable_parameters`?
* Would collecting all optimizable parameters together in `self.vf_params` at init help?
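One quick check (a sketch; `func` is your `Func` instance and `model` the `NeuralODE` built from it, assuming `vf_params` is exposed as the flattened parameter tensor):

```python
n_module = sum(p.numel() for p in func.parameters())  # parameters the module owns
n_vf = model.vf_params.numel()                        # parameters the adjoint will see
print(n_module, n_vf, n_module - n_vf)                # a difference of 3355 would confirm this
```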
This is basically the issue I have been trying to work out too (referring to the 3355 difference in parameters). To your points:

* All of the parameters are registered in `Func`, so `Func.parameters()` should contain everything.
* If I wrap `Func` with `functools.partial` before passing it to `NeuralODE`, then I only get the message `Your vector field does not have nn.Parameters to optimize.`
Another thing I wanted to ask: I use second derivatives in my neural net. Specifically, my `self.net` uses Laplacians. Does this pose a problem for the adjoint method?
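Schematically, the second-derivative part looks like this (simplified; `net` stands in for my actual network and returns one scalar per sample):

```python
x = torch.randn(8, 3, requires_grad=True)
phi = net(x).sum()
g = torch.autograd.grad(phi, x, create_graph=True)[0]   # first derivatives, shape (8, 3)
# Laplacian = trace of the Hessian, built from one grad call per coordinate
lap = sum(torch.autograd.grad(g[:, i].sum(), x, create_graph=True)[0][:, i]
          for i in range(x.shape[1]))
```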
Hey, I was wondering if you had any more thoughts on this issue. I didn't have time in the last couple of weeks to work on it, but I am coming back to it now and am still seeing the same shape mismatch. I checked the places where you suggested the difference might come from, but the values are identical in both locations.
I'd be happy to take a look at the model if you can share it in private. To determine where the issue lies, I would only need access to the `nn.Module` that determines your input -> output map.
Hi @Zymrael, I am encountering the same issue. Here is my network, along with the input shape and how I am creating the `NeuralODE`:
```python
class SimpleCNN(nn.Module):
    def __init__(self):
        super(SimpleCNN, self).__init__()
        self.conv1 = nn.Conv2d(1, 16, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=3, padding=1)
        self.conv3 = nn.Conv2d(32, 10, kernel_size=3, padding=1)
        self.relu = nn.ReLU()
        self.maxpool = nn.MaxPool2d(2)

    def forward(self, x):
        x = self.maxpool(self.relu(self.conv1(x)))
        x = self.maxpool(self.relu(self.conv2(x)))
        x = self.relu(self.conv3(x))
        print('here')
        print(x.shape)
        return x
```
```python
model = NeuralODE(SimpleCNN())
# Your vector field callable (nn.Module) should have both time `t` and state `x`
# as arguments, we've wrapped it for you.

t_span = torch.linspace(0, 1, 100)
t_eval, trajectory = model(next(iter(train_loader))[0], t_span)
trajectory = trajectory.detach()

next(iter(train_loader))[0].shape
# torch.Size([64, 1, 32, 32])
```
The error message:

```
RuntimeError: The size of tensor a (8) must match the size of tensor b (32) at non-singleton dimension 3
```
Semi-complete stack trace:

```
here
torch.Size([64, 10, 8, 8])
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-41-5705b2264547> in <cell line: 2>()
      1 t_span = torch.linspace(0,1,100)
----> 2 t_eval, trajectory = model(next(iter(train_loader))[0], t_span)
      3 trajectory = trajectory.detach()

6 frames
/usr/local/lib/python3.10/dist-packages/torchdyn/numerics/utils.py in init_step(f, f0, x0, t0, order, atol, rtol)
     37 def init_step(f, f0, x0, t0, order, atol, rtol):
     38     scale = atol + torch.abs(x0) * rtol
---> 39     d0, d1 = hairer_norm(x0 / scale), hairer_norm(f0 / scale)
     40
     41     if d0 < 1e-5 or d1 < 1e-5:

RuntimeError: The size of tensor a (8) must match the size of tensor b (32) at non-singleton dimension 3
```
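Reading the trace, `x0` is the solver input `[64, 1, 32, 32]` while `f0` is the network output `[64, 10, 8, 8]`, so `f0 / scale` cannot broadcast. If I understand correctly, the vector field of an ODE has to return `dx/dt` with the same shape as `x`; a shape-preserving variant would look like this (a sketch, channel widths illustrative):

```python
class ODEFunc(nn.Module):
    """Vector field that maps x to dx/dt of the same shape."""
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 16, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(16, 1, kernel_size=3, padding=1)  # back to 1 channel
        self.relu = nn.ReLU()

    def forward(self, x):
        # no pooling: the spatial size must stay [32, 32]
        return self.conv2(self.relu(self.conv1(x)))
```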
Hi, I am currently working with the torchdyn package and I am getting an error that I cannot really explain. I know this error is specific to my particular code and usage of torchdyn, but mainly I am interested in why this mismatch occurs. The shapes of the `x0` and `f0` that I input are both `[8000, 3]`, so I do not understand how I can get a tensor of size `(1203142)` or `(1206497)`. It appears to happen in the backpropagation step, because simply passing values through works without any errors. Do you maybe have any idea why this would occur?
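In other words (schematic; `model`, `x0`, `t_span` stand for my actual setup):

```python
t_eval, traj = model(x0, t_span)   # forward integration alone runs fine
loss = traj[-1].pow(2).mean()      # placeholder loss
loss.backward()                    # the size mismatch is raised here,
                                   # while solving the adjoint dynamics
```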