smao-astro opened this issue 3 years ago
Thanks for your interest in `neurodiffeq`. Your question was very good and thought-provoking. I need to check whether it indeed computes an N x N Jacobian when there is no interdependence within the batch. My intuition is that you are right.

In the meantime, may I ask how you came to know about `neurodiffeq`? It'd be nice to know how we can promote it further :)
Hi @shuheng-liu,

Do you have any updates you would like to share about these questions? I am not familiar with automatic differentiation, so I would like to know whether the current implementation is efficient.

In addition, I noticed that this new nightly API might be helpful. What do you think?

I cannot remember the details, since I noticed it earlier last year, maybe from some paper (?).

Thanks.
> computation (and storage) on the non-diagonal elements is useless and unnecessary
I am pretty sure it won't introduce additional storage, because PyTorch accumulates (sums) gradients w.r.t. the same variable. For example,

```python
x = torch.rand(10, 1, requires_grad=True)  # shape 10 x 1
y = torch.cat([x, x * 2, x * 3], 1)        # shape 10 x 3
y.backward(torch.ones_like(y))
print(x.grad)
```

will give you a `10 x 1` tensor filled with 6s, instead of a `10 x 3` tensor of (1, 2, 3)s.
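The same accumulation happens with `autograd.grad` when `grad_outputs` is a tensor of ones. As a minimal check (using an elementwise toy function as a stand-in for a network, so that each `y_i` depends only on `x_i`), the result is an (N, 1) tensor, and no (N, N) Jacobian is ever materialized in the output:

```python
import torch

# Toy elementwise function standing in for a solver network:
# each y_i depends only on the corresponding x_i.
x = torch.rand(1000, 1, requires_grad=True)
y = x ** 3
ones = torch.ones_like(y)

# One vector-Jacobian product, the same call pattern as in neurodiffeq:
(dydx,) = torch.autograd.grad(y, x, grad_outputs=ones)

print(dydx.shape)                                 # torch.Size([1000, 1])
print(torch.allclose(dydx, 3 * x.detach() ** 2))  # True: exactly the diagonal
```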
As for the computation part, I think it will be more expensive, though, because PyTorch uses reverse-mode automatic differentiation, so when the output has a high dimension the computation slows down. I'm not sure whether this can be fixed without switching to a framework other than PyTorch.
`torch.vmap` looks interesting, but I don't have much experience with it. I'm still trying to understand it.
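For reference, a sketch of computing per-sample derivatives with the functional API. This assumes PyTorch >= 2.0, where `vmap` and `grad` live under `torch.func` (the prototype `torch.vmap` mentioned above was later stabilized there); the function `f` here is an illustrative stand-in for a model, not neurodiffeq code:

```python
import torch
from torch.func import grad, vmap  # functional API, available in PyTorch >= 2.0

# f maps one sample of shape (1,) to a scalar; a stand-in for a tiny model.
def f(x):
    return (x ** 3).sum()

x = torch.linspace(0.0, 1.0, 10).unsqueeze(1)  # batch of coordinates, shape (10, 1)

# vmap(grad(f)) evaluates dy_i/dx_i for each sample independently,
# without ever forming a cross-sample Jacobian.
dydx = vmap(grad(f))(x)

print(dydx.shape)  # torch.Size([10, 1])
```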
Hi,

I am pretty new to `neurodiffeq`. Thank you very much for the excellent library.

I am interested in the way partial derivatives w.r.t. the inputs are computed, and in the computational speed of doing so.

Take a forward ODE solver (1D, one unknown variable) for example. The input is `x`, a batch of coordinates, and the output of the neural network is `y`, the approximated solution of the ODE at these coordinates. If we view the neural network as a smooth function that simulates the solution and name it `f`, then the forward pass in training evaluates `y = f(x)`, and for each element of the input, `x_i`, the neural network gives `y_i = f(x_i)`, where `i` runs from 0 to N-1 and N is the batch size. When constructing the loss function, one evaluates the residual of the ODE, which usually requires evaluating `\frac{\partial y_i}{\partial x_i}` and higher-order derivatives.

My question relates to the way `\frac{\partial y_i}{\partial x_i}` is evaluated. For example, `x` is an (N, 1) tensor and `y` is also an (N, 1) tensor, where N is the batch size. If you do `autograd.grad(y, t, create_graph=True, grad_outputs=ones, allow_unused=True)` as in the lines below

https://github.com/odegym/neurodiffeq/blob/718f226d40cfbcb9ed50d72119bd3668b0c68733/neurodiffeq/neurodiffeq.py#L21-L22

my understanding is that it will evaluate a Jacobian matrix of size (N, N) with elements equal to `\frac{\partial y_i}{\partial x_j}` (`i`, `j` from 0 to N-1), regardless of the fact that `y_i` only depends on `x_i`, and thus computation (and storage) on the non-diagonal elements is useless and unnecessary. In other words, the computation could actually be done by evaluating N gradients, but the current method does N * N times the work.

My question is: is my understanding above correct?
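For concreteness, the call pattern in question (with a hypothetical tiny network standing in for the real solver net; the `create_graph=True` flag matches the linked `neurodiffeq` lines) can be sketched as:

```python
import torch

# Hypothetical tiny network standing in for the neurodiffeq solver net.
net = torch.nn.Sequential(
    torch.nn.Linear(1, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1)
)

x = torch.linspace(0.0, 1.0, 100).unsqueeze(1).requires_grad_(True)  # (N, 1)
y = net(x)                                                           # (N, 1)
ones = torch.ones_like(y)

# First derivative dy_i/dx_i for every sample in the batch:
(dy_dx,) = torch.autograd.grad(y, x, grad_outputs=ones, create_graph=True)
# Second derivative, reusing the graph built above:
(d2y_dx2,) = torch.autograd.grad(dy_dx, x, grad_outputs=ones, create_graph=True)

print(dy_dx.shape, d2y_dx2.shape)  # torch.Size([100, 1]) torch.Size([100, 1])
```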
Thanks!