Can one of the admins verify this patch?
Is the only code modification from:
drv = grad(loss_grads[0], inputs, torch.ones_like(loss_grads[0]))
to:
drv = grad(loss_grads[0][0][2], inputs)
I'm looking at the function signature of grad:
torch.autograd.grad(outputs, inputs, grad_outputs=None,
retain_graph=None, create_graph=False, only_inputs=True, allow_unused=False)
The change is very subtle. What are the components of loss_grads? It's not very clear what's contained in loss_grads. So loss_grads[0] is the sum, and loss_grads[0][0][2] is what, then?
It looks good, but it would be nice to clarify that PyTorch operation.
You can test it out with the following code:
import torch
from torch.autograd import grad
'''
z = (xy)^2
x = 3, y = 2
first-order derivatives: [24, 36]
d2z/dx2  = 8
d2z/dxdy = 24
d2z/dy2  = 18
'''
inputs = torch.tensor([3.0, 2.0], requires_grad=True)
z = (inputs[0] * inputs[1]) ** 2
first_order_grad = grad(z, inputs, create_graph=True)
# Old approach: grad_outputs=ones sums the second derivatives; does not give the expected answer
second_order_grad_original, = grad(first_order_grad[0], inputs,
                                   torch.ones_like(first_order_grad[0]), retain_graph=True)
# New approach: differentiate each component of the gradient separately
second_order_grad_x, = grad(first_order_grad[0][0], inputs, retain_graph=True)  # [d2z/dx2, d2z/dxdy]
second_order_grad_y, = grad(first_order_grad[0][1], inputs)                     # [d2z/dxdy, d2z/dy2]
The old code calculates the sum of the second-order gradients with respect to the parameters, i.e. (d2z/dx2 + d2z/dxdy, d2z/dy2 + d2z/dxdy).
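To make that explicit, here's a minimal sketch (same z = (xy)^2 example) that builds the full 2x2 Hessian row by row and shows that passing grad_outputs=torch.ones_like(...) just multiplies the Hessian by a vector of ones, which is where the 32 and 42 come from:
import torch
from torch.autograd import grad

inputs = torch.tensor([3.0, 2.0], requires_grad=True)
z = (inputs[0] * inputs[1]) ** 2
g, = grad(z, inputs, create_graph=True)  # g = [dz/dx, dz/dy] = [24., 36.]

# Full Hessian, one row per component of g
hessian = torch.stack([grad(g[i], inputs, retain_graph=True)[0] for i in range(2)])
print(hessian)                  # tensor([[ 8., 24.],
                                #         [24., 18.]])

# grad_outputs=ones is a vector-Jacobian product, i.e. ones @ H
print(hessian @ torch.ones(2))  # tensor([32., 42.]), what the old code returned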
That's an excellent simplified example.
In [2]: first_order_grad
Out[2]: (tensor([24., 36.], grad_fn=<AddBackward0>),)
In [3]: second_order_grad_original
Out[3]: tensor([32., 42.])
In [4]: second_order_grad_x
Out[4]: tensor([ 8., 24.])
In [5]: second_order_grad_y
Out[5]: tensor([24., 18.])
Now I understand a lot better what's going on. Could you add a comment explaining why you access the arrays with index 2?
# The input tensor is [K, B, S0, sigma, mu, r], indexed from zero.
inputs = torch.tensor([[110.0, 100.0, 120.0, 0.35, 0.1, 0.05]]).cuda()
inputs.requires_grad = True
x = model(inputs)
# instead of using loss.backward(), use torch.autograd.grad() to compute gradients
# https://pytorch.org/docs/stable/autograd.html#torch.autograd.grad
loss_grads = grad(x, inputs, create_graph=True)
# loss_grads[0][0][0] is the 1st-order derivative with respect to K
# loss_grads[0][0][1] is the 1st-order derivative with respect to B, etc.
# S0 is at index 2, therefore
# loss_grads[0][0][2] is the 1st-order derivative with respect to S0
drv = grad(loss_grads[0][0][2], inputs)
# drv[0][0][2] is the 2nd-order derivative with respect to S0
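As a quick sanity check of that indexing (this uses a hypothetical hand-checkable function f in place of the notebook's model, and drops .cuda() so it runs anywhere):
import torch
from torch.autograd import grad

# Hypothetical stand-in for model(): depends only on S0 (index 2),
# so dx/dS0 = 3*S0^2 and d2x/dS0^2 = 6*S0 are easy to verify by hand.
def f(t):
    return t[0][2] ** 3

inputs = torch.tensor([[110.0, 100.0, 120.0, 0.35, 0.1, 0.05]], requires_grad=True)
x = f(inputs)

loss_grads = grad(x, inputs, create_graph=True)
print(loss_grads[0][0][2])  # tensor(43200., grad_fn=...) = 3 * 120**2, i.e. dx/dS0

drv = grad(loss_grads[0][0][2], inputs)
print(drv[0][0][2])         # tensor(720.) = 6 * 120, i.e. d2x/dS0^2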
I don't know of a good succinct comment to explain it, but at least point out that S0 corresponds to index 2 of the inputs tensor, which is why you're indexing with 2. Maybe add a separate cell in the notebook demonstrating how torch.autograd.grad works with the example you gave above, z = (xy)^2.
I added the example to the notebook.
The old code computes the sum of the second-order derivatives, which is wrong. This PR fixes it.