Can one of the admins verify this patch?
Is the only code modification from:
drv = grad(loss_grads[0], inputs, torch.ones_like(loss_grads[0]))
to:
drv = grad(loss_grads[0][0][2], inputs)
I'm looking at the function signature of grad:
torch.autograd.grad(outputs, inputs, grad_outputs=None,
retain_graph=None, create_graph=False, only_inputs=True, allow_unused=False)
The change is very subtle. What are the components of loss_grads? It's not very clear what's contained in loss_grads. So loss_grads[0] is the sum, and loss_grads[0][0][2] is what, then?
It looks good, but it would be nice to clarify that PyTorch operation.
You can test it out with the following code:
import torch
from torch.autograd import grad
'''
z = (xy)^2
x = 3, y = 2
first-order derivatives: [24, 36]
d2z/dx2  = 8
d2z/dxdy = 24
d2z/dy2  = 18
'''
inputs = torch.tensor([3.0, 2.0], requires_grad=True)
z = (inputs[0] * inputs[1]) ** 2
first_order_grad = grad(z, inputs, create_graph=True)
# Old approach: grad_outputs=ones sums the second derivatives; does not give the expected answer
second_order_grad_original, = grad(first_order_grad[0], inputs,
                                   torch.ones_like(first_order_grad[0]), retain_graph=True)
# New approach: differentiate each component of the gradient separately
second_order_grad_x, = grad(first_order_grad[0][0], inputs, retain_graph=True)  # [d2z/dx2, d2z/dxdy]
second_order_grad_y, = grad(first_order_grad[0][1], inputs)                     # [d2z/dxdy, d2z/dy2]
The old code calculates the sum of the second-order gradients with respect to the parameters, i.e. (d2z/dx2 + d2z/dxdy, d2z/dy2 + d2z/dxdy).
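To make that explicit, here's a minimal sketch (same z = (xy)^2 example) that builds the full 2x2 Hessian row by row and shows that passing grad_outputs=torch.ones_like(...) just multiplies the Hessian by a vector of ones, which is where the 32 and 42 come from:
import torch
from torch.autograd import grad

inputs = torch.tensor([3.0, 2.0], requires_grad=True)
z = (inputs[0] * inputs[1]) ** 2
g, = grad(z, inputs, create_graph=True)  # g = [dz/dx, dz/dy] = [24., 36.]

# Full Hessian, one row per component of g
hessian = torch.stack([grad(g[i], inputs, retain_graph=True)[0] for i in range(2)])
print(hessian)                  # tensor([[ 8., 24.],
                                #         [24., 18.]])

# grad_outputs=ones is a vector-Jacobian product, i.e. ones @ H
print(hessian @ torch.ones(2))  # tensor([32., 42.]), what the old code returned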
That's an excellent simplified example.
In [2]: first_order_grad
Out[2]: (tensor([24., 36.], grad_fn=<AddBackward0>),)
In [3]: second_order_grad_original
Out[3]: tensor([32., 42.])
In [4]: second_order_grad_x
Out[4]: tensor([ 8., 24.])
In [5]: second_order_grad_y
Out[5]: tensor([24., 18.])
Now I understand a lot better what's going on. Could you add a comment explaining why you access the arrays with index 2?
# The input tensor is [K, B, S0, sigma, mu, r], indexed from zero.
inputs = torch.tensor([[110.0, 100.0, 120.0, 0.35, 0.1, 0.05]]).cuda()
inputs.requires_grad = True
x = model(inputs)
# instead of using loss.backward(), use torch.autograd.grad() to compute gradients
# https://pytorch.org/docs/stable/autograd.html#torch.autograd.grad
loss_grads = grad(x, inputs, create_graph=True)
# loss_grads[0][0][0] is the 1st-order derivative with respect to K
# loss_grads[0][0][1] is the 1st-order derivative with respect to B, etc.
# S0 is at index 2, therefore
# loss_grads[0][0][2] is the 1st-order derivative with respect to S0
drv = grad(loss_grads[0][0][2], inputs)
# drv[0][0][2] is the 2nd-order derivative with respect to S0
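As a quick sanity check of that indexing (this uses a hypothetical hand-checkable function f in place of the notebook's model, and drops .cuda() so it runs anywhere):
import torch
from torch.autograd import grad

# Hypothetical stand-in for model(): depends only on S0 (index 2),
# so dx/dS0 = 3*S0^2 and d2x/dS0^2 = 6*S0 are easy to verify by hand.
def f(t):
    return t[0][2] ** 3

inputs = torch.tensor([[110.0, 100.0, 120.0, 0.35, 0.1, 0.05]], requires_grad=True)
x = f(inputs)

loss_grads = grad(x, inputs, create_graph=True)
print(loss_grads[0][0][2])  # tensor(43200., grad_fn=...) = 3 * 120**2, i.e. dx/dS0

drv = grad(loss_grads[0][0][2], inputs)
print(drv[0][0][2])         # tensor(720.) = 6 * 120, i.e. d2x/dS0^2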
I don't know of a good succinct comment to explain it, but at least point out that S0 corresponds to index 2 of the inputs tensor, which is why you're indexing with 2. Maybe add a separate cell in the notebook demonstrating how torch.autograd.grad works with the example you gave above, z = (xy)^2.
I added the example to the notebook.
The old code computes the sum of the second-order derivatives, which is wrong. This PR fixes it.