NVIDIA / fsi-samples

A collection of open-source, GPU-accelerated Python tools and examples for quantitative analyst tasks, leveraging the RAPIDS AI project, Numba, cuDF, and Dask.

fixed the gamma computation error #79

Closed doyend closed 4 years ago

doyend commented 4 years ago

The old code computes the sum of the second-order derivatives, which is wrong. This change fixes it.
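A minimal sketch of the change (using a toy stand-in for the pricing network; the notebook uses the trained model and the 'K, B, S0, sigma, mu, r' input layout):

import torch
from torch.autograd import grad

# toy stand-in for the pricing model; the notebook uses the trained network
net = torch.nn.Linear(6, 1)
def price(v):
    return (net(v) ** 2).sum()

# input layout: K, B, S0, sigma, mu, r (S0 is index 2)
inputs = torch.tensor([[110.0, 100.0, 120.0, 0.35, 0.1, 0.05]], requires_grad=True)

x = price(inputs)
loss_grads = grad(x, inputs, create_graph=True)

# old (wrong): sums the second-order derivatives over all inputs
# drv = grad(loss_grads[0], inputs, torch.ones_like(loss_grads[0]))

# fixed: differentiate only d(price)/d(S0), so drv[0][0][2] is the gamma
drv = grad(loss_grads[0][0][2], inputs)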

GPUtester commented 4 years ago

Can one of the admins verify this patch?

avolkov1 commented 4 years ago

Is the only code modification from:

drv = grad(loss_grads[0], inputs, torch.ones_like(loss_grads[0]) )

to:

drv = grad(loss_grads[0][0][2], inputs)

I'm looking at the function signature of grad:

torch.autograd.grad(outputs, inputs, grad_outputs=None,
    retain_graph=None, create_graph=False, only_inputs=True, allow_unused=False)

The change is very subtle. What are the components of loss_grads? It's not very clear what loss_grads contains. So loss_grads[0] is the sum, and loss_grads[0][0][2] is what, then?

It looks good, but it would be nice to clarify that PyTorch operation.

yidong72 commented 4 years ago

You can test it out with the following code:

import torch
from torch.autograd import grad
'''
z = (xy)^2
x = 3, y = 2

first order deriv [24 36]
d2z/dx2 = 8
d2z/dxdy = 24
d2z/dy2 = 18
'''

inputs = torch.tensor([3.0,2.0], requires_grad=True)

z = (inputs[0]*inputs[1])**2
first_order_grad = grad(z, inputs, create_graph=True)

second_order_grad_original, = grad(first_order_grad[0], inputs,
                                   torch.ones_like(first_order_grad[0]), retain_graph=True)  # does not give the expected answer
second_order_grad_x, = grad(first_order_grad[0][0], inputs, retain_graph=True)  # [d2z/dx2, d2z/dxdy]
second_order_grad_y, = grad(first_order_grad[0][1], inputs)                     # [d2z/dxdy, d2z/dy2]

The old code calculates the sum of the gradients with respect to the parameters, i.e. (d2z/dx2 + d2z/dxdy, d2z/dy2 + d2z/dxdy).
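As a cross-check (a sketch, assuming a PyTorch version that ships torch.autograd.functional.hessian), the full Hessian makes this explicit; the old result is just its column sums:

import torch
from torch.autograd.functional import hessian

def f(v):
    # z = (x*y)^2 with v = [x, y]
    return (v[0] * v[1]) ** 2

v = torch.tensor([3.0, 2.0])
H = hessian(f, v)
print(H)             # [[ 8., 24.], [24., 18.]]
print(H.sum(dim=0))  # [32., 42.] -- the old second_order_grad_original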

avolkov1 commented 4 years ago

That's an excellent simplified example.

In [2]: first_order_grad
Out[2]: (tensor([24., 36.], grad_fn=<AddBackward0>),)
In [3]: second_order_grad_original
Out[3]: tensor([32., 42.])
In [4]: second_order_grad_x
Out[4]: tensor([ 8., 24.])
In [5]: second_order_grad_y
Out[5]: tensor([24., 18.])

Now I understand a lot better what's going on. Could you add a comment explaining why you access the arrays with index 2?

# input tensor is 'K, B, S0, sigma, mu, r' zero-based indexed.
inputs = torch.tensor([[110.0, 100.0, 120.0, 0.35, 0.1, 0.05]]).cuda()
inputs.requires_grad = True
x = model(inputs)

# instead of using loss.backward(), use torch.autograd.grad() to compute gradients
# https://pytorch.org/docs/stable/autograd.html#torch.autograd.grad
loss_grads = grad(x, inputs, create_graph=True)
# loss_grads[0][0][0] is 1st order derivative with respect to K
# loss_grads[0][0][1] is 1st order derivative with respect to B etc.

# S0 index is 2 therefore
# loss_grads[0][0][2] is 1st order derivative with respect to S0
drv = grad(loss_grads[0][0][2], inputs)
# drv[0][0][2] is 2nd order derivative with respect to S0

I don't know of a good succinct comment to explain it, but at least point out that S0 corresponds to index 2 of the inputs tensor, which is why you index with 2. Maybe add a separate cell in the notebook demonstrating how torch.autograd.grad works with the z = (xy)^2 example you gave above.
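Something like this cell might do it (a sketch mirroring the example above; indexing picks one scalar component of the first-order gradient before differentiating again, just like loss_grads[0][0][2] picks d(model)/d(S0)):

import torch
from torch.autograd import grad

# z = (x*y)^2 with inputs [x, y] = [3, 2]
inputs = torch.tensor([3.0, 2.0], requires_grad=True)
z = (inputs[0] * inputs[1]) ** 2

first_order = grad(z, inputs, create_graph=True)  # ([dz/dx, dz/dy],) = ([24., 36.],)

# differentiate a single component: d(dz/dy)/d(inputs) = [d2z/dxdy, d2z/dy2]
second_order_y = grad(first_order[0][1], inputs)[0]
print(second_order_y)     # tensor([24., 18.])
print(second_order_y[1])  # d2z/dy2 = 18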

yidong72 commented 4 years ago

I added the example to the notebook.