computational-imaging / automatic-integration

Official repo for "AutoInt: Automatic Integration for Fast Neural Volume Rendering" (CVPR 2021)

Reasons behind writing AutoInt extension for PyTorch #3

Closed RaymondJiangkw closed 2 years ago

RaymondJiangkw commented 2 years ago

Hi, I am really interested in your ideas, and I rushed to implement a toy example to test them.

I see that you wrote a comprehensive and fairly involved extension in autoint to automatically "extract" the gradient network from the integral network, along with a number of helpful methods, especially draw().

However, I wonder whether torch.autograd.functional.jacobian already does exactly the same job, if we are only talking about getting the partial derivatives of the outputs w.r.t. the inputs.

To answer this question myself, I wrote a simple MLP and tried to regress a polynomial function, "extracting" the gradient network either with jacobian or by computing the derivative manually. Surprisingly, I found that jacobian can be about 60x slower than the manual computation.
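
For reference, a minimal sketch of the kind of comparison I mean (the architecture, sizes, and data here are just placeholders, not my exact script):

```python
import torch

# Toy setup: a small MLP whose derivative we want w.r.t. its input.
mlp = torch.nn.Sequential(
    torch.nn.Linear(1, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, 1),
)
x = torch.linspace(-1, 1, 128).unsqueeze(-1)

# Way 1: torch.autograd.functional.jacobian builds the full (N, 1, N, 1) Jacobian,
# one backward pass per output element by default; most of it is zeros here, which
# is where the slowdown comes from.
jac = torch.autograd.functional.jacobian(mlp, x)  # shape (128, 1, 128, 1)
dy_dx_jac = jac.sum(dim=(2, 3))                   # off-diagonal terms are zero

# Way 2: a single autograd call (one vector-Jacobian product for the whole batch).
x.requires_grad_(True)
dy_dx = torch.autograd.grad(mlp(x).sum(), x, create_graph=True)[0]

print(torch.allclose(dy_dx_jac, dy_dx, atol=1e-6))  # same values, very different cost
```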

So, is speed one of the reasons you wrote the extension?

davelindell commented 2 years ago

Great question. Yes, it's true that torch.autograd.functional.jacobian can be used to compute the output of the "grad network", and the result will be identical to our implementation's. There are two main reasons we created the extension.

  1. We wanted to inspect and understand the architecture of the grad network. As far as I know, it's not possible to explicitly instantiate, view, or reuse the computational graph created by torch.autograd.functional.jacobian.

  2. At the time of implementation, getting the gradient of the output with respect to the input of the network using PyTorch required a full forward pass followed by a call to autograd. Then, during training, this value was used to calculate the loss at each iteration, and another call to backward() was required to update the network weights (a sketch of that pattern follows below). We thought it far more elegant (and efficient) to directly instantiate the grad network for training. Indeed, there are computational and memory savings with this approach, as we detailed in the paper.
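
For concreteness, the standard-autograd training pattern referred to in point 2 looks roughly like this (the network, data, and loss below are placeholders, not AutoInt's actual training code):

```python
import torch

# Placeholder network and data; the point is the two gradient passes per step.
mlp = torch.nn.Sequential(torch.nn.Linear(1, 64), torch.nn.Tanh(), torch.nn.Linear(64, 1))
opt = torch.optim.Adam(mlp.parameters(), lr=1e-4)

x = torch.linspace(-1, 1, 128).unsqueeze(-1).requires_grad_(True)
target = torch.cos(3 * x.detach())  # placeholder supervision on the derivative

for _ in range(100):
    y = mlp(x)                                                     # full forward pass
    dy_dx = torch.autograd.grad(y.sum(), x, create_graph=True)[0]  # grad of output w.r.t. input
    loss = ((dy_dx - target) ** 2).mean()                          # loss uses the derivative
    opt.zero_grad()
    loss.backward()                                                # second backward for the weights
    opt.step()
```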

RaymondJiangkw commented 2 years ago

Thank you for your quick response! This answers my question.