Closed — f-dangel closed this issue 6 months ago
The for-loop over data points in the current implementation can be a bottleneck for small networks. It would be better to parallelize the `grad_output` computation into a single call to `torch.autograd.grad`.
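A minimal sketch of the idea (variable names are hypothetical, not from the implementation): because the total loss is a sum of per-sample losses and each sample's loss depends only on its own output row, differentiating the summed loss with respect to the network output yields all per-sample `grad_output` rows in one `torch.autograd.grad` call, replacing the per-sample loop.

```python
import torch

torch.manual_seed(0)
N, D, C = 4, 3, 2  # hypothetical batch size, input dim, classes
net = torch.nn.Linear(D, C)
X = torch.randn(N, D)
y = torch.randint(0, C, (N,))
loss_fn = torch.nn.CrossEntropyLoss(reduction="sum")

# Loop version: one autograd.grad call per data point (the bottleneck).
output = net(X)
grads_loop = [
    torch.autograd.grad(
        loss_fn(output[n].unsqueeze(0), y[n].unsqueeze(0)),
        output,
        retain_graph=True,
    )[0][n]
    for n in range(N)
]

# Parallelized version: a single autograd.grad call. Since the summed
# loss couples each output row only to its own sample, this gradient
# contains every per-sample grad_output row at once.
output = net(X)
loss = loss_fn(output, y)
(grad_output,) = torch.autograd.grad(loss, output)

# Both approaches produce identical per-sample gradients.
assert torch.allclose(torch.stack(grads_loop), grad_output)
```

The single-call version avoids N separate backward passes through the graph, which is where the loop's overhead dominates for small networks.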