TL;DR: We might have to use t.detach().clone() instead of t.clone() to remove the computation-graph dependence between the new grad and the previous one; we might also want to detach the tensor we sum. This makes the accumulation non-differentiable.
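A minimal sketch of the difference (the variable names here are illustrative, not from the original discussion): clone() keeps the autograd history, detach().clone() copies only the values.

```python
import torch

x = torch.randn(3, requires_grad=True)
y = (x ** 2).sum()

# t.clone() keeps autograd history: the copy is still connected to x's graph.
connected_copy = y.clone()
print(connected_copy.requires_grad)   # True

# t.detach().clone() copies the values with no graph dependence,
# which is what we want for a plain (non-differentiable) accumulator.
plain_copy = y.detach().clone()
print(plain_copy.requires_grad)       # False
```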
I think we are supposed to detach. In torch, the way to differentiate through a differentiation seems to be torch.autograd.grad with create_graph set to True. The functions autograd.backward and tensor.backward set .grad fields that do not require_grad, so they are not in any graph.
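A small sketch of both behaviours, assuming a toy scalar function (not code from the original post): create_graph=True keeps the first derivative in the graph so it can itself be differentiated, while backward() fills .grad with a plain tensor outside any graph.

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
y = x ** 3

# First derivative, kept in the graph so it can be differentiated again.
(dy_dx,) = torch.autograd.grad(y, x, create_graph=True)
print(dy_dx)                  # 3 * x**2 = 12, with a grad_fn

# Second derivative, obtained by differentiating the first one.
(d2y_dx2,) = torch.autograd.grad(dy_dx, x)
print(d2y_dx2)                # 6 * x = 12

# By contrast, backward() just fills x.grad with a tensor that
# does not require_grad, i.e. it is not part of any graph.
x.grad = None
(x ** 3).backward()
print(x.grad.requires_grad)   # False
```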
might be relevant: https://stackoverflow.com/questions/55266154/pytorch-preferred-way-to-copy-a-tensor