TorchJD / torchjd

Library for Jacobian descent with PyTorch. It enables optimization of neural networks with multiple losses (e.g. multi-task learning).
https://torchjd.org
MIT License

Turn `Store` into `Accumulate` #117

Closed PierreQuinton closed 2 months ago

PierreQuinton commented 2 months ago

might be relevant: https://stackoverflow.com/questions/55266154/pytorch-preferred-way-to-copy-a-tensor

TL;DR: We might have to use `t.detach().clone()` instead of `t.clone()` to remove the computation-graph dependence between the new grad and the previous one; we might also want to detach the tensor we sum. This makes the accumulation non-differentiable.
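
A minimal sketch of the difference (illustrative only, not TorchJD's implementation; the tensor names are made up):

```python
import torch

x = torch.ones(3, requires_grad=True)
t = x * 2  # t is part of x's computation graph

attached = t.clone()           # still connected to t's graph
detached = t.detach().clone()  # a copy with no graph history

print(attached.requires_grad)  # True
print(detached.requires_grad)  # False

# Accumulating with the attached copy keeps a reference to the old graph;
# accumulating with the detached copy does not, so the accumulation cannot
# be differentiated through afterwards.
acc = torch.zeros(3)
acc = acc + detached
```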

I think we are supposed to detach. In torch, the way to differentiate through a differentiation seems to be to use `torch.autograd.grad` with `create_graph` set to `True`. The functions `autograd.backward` and `tensor.backward` set `.grad` fields that do not `require_grad`; they are therefore not part of any graph.
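
A small hypothetical example contrasting the two APIs (not TorchJD code):

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
y = x ** 3

# torch.autograd.grad with create_graph=True: the returned gradient is itself
# part of a graph, so it can be differentiated again (e.g. second derivatives).
(g,) = torch.autograd.grad(y, x, create_graph=True)
print(g, g.requires_grad)        # tensor(12., grad_fn=...) True
(g2,) = torch.autograd.grad(g, x)
print(g2)                        # tensor(12.)  -> d/dx (3x^2) = 6x = 12

# tensor.backward: populates x.grad with a tensor that does not require grad,
# so it is not in any graph and cannot be differentiated further.
x.grad = None
y2 = x ** 3
y2.backward()
print(x.grad, x.grad.requires_grad)  # tensor(12.) False
```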