akashsharma02 opened 2 years ago
Hi @akashsharma02, thanks a lot for your interest and kind words!
My first suggestion was going to be to try the implicit or truncated backward modes, but it looks like you have already tried this. In principle, this should help because most of the graph gets detached, so the computation graph only retains the last few optimizer iterations. Since you have already tried this, there must be some other issue.
If I understand your explanation correctly, your network parameters are optimization variables in the factor graph, is that correct? If so, that may have a significant memory cost when computing the Jacobian matrix, although two 256-D vectors don't sound too large. Even so, have you checked whether you get a similar error with the CholmodSparseSolver?
Also, would it be possible for you to submit a PR with a small working example that reproduces this error? That would help us understand your use case better and offer better support.
Hi @akashsharma02 following up on this issue. We can close if this is resolved?
Hi @akashsharma02. I was curious if you are still working on this, and whether you have tried our newer versions that have support for vmap.
@luisenp Apologies, I haven't been actively working on this for a while now. But I will try your suggestion with a new version of the library, and update the results here.
Thanks for the great library! This looks like a great effort to unify factor graph solvers with autograd and end-to-end parameter learning.
I am using this library, perhaps in a non-traditional way, where I do not run an outer optimization loop to learn parameters. I have a traditional factor graph set up with a "neural" factor, whose cost function contains a neural network that produces a loss value. In theory, I should be able to compute the Jacobians for such a factor using AutoDiffCostFunction, since the network is differentiable. I have also independently verified that the Jacobian computation works correctly.
However, when I try to use such a factor in a simple optimization (theseus_layer.forward()), I consistently get an out-of-memory error, even on an RTX 3090.
Hypothesis: Since the intended setting is to run the Theseus layer as the inner optimization loop, I believe the optimize call retains the computation graph through all iterations of the optimizer for the backward pass. This could quickly blow up memory. Is there a way to turn this off?
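The hypothesis can be illustrated with plain PyTorch, independent of Theseus: a toy inner loop that never detaches chains every step into one autograd graph, while detaching between steps keeps the graph bounded. This is a simplified analogue of graph truncation, not Theseus's actual implementation:

```python
import torch

def inner_loop(x0, steps, detach_between_steps):
    # Toy inner optimizer: gradient descent on f(x) = ||x||^2.
    # With detach_between_steps=False, every step extends one global
    # autograd graph, so memory grows with the number of iterations;
    # detaching truncates the graph after each step.
    x = x0
    for _ in range(steps):
        (grad,) = torch.autograd.grad(x.pow(2).sum(), x, create_graph=True)
        x = x - 0.1 * grad  # each step scales x by 0.8
        if detach_between_steps:
            x = x.detach().requires_grad_()
    return x

x_full = inner_loop(torch.ones(3, requires_grad=True), 5, False)
x_trunc = inner_loop(torch.ones(3, requires_grad=True), 5, True)
# Same values, but x_full still carries the whole 5-step graph,
# while x_trunc is a fresh leaf tensor with no history.
```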
Some details:
I have a PyTorch neural network that takes two 256-D vectors as input and produces an image. I'm trying to optimize/smooth these vectors using the GN or LM optimizers, alongside other SE3 poses (constrained with RelativePoseError factors).
I have tried the different backward modes, in particular truncated with as few as 1 iteration.
Any help is appreciated!