facebookresearch / theseus

A library for differentiable nonlinear optimization

Jacobian computation for AutoDiffCostFunction #125

Open · akashsharma02 opened this issue 2 years ago

akashsharma02 commented 2 years ago

Thanks for the great library! This looks like a great effort to unify factor graph solvers with autograd and end-to-end parameter learning.

I am using this library in a perhaps non-traditional setting: I do not run an outer optimization loop to learn parameters. I have a standard factor graph with a "neural" factor, i.e., a cost function that contains a neural network and produces a loss value. In theory, I should be able to compute the Jacobians for such a factor with AutoDiffCostFunction, since the network is differentiable, and I have independently verified that the Jacobian computation works correctly.
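Roughly, the factor looks like this (a simplified sketch; the network, variable names, and 256-D dimensions here are illustrative, not my actual model):

```python
import torch.nn as nn
import theseus as th

# Hypothetical stand-in for the network inside the cost function.
net = nn.Sequential(nn.Linear(256, 256), nn.ReLU(), nn.Linear(256, 1))

x = th.Vector(256, name="x")  # optimization variable

def neural_error_fn(optim_vars, aux_vars):
    # optim_vars is a tuple of Theseus variables; the returned error must
    # have shape (batch_size, dim). Autograd differentiates through `net`.
    (x_var,) = optim_vars
    return net(x_var.tensor)

cost_fn = th.AutoDiffCostFunction(
    optim_vars=[x], err_fn=neural_error_fn, dim=1, name="neural_factor"
)
```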

However, when I try to use such a factor in a simple optimization (theseus_layer.forward()), I consistently get an out-of-memory error, even on an RTX 3090.

Hypothesis: since the intended setting is to run the Theseus layer as the inner loop of a learning problem, I believe the optimizer call retains the computation graph across all iterations of the optimizer for the backward pass, which could quickly blow up memory. Is there a way to turn this off?
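For reference, the optimization itself is just a plain forward call, roughly like this (simplified; reusing the hypothetical names from the sketch above):

```python
import torch

objective = th.Objective()
objective.add(cost_fn)
optimizer = th.GaussNewton(objective, max_iterations=50)
theseus_layer = th.TheseusLayer(optimizer)

# By default every inner iteration stays in the autograd graph, so memory
# grows with max_iterations; since I never backprop through the solve,
# something like torch.no_grad() around this call would be fine for me.
inputs = {"x": torch.zeros(1, 256)}
solution, info = theseus_layer.forward(inputs)
```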

Some details:

Any help is appreciated!

luisenp commented 2 years ago

Hi @akashsharma02, thanks a lot for your interest and kind words!

My first suggestion was going to be to try the implicit or truncated backward modes, but it sounds like you have already tried them. In principle, these should help because most of the graph is detached, so only the last few iterations are retained. Since you have already tried this, there must be some other issue.
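For reference, the backward mode is selected per call through optimizer_kwargs, roughly like this (reusing the names from your sketch above; exact kwargs may differ slightly across versions):

```python
# Implicit mode: differentiate only through the last step via the
# implicit function theorem, detaching the rest of the graph.
solution, info = theseus_layer.forward(
    inputs, optimizer_kwargs={"backward_mode": th.BackwardMode.IMPLICIT}
)

# Truncated mode: keep only the last few iterations in the graph.
solution, info = theseus_layer.forward(
    inputs,
    optimizer_kwargs={
        "backward_mode": th.BackwardMode.TRUNCATED,
        "backward_num_iterations": 5,
    },
)
```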

If I understand your explanation correctly, your network parameters are optimization variables in the factor graph, is that right? If so, computing the Jacobians could incur a significant memory cost, although two 256-D vectors don't sound too large. Even so, have you checked whether you get similar errors with CholmodSparseSolver?
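For reference, switching the linear solver is a constructor argument, something like:

```python
# CHOLMOD-based sparse solver (CPU, via scikit-sparse), which avoids
# forming the dense system and can reduce GPU memory pressure.
optimizer = th.GaussNewton(
    objective,
    max_iterations=50,
    linear_solver_cls=th.CholmodSparseSolver,
)
theseus_layer = th.TheseusLayer(optimizer)
```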

Also, would it be possible for you to submit a PR with a small working example that reproduces this error? That would help us understand your use case better and offer better support.

mhmukadam commented 2 years ago

Hi @akashsharma02, following up on this issue. Can we close it if this has been resolved?

luisenp commented 1 year ago

Hi @akashsharma02. I was curious whether you are still working on this, and whether you have tried our newer versions, which support vmap.
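For reference, a cost function can opt into vmap-based Jacobians with a constructor flag, along these lines (again reusing the hypothetical names from the earlier sketch):

```python
# autograd_mode="vmap" (available in newer releases) vectorizes the
# per-sample autograd calls instead of building one large graph.
cost_fn = th.AutoDiffCostFunction(
    optim_vars=[x],
    err_fn=neural_error_fn,
    dim=1,
    name="neural_factor",
    autograd_mode="vmap",
)
```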

akashsharma02 commented 1 year ago

@luisenp Apologies, I haven't been actively working on this for a while now, but I will try your suggestion with a newer version of the library and report the results here.