cooper-org / cooper

A general-purpose, deep learning-first library for constrained optimization in PyTorch
https://cooper.readthedocs.io/
MIT License

Do `maximize=True` for dual_optimizers #49

Open juan43ramirez opened 2 years ago

juan43ramirez commented 2 years ago

Enhancement

PyTorch optimizers include a `maximize` flag (see the linked PyTorch issue)*. When set to `True`, the sign of the gradients is flipped inside `optimizer.step()` before the parameter update is computed. This enables gradient ascent steps natively.

NOTE: This sign flip does not affect the parameter's .grad attribute.
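For illustration, a minimal sketch of the flag's behavior (assuming PyTorch >= 1.12, where `torch.optim.SGD` accepts `maximize`):

```python
import torch

p = torch.nn.Parameter(torch.tensor(1.0))
optimizer = torch.optim.SGD([p], lr=0.1, maximize=True)

loss = p ** 2      # d(loss)/dp = 2.0 at p = 1.0
loss.backward()

print(p.grad)      # tensor(2.) -- the stored gradient is NOT flipped
optimizer.step()
print(p.data)      # tensor(1.2000) -- ascent: p moved in the +gradient direction
```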

Cooper currently populates the gradients of the dual variables with their negative values, so that the descent steps performed by the dual optimizer are in fact ascent steps on the problem formulation with respect to the dual variables.

https://github.com/cooper-org/cooper/blob/09df7597e9e42deed4c7a162a2bb345868c8bf23/cooper/constrained_optimizer.py#L401-L406

Instead of flipping the sign manually, we should force `maximize=True` when instantiating the dual optimizer.
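A rough sketch of what the proposed setup could look like on the dual side; `multiplier` and `defect` are illustrative placeholders, not Cooper's internal attributes:

```python
import torch

# Illustrative placeholder for a Lagrange multiplier (dual variable).
multiplier = torch.nn.Parameter(torch.tensor(0.0))

# Proposed: let the optimizer handle the ascent direction natively ...
dual_optimizer = torch.optim.SGD([multiplier], lr=1e-2, maximize=True)

defect = torch.tensor(0.5)        # constraint violation g(x) > 0
dual_term = multiplier * defect   # term the multiplier should maximize
dual_term.backward()
dual_optimizer.step()             # ascent step: the multiplier increases

# ... instead of the current manual flip followed by a descent step:
# multiplier.grad.mul_(-1.0)
# dual_optimizer.step()
```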

* This has been implemented on PyTorch's master branch for every optimizer except LBFGS. As of v1.12, Adam, SGD, and Adagrad support the flag, but RMSprop does not. An assert could be included to ensure that the requested dual optimizer supports the flag.

⚠ This change would break compatibility with versions of PyTorch prior to 1.12.
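A possible sketch of the assert suggested above, using `inspect` to check whether the requested optimizer class accepts the flag (illustrative, not Cooper's actual check):

```python
import inspect
import torch

def supports_maximize(optimizer_class: type) -> bool:
    """Return True if the optimizer's constructor accepts a `maximize` argument."""
    return "maximize" in inspect.signature(optimizer_class.__init__).parameters

# Guard the dual optimizer construction (names are illustrative).
optimizer_class = torch.optim.SGD
assert supports_maximize(optimizer_class), (
    f"{optimizer_class.__name__} does not expose a `maximize` flag; "
    "PyTorch >= 1.12 (or a different dual optimizer) is required."
)
```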

Motivation

Manually flipping the gradients immediately after computing them (to ensure this happens before calls to `dual_optimizer.step()`) is error-prone. Moreover, having to keep track of the fact that the stored gradients have a flipped sign is inconvenient.

By implementing this change we would adopt the official PyTorch mechanism for performing ascent steps.

Alternatives

The current implementation is functional.


juan43ramirez commented 2 years ago

This would require changing the Extra (extragradient) optimizers to also support the `maximize` flag.
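For reference, a minimal sketch of how a custom optimizer can expose the flag following `torch.optim` conventions (illustrative only; this is not Cooper's actual Extra optimizer code):

```python
import torch

class ToySGD(torch.optim.Optimizer):
    """Toy SGD-like optimizer exposing a `maximize` flag, mirroring torch.optim.
    An extragradient variant would follow the same pattern in its step methods."""

    def __init__(self, params, lr=1e-2, maximize=False):
        super().__init__(params, dict(lr=lr, maximize=maximize))

    @torch.no_grad()
    def step(self):
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is None:
                    continue
                # Flip the sign inside step(); p.grad itself is left untouched.
                grad = -p.grad if group["maximize"] else p.grad
                p.add_(grad, alpha=-group["lr"])
```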