HJReachability / ilqgames

Iterative Linear-Quadratic Games!
https://hjreachability.github.io/ilqgames/
BSD 3-Clause "New" or "Revised" License

Adaptive regularization in feedback LQ #65

Closed dfridovi closed 2 years ago

dfridovi commented 2 years ago

In the feedback LQ Nash solver, this adds an amount to each diagonal element of the big S matrix in the coupled Riccati equation that is just enough to ensure that all eigenvalues are nonnegative. This amount is computed according to the Gershgorin circle theorem: https://en.wikipedia.org/wiki/Gershgorin_circle_theorem
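For illustration, a minimal sketch of this kind of Gershgorin-based shift (not the code in this PR; the function name, the use of Eigen here, and the choice of a single uniform diagonal shift are assumptions):

```cpp
#include <algorithm>
#include <cmath>
#include <limits>

#include <Eigen/Dense>

// Sketch only: shift the diagonal of a symmetric matrix S just enough that the
// Gershgorin circle theorem guarantees all eigenvalues are nonnegative.
Eigen::MatrixXd GershgorinRegularize(const Eigen::MatrixXd& S) {
  // Every eigenvalue lies in some disc centered at S(i, i) with radius
  // R_i = sum_{j != i} |S(i, j)|, so min_i (S(i, i) - R_i) lower-bounds the spectrum.
  double min_lower_bound = std::numeric_limits<double>::infinity();
  for (int i = 0; i < S.rows(); ++i) {
    const double radius = S.row(i).lpNorm<1>() - std::abs(S(i, i));
    min_lower_bound = std::min(min_lower_bound, S(i, i) - radius);
  }

  // Add just enough to the diagonal to push every disc into [0, inf).
  const double shift = std::max(0.0, -min_lower_bound);
  return S + shift * Eigen::MatrixXd::Identity(S.rows(), S.cols());
}
```

The shift is computed directly from the entries of S, so no eigendecomposition is needed.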

@lassepe this is what I had in mind. Does this match what you were thinking?

lassepe commented 2 years ago

Nice! This seems like a very clean way to handle regularization. The only comment I have is that you may want to add some additional small value so as to ensure invertibility in the rare case where the bound holds with equality.

jfisac commented 2 years ago

Is this regularization always being applied, though? I don’t believe this is a tight bound, so my worry would be that the resulting player policies could be significantly different from the unregularized solution (even when S already has strictly positive eigenvalues). Is there a reason that this wouldn’t be a concern?

On Nov 8, 2021, at 15:56, David Fridovich-Keil wrote: Merged #65 into master.


dfridovi commented 2 years ago

You're right - this is not a tight bound, and it is possible that this would "over-regularize" the problem. That said, the nice thing about it is that it's really fast to compute, since it doesn't require actually finding eigenvalues.

As to why it shouldn't affect convergence, think about Levenberg-Marquardt methods. Essentially what happens there is that you add a scaled identity to S, where the scale changes over iterations but is always positive (iirc). The idea is that if the regularization is unnecessary (all eigenvalues were already positive), then over-regularization yields something closer to a steepest-descent step, which will still converge, albeit more slowly. At least that's my intuition.
As to why it shouldn't affect convergence, think about Levenberg-Marquardt methods. Essentially what happens there is that you add a scaled identity to S where the scaling is changing over time but is always positive (iirc). The idea is that, if the regularization is unnecessary (all e-vals were already positive), then over regularization yields something closer to a steepest descent method (which will still converge, albeit slower). At least that's my intuition.