MIT-REALM / neural_clbf

Toolkit for learning controllers based on robust control Lyapunov barrier functions
BSD 3-Clause "New" or "Revised" License

Using simulation to replace analytic dynamics #17

Closed dtch1997 closed 1 year ago

dtch1997 commented 1 year ago

Hi, thanks for releasing this great work!

I am a researcher in robotics, and I am interested in learning contraction metrics in settings where the dynamics are unknown or difficult to model analytically (and are instead obtained through, e.g., simulation). However, I noticed that the method for learning contraction metrics depends on the Jacobian matrices A = J_x(f) and B = J_u(f) of the dynamics f.

To get around this restriction, one option is to approximate A and B numerically, e.g. by perturbing each dimension of x and u by a small amount, running the perturbed inputs through the simulator, and using finite differences to construct the Jacobians.
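A minimal sketch of this finite-difference scheme, assuming the simulator can be wrapped as a function `f(x, u)` that returns the state derivative (or next state). The names `finite_difference_jacobians` and `pendulum` and the step size `eps` are illustrative choices and not part of neural_clbf:

```python
import numpy as np


def finite_difference_jacobians(f, x, u, eps=1e-5):
    """Estimate A = df/dx and B = df/du at (x, u) by central differences.

    `f` is a hypothetical black-box wrapper around the simulator.
    """
    x = np.asarray(x, dtype=float)
    u = np.asarray(u, dtype=float)
    n, m = x.size, u.size
    n_out = np.asarray(f(x, u)).size
    A = np.zeros((n_out, n))
    B = np.zeros((n_out, m))
    # Perturb one state dimension at a time.
    for i in range(n):
        dx = np.zeros(n)
        dx[i] = eps
        A[:, i] = (np.asarray(f(x + dx, u)) - np.asarray(f(x - dx, u))) / (2 * eps)
    # Perturb one control dimension at a time.
    for j in range(m):
        du = np.zeros(m)
        du[j] = eps
        B[:, j] = (np.asarray(f(x, u + du)) - np.asarray(f(x, u - du))) / (2 * eps)
    return A, B


# Toy stand-in for a simulator: damped pendulum with x = [theta, theta_dot].
def pendulum(x, u):
    theta, theta_dot = x
    return np.array([theta_dot, -np.sin(theta) - 0.1 * theta_dot + u[0]])


A, B = finite_difference_jacobians(pendulum, x=[0.3, 0.0], u=[0.0])
```

Each evaluation point costs 2(n + m) simulator calls, and the estimate is sensitive to `eps` when the simulator is noisy or non-smooth (e.g. with contacts), so the step size would likely need tuning per system.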

Do you foresee that this will break the theoretical guarantees established for the contraction metric learning or cause any other issues for the learning process? Thanks very much!

dawsonc commented 1 year ago

Hi! Sounds like a cool project! I think we might be limited in terms of how much I can modify this codebase to support your use-case, but I'm happy to discuss how you might go about implementing this.

Thinking out loud on this:

To evaluate the contraction metric condition (and thus train the contraction metric + policy networks using our framework), we need the Jacobian of the dynamics w.r.t. both x (i.e. A) and u (i.e. B). The current code is implemented with the assumption that all of your dynamics are control-affine, but that assumption is only necessary to solve the quadratic programs that arise in CLF/CBF/CLBF settings, and is not needed afaik for contraction metrics. If you have a simulator for your dynamics (i.e. f(x, u)), then you don't need analytical dynamics to get the Jacobians; all you need is to be able to auto-diff your dynamics function (using PyTorch, JAX, etc.), or you could use finite differences (as you suggest). Another option is to ignore the negative-semidefinite contraction metric condition and instead just define a loss based on how much you observe the metric $(x^* - x)^T M (x^* - x)$ decrease over time along trajectories of your simulated system.
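For reference, one common way to write the contraction condition discussed above is sketched below. This is not necessarily the exact form used in this codebase. It assumes a feedback policy $u = \pi(x, x^*, u^*)$ with $K = \partial \pi / \partial x$, a contraction rate $\lambda > 0$, and a metric $M(x) \succ 0$; the symbols $\pi$, $K$, $\lambda$, and $\delta x$ are notation introduced here for illustration:

$$\dot{M} + (A + BK)^T M + M (A + BK) + 2 \lambda M \preceq 0$$

Because infinitesimal displacements evolve as $\delta \dot{x} = (A + BK)\, \delta x$, this matrix inequality implies

$$\frac{d}{dt}\left[\delta x^T M\, \delta x\right] \le -2 \lambda\, \delta x^T M\, \delta x,$$

so the dynamics enter the condition only through A and B, however those Jacobians are obtained.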

tl;dr

  1. If you can implement your simulator using an automatically-differentiable framework (PyTorch, JAX, etc.), then just do that and define the contraction condition in the standard way for A and B
  2. If you can't do that and still want to take the self-supervised "certificate learning" approach, then you can use finite differences as you suggest.
  3. If you are fine taking a less-mathematically-justified but possibly-empirically-successful strategy, you could approximate the "contraction loss" by simulating a bundle of trajectories that all start out close to each other and observing how the metric between the trajectories changes over time. Each pair of trajectories gives you a metric value at each timestep, so you can numerically approximate $\frac{d}{dt}\left[(x_1 - x_2)^T M (x_1 - x_2)\right]$ and make a loss to encourage $\frac{d}{dt}\left[(x_1 - x_2)^T M (x_1 - x_2)\right] \le -2 \lambda (x_1 - x_2)^T M (x_1 - x_2)$, averaged across all trajectories in the bundle. You could then minimize that using policy gradient (RL) or gradient descent (if you have an auto-diff-able simulator, but in that case it would be interesting to compare how effective this would be vs. the approach in 2.)
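A rough sketch of the trajectory-bundle loss from option 3, under simplifying assumptions: the metric `M` is a constant matrix (a learned metric would typically depend on x), the time derivative is a forward difference, and violations are penalized with a hinge. The function name `bundle_contraction_loss` and the toy dynamics are hypothetical:

```python
import numpy as np


def bundle_contraction_loss(trajs, M, dt, lam):
    """Mean hinge penalty on violations of dV/dt <= -2 * lam * V.

    trajs: array of shape (N, T, n), N trajectories sampled every dt.
    M: (n, n) symmetric positive-definite matrix (assumed constant here).
    V is the metric distance between a pair of trajectories.
    """
    num_trajs = trajs.shape[0]
    i, j = np.triu_indices(num_trajs, k=1)      # all unordered pairs
    d = trajs[i] - trajs[j]                     # (pairs, T, n)
    V = np.einsum("pti,ij,ptj->pt", d, M, d)    # (pairs, T)
    V_dot = (V[:, 1:] - V[:, :-1]) / dt         # forward difference in time
    violation = V_dot + 2.0 * lam * V[:, :-1]   # > 0 where the condition fails
    return np.maximum(violation, 0.0).mean()


# Toy bundle: Euler rollouts of x_dot = -x from nearby initial states.
rng = np.random.default_rng(0)
dt, T = 0.01, 50
trajs = np.empty((8, T, 2))
trajs[:, 0] = 1.0 + 0.05 * rng.standard_normal((8, 2))
for k in range(T - 1):
    trajs[:, k + 1] = trajs[:, k] + dt * (-trajs[:, k])

M = np.eye(2)
loss_feasible = bundle_contraction_loss(trajs, M, dt, lam=0.5)
loss_too_fast = bundle_contraction_loss(trajs, M, dt, lam=2.0)
```

In this toy system the pairwise distance decays at a rate just under 2, so a requested rate of `lam=0.5` incurs no penalty while `lam=2.0` does. NumPy does not provide gradients, so for actual training the same computation would need to be written in an auto-diff framework or optimized with a policy-gradient method, as described above.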

Hope that helps! I'm closing this issue, since this seems like more of a new research project and not something I want to add as a new feature to this existing project, but feel free to continue the discussion in this thread. Maybe there are also some published references out there on "Black-box contraction metric learning" or something similar, since that's what this sounds like.

dtch1997 commented 1 year ago

Thanks so much for the detailed response! I think I will go with option 2 for now, but all the options you suggested sound promising and would be interesting to implement and compare. Keen to revisit those at some point.

dawsonc commented 1 year ago

Sounds good; good luck!