google-deepmind / mujoco_mpc

Real-time behaviour synthesis with MuJoCo, using Predictive Control
https://github.com/deepmind/mujoco_mpc
Apache License 2.0

CEM #242

Closed alberthli closed 6 months ago

alberthli commented 6 months ago

Implements the cross-entropy method (CEM).

Notes:

TODOs:

thowell commented 6 months ago

I tried the particle task (the easiest one) and the simulation immediately goes unstable. Let's get this task working reliably and then finish reviewing the code.

alberthli commented 6 months ago

> I tried the particle task (the easiest one) and the simulation immediately goes unstable. Let's get this task working reliably and then finish reviewing the code.

Thanks for checking the other tasks. The bug came from initializing n_elites as num_trajectory / 10. When num_trajectory < 20, n_elites was initialized to 1, so the sample variance computed from the elites later in the code was 0. After changing the initial value, the unstable-simulation errors mostly go away, except in very dynamic scenarios (I checked every task).
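
To illustrate the failure mode described above, here is a minimal Python sketch, not the planner's actual C++ code. The names `default_n_elites` and `elite_mean_and_variance` are hypothetical, the clamp to at least 1 is an assumption about how the integer division was handled, and the variance here divides by the number of elites. With a single elite, the refit variance is exactly zero, so the next round of sampling has no spread left.

```python
import random


def default_n_elites(num_trajectory):
    # Hypothetical reconstruction of the initialization described in the
    # comment: integer division by 10, clamped to at least 1 (assumption).
    return max(1, num_trajectory // 10)


def elite_mean_and_variance(samples, costs, n_elites):
    """Refit a diagonal Gaussian to the n_elites lowest-cost samples."""
    order = sorted(range(len(samples)), key=lambda i: costs[i])
    elites = [samples[i] for i in order[:n_elites]]
    dim = len(elites[0])
    mean = [sum(e[d] for e in elites) / n_elites for d in range(dim)]
    # Variance with denominator n_elites (assumed estimator for this sketch).
    var = [
        sum((e[d] - mean[d]) ** 2 for e in elites) / n_elites
        for d in range(dim)
    ]
    return mean, var


rng = random.Random(0)
num_trajectory = 12  # fewer than 20 rollouts
samples = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(num_trajectory)]
costs = [s[0] ** 2 + s[1] ** 2 for s in samples]  # toy quadratic cost

# One elite: the variance collapses to zero in every dimension.
_, var_single = elite_mean_and_variance(samples, costs, default_n_elites(num_trajectory))
print("n_elites = 1 ->", var_single)

# Several elites: the variance stays positive, so sampling keeps exploring.
_, var_multi = elite_mean_and_variance(samples, costs, 4)
print("n_elites = 4 ->", var_multi)
```

A common guard in CEM implementations is to require at least two elites or to floor the variance at a small positive value. Which fix the PR adopted is not stated beyond "changing the initial value".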

thowell commented 6 months ago

Made some minor changes here https://github.com/alberthli/mujoco_mpc/pull/2. The last things to (potentially) do are:

  1. visualize the elite mean trace
  2. make GUI variables safe
  3. update the docs to mention and reference this planner

thowell commented 6 months ago

Let's merge the PR I submitted to your branch. Otherwise, LGTM.