
alberthli commented 6 months ago

Implements the cross entropy method.

Notes:

The variances added are only diagonal.
There is a floor on the allowable standard deviation. In practice, this floor is what dictates the variation in the samples: repeatedly resampling from a distribution and refitting it to the sample statistics causes mode collapse, because the tails get cut off in the finite-sample regime.
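To make the two notes above concrete, here is a minimal sketch of a cross-entropy iteration with a diagonal Gaussian and a standard-deviation floor. It is not the code from this PR: the function name `cem_minimize`, the parameter `std_min`, the toy cost `quadratic`, and all default values are assumptions chosen for illustration.

```python
import random
import statistics


def cem_minimize(cost, mean, std, n_samples=64, n_elites=8,
                 std_min=0.05, iters=30, seed=0):
    """Minimize cost(x) with a diagonal-Gaussian cross-entropy method (illustrative only)."""
    rng = random.Random(seed)
    mean, std = list(mean), list(std)
    for _ in range(iters):
        # Sample each dimension independently: the covariance is diagonal.
        samples = [[rng.gauss(m, s) for m, s in zip(mean, std)]
                   for _ in range(n_samples)]
        # Keep the lowest-cost candidates as elites.
        elites = sorted(samples, key=cost)[:n_elites]
        # Refit the per-dimension mean and standard deviation to the elites.
        for d in range(len(mean)):
            column = [e[d] for e in elites]
            mean[d] = statistics.fmean(column)
            # Clamp from below so the sampling distribution cannot collapse.
            std[d] = max(statistics.pstdev(column), std_min)
    return mean, std


def quadratic(x):
    # Toy cost (an assumption for this sketch) with its minimum at (1, 1).
    return sum((xi - 1.0) ** 2 for xi in x)


mean, std = cem_minimize(quadratic, mean=[0.0, 0.0], std=[1.0, 1.0])
```

Once the elites cluster tightly, the refitted standard deviation drops below `std_min`, and from then on the floor alone sets how widely new samples are spread.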

TODOs:

[x] On non-Allegro hand tasks, the simulations seem to become unstable easily and sometimes segfault. I could be incorrectly managing variables that can change in the GUI.
[x] It's unclear whether I'm computing the improvement correctly in the GUI panel
[x] It's unclear whether the purple trace being plotted is the average rollout over the top n_elites trajectories
[x] Make standard deviations and number of elites GUI safe
[x] Update docs to reference CEM planner

thowell commented 6 months ago

I tried the particle task (the easiest one) and the simulation immediately goes unstable. Let's get this task working reliably and then finish reviewing the code.

alberthli commented 6 months ago

> I tried the particle task (the easiest one) and the simulation immediately goes unstable. Let's get this task working reliably and then finish reviewing the code.

Thanks for checking other tasks. The bug came from initializing n_elites as num_trajectory / 10. When num_trajectory < 20, n_elites was initialized to 1, so the sample variance computed later in the code was 0. After changing the initial value, the unstable-simulation errors mostly go away, except in very dynamic scenarios (I checked every single task).
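The failure mode described above can be reproduced in a few lines. This is a sketch, not the planner's code: the elite value `0.37` is made up, and `safe_n_elites` with its `divisor` and `minimum` parameters is a hypothetical guard, since the thread does not say what the new initial value is.

```python
import statistics

num_trajectory = 16
# Integer division yields 1 for any num_trajectory in 10..19.
n_elites = num_trajectory // 10

# With a single elite there is no spread to estimate.
elite_values = [0.37]  # made-up value for illustration
spread = statistics.pstdev(elite_values)  # standard deviation of one point


def safe_n_elites(num_trajectory, divisor=10, minimum=2):
    """Hypothetical guard: at least `minimum` elites, capped at the sample count."""
    return min(num_trajectory, max(minimum, num_trajectory // divisor))


print(n_elites, spread, safe_n_elites(num_trajectory))
```

A zero spread means every subsequent sample is identical unless a standard-deviation floor is applied, which is why a single elite interacts badly with refitting.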

thowell commented 6 months ago

I made some minor changes in https://github.com/alberthli/mujoco_mpc/pull/2. The last things to (potentially) do are:

visualize the elite mean trace
make GUI variables safe
update the docs to mention and reference this planner

thowell commented 6 months ago

Let's merge the PR I submitted to your branch. Otherwise, LGTM.

google-deepmind / mujoco_mpc

CEM #242