Steps to reproduce
Run any example configuration that uses iCEM as the action optimizer, e.g.
python -m mbrl.examples.main algorithm=mbpo overrides=pets_icem_cartpole
Observed Results
After iCEM samples the population according to a power-law PSD, the samples are centered on the mean, scaled to the variance, and clamped to the action space. These steps write their output to the dummy variable population2. However, it appears that the result is never assigned back to the population variable, so it is ignored during the rest of the optimization procedure. As a result, I believe that the population is not correctly sampled, and the objective function can be evaluated on actions that lie outside the action space.
Expected Results
Centering, scaling and clamping should be applied directly to population instead of population2.
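To illustrate the difference, here is a minimal sketch in plain Python. It is not the mbrl-lib code: the function names, the mu and scale parameters, and the use of lists and white Gaussian noise in place of torch tensors and power-law noise are all assumptions made for illustration.

```python
import random


def sample_buggy(n, mu, scale, lower, upper, rng):
    """Mimics the reported pattern: the transformed samples are discarded."""
    # Unit-variance draws; white noise is a stand-in for iCEM's colored noise.
    population = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Center, scale and clamp, but store the result under another name...
    population2 = [min(max(x * scale + mu, lower), upper) for x in population]
    # ...so the caller receives the raw, untransformed draws.
    return population


def sample_fixed(n, mu, scale, lower, upper, rng):
    """Same steps, with the result assigned back to `population`."""
    population = [rng.gauss(0.0, 1.0) for _ in range(n)]
    population = [min(max(x * scale + mu, lower), upper) for x in population]
    return population


def fraction_out_of_bounds(population, lower, upper):
    """Share of samples that fall outside the [lower, upper] action range."""
    outside = sum(1 for x in population if x < lower or x > upper)
    return outside / len(population)


if __name__ == "__main__":
    lower, upper = -1.0, 1.0
    buggy = sample_buggy(1000, 0.5, 0.3, lower, upper, random.Random(0))
    fixed = sample_fixed(1000, 0.5, 0.3, lower, upper, random.Random(0))
    print("buggy out-of-bounds fraction:", fraction_out_of_bounds(buggy, lower, upper))
    print("fixed out-of-bounds fraction:", fraction_out_of_bounds(fixed, lower, upper))
```

With the reassignment in place, every sample stays inside the bounds. In the buggy variant the unclamped draws are returned, so some of them can fall outside the action range.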
Relevant Code
The relevant lines are L438-L441 in mbrl/planning/trajectory_opt.py
https://github.com/facebookresearch/mbrl-lib/blob/f90a29743894fd6db05e73445af0ed83baa845bc/mbrl/planning/trajectory_opt.py#L438-L441
which I believe could be changed to