facebookresearch / mbrl-lib

Library for Model Based RL
MIT License
952 stars 154 forks source link

[Bug] Centering, scaling and clamping the population in iCEM #172

Closed marbaga closed 1 year ago

marbaga commented 1 year ago

Steps to reproduce

  1. Run any example configuration using iCEM as action optimizer, e.g. python -m mbrl.examples.main algorithm=mbpo overrides=pets_icem_cartpole

Observed Results

After sampling according to a powerlaw PSD in iCEM, the population is centered on the mean, scaled to the variance and clamped to be within the action space. This process uses the dummy variable population2. However, it appears that the result is not assigned back to the population variable, and it is hence ignored during the rest of the optimization procedure. As a result, I believe that the population is not correctly sampled, and the objective function can be evaluated on actions that potentially do not belong to the action space.

Expected Results

Centering, scaling and clamping should be applied directly to population instead of population2.

Relevant Code

The relevant lines are L438-L441 in mbrl/planning/trajectory_opt.py

https://github.com/facebookresearch/mbrl-lib/blob/f90a29743894fd6db05e73445af0ed83baa845bc/mbrl/planning/trajectory_opt.py#L438-L441

which I believe could be changed to

            population = torch.minimum(
                population * torch.sqrt(var) + mu, self.upper_bound
            )
            population = torch.maximum(population, self.lower_bound)
luisenp commented 1 year ago

Hi @marbaga. Thanks for reporting this. Would you be interested in opening a pull request with the fix?