[Bug] Centering, scaling and clamping the population in iCEM

Steps to reproduce

Run any example configuration using iCEM as action optimizer, e.g. python -m mbrl.examples.main algorithm=mbpo overrides=pets_icem_cartpole

Observed Results

After sampling according to a powerlaw PSD in iCEM, the population is centered on the mean, scaled to the variance and clamped to be within the action space. This process uses the dummy variable population2. However, it appears that the result is not assigned back to the population variable, and it is hence ignored during the rest of the optimization procedure. As a result, I believe that the population is not correctly sampled, and the objective function can be evaluated on actions that potentially do not belong to the action space.

Expected Results

Centering, scaling and clamping should be applied directly to population instead of population2.

Relevant Code

The relevant lines are L438-L441 in mbrl/planning/trajectory_opt.py

https://github.com/facebookresearch/mbrl-lib/blob/f90a29743894fd6db05e73445af0ed83baa845bc/mbrl/planning/trajectory_opt.py#L438-L441

which I believe could be changed to

            population = torch.minimum(
                population * torch.sqrt(var) + mu, self.upper_bound
            )
            population = torch.maximum(population, self.lower_bound)

facebookresearch / mbrl-lib