facebookresearch / mbrl-lib

Library for Model Based RL
MIT License

[Bug] Training with (n, 1) dimensional Box/MultiDiscrete action spaces throwing error #144

Closed JennoMai closed 2 years ago

JennoMai commented 2 years ago

Steps to reproduce

  1. Wrote a gym environment using a MultiDiscrete action space
  2. Copied training code from the PETS example (threw error)
  3. Tried replacing the MultiDiscrete action space with a similarly-shaped Box space (threw the same error)

Observed Results

In the traceback below, I have a length-9 observation space and a length-2 action space. I believe the code might be concatenating the two, but only a length-1 set of actions is being generated.

Traceback (most recent call last):
  File "train_swarm.py", line 170, in <module>
    env, obs, agent, {}, replay_buffer)
  File "/home/jennomai/miniconda3/envs/botnetenv/lib/python3.7/site-packages/mbrl/util/common.py", line 570, in step_env_and_add_to_buffer
    action = agent.act(obs, **agent_kwargs)
  File "/home/jennomai/miniconda3/envs/botnetenv/lib/python3.7/site-packages/mbrl/planning/trajectory_opt.py", line 650, in act
    trajectory_eval_fn, callback=optimizer_callback
  File "/home/jennomai/miniconda3/envs/botnetenv/lib/python3.7/site-packages/mbrl/planning/trajectory_opt.py", line 526, in optimize
    callback=callback,
  File "/home/jennomai/miniconda3/envs/botnetenv/lib/python3.7/site-packages/mbrl/planning/trajectory_opt.py", line 134, in optimize
    values = obj_fun(population)
  File "/home/jennomai/miniconda3/envs/botnetenv/lib/python3.7/site-packages/mbrl/planning/trajectory_opt.py", line 646, in trajectory_eval_fn
    return self.trajectory_eval_fn(obs, action_sequences)
  File "/home/jennomai/miniconda3/envs/botnetenv/lib/python3.7/site-packages/mbrl/planning/trajectory_opt.py", line 710, in trajectory_eval_fn
    action_sequences, initial_state=initial_state, num_particles=num_particles
  File "/home/jennomai/miniconda3/envs/botnetenv/lib/python3.7/site-packages/mbrl/models/model_env.py", line 173, in evaluate_action_sequences
    _, rewards, dones, _ = self.step(action_batch, sample=True)
  File "/home/jennomai/miniconda3/envs/botnetenv/lib/python3.7/site-packages/mbrl/models/model_env.py", line 119, in step
    rng=self._rng,
  File "/home/jennomai/miniconda3/envs/botnetenv/lib/python3.7/site-packages/mbrl/models/one_dim_tr_model.py", line 289, in sample
    model_in = self._get_model_input_from_tensors(obs, actions)
  File "/home/jennomai/miniconda3/envs/botnetenv/lib/python3.7/site-packages/mbrl/models/one_dim_tr_model.py", line 128, in _get_model_input_from_tensors
    model_in = self.input_normalizer.normalize(model_in).float()
  File "/home/jennomai/miniconda3/envs/botnetenv/lib/python3.7/site-packages/mbrl/util/math.py", line 144, in normalize
    return (val - self.mean) / self.std
RuntimeError: The size of tensor a (10) must match the size of tensor b (11) at non-singleton dimension 1
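The mismatch in the last frame can be reproduced with plain NumPy. This is a hypothetical sketch, not mbrl's actual internals: the assumption is that the normalizer's `mean`/`std` were fitted on obs + action = 9 + 2 = 11 features, while the planner builds model inputs with only a single action dimension, i.e. 9 + 1 = 10 features, so the element-wise subtraction cannot broadcast.

```python
import numpy as np

obs_dim, act_dim = 9, 2

# Normalizer statistics fitted on full (obs, action) transitions: 9 + 2 = 11 features
mean = np.zeros(obs_dim + act_dim)
std = np.ones(obs_dim + act_dim)

# The planner, however, samples actions as if the action space were 1-D,
# so the model input ends up with only 9 + 1 = 10 features
obs = np.zeros((32, obs_dim))
action = np.zeros((32, 1))  # should be (32, 2)
model_in = np.concatenate([obs, action], axis=1)

try:
    normalized = (model_in - mean) / std  # broadcasting fails: 10 vs 11
except ValueError as e:
    print(f"broadcast error: {e}")
```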

Expected Results

This runtime error shouldn't be thrown.

Relevant Code

The gym environment I'm using is very messy right now but can be found here, and the corresponding training code is here. However, the code depends heavily on the Botnet simulator, so it may be easier to try to replicate using the MultiAgentEnv here?

luisenp commented 2 years ago

Hi @JennoMai, sorry for the delay; I forgot about this issue.

We haven't in fact experimented with multi-dimensional action spaces, but one thing I can say for sure is that the dynamics model used in PETS won't work out of the box for this. Notice this line: it refers to a 1-D model that is hard-coded to assume both state and action tensors are one dimensional, and it constructs model inputs by concatenating the two. This is the standard setup for the proprioceptive control problems for which PETS and MBPO were originally proposed.
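A minimal sketch of that hard-coded assumption (simplified; `get_model_input` here is a stand-in, not mbrl's actual function): the model input is a flat concatenation along the last axis, so any extra action dimensions have to be flattened away first.

```python
import numpy as np

def get_model_input(obs, action):
    # Stand-in for the 1-D transition model's input construction:
    # flat concatenation, assuming obs and action are each 1-D per sample
    return np.concatenate([obs, action], axis=-1)

# 1-D actions: works as intended, (5, 9) + (5, 2) -> (5, 11)
flat = get_model_input(np.zeros((5, 9)), np.zeros((5, 2)))
print(flat.shape)

# An (n, 1)-shaped action, as in this issue, needs reshaping before concatenation
multi = np.zeros((5, 2, 1))
fixed = get_model_input(np.zeros((5, 9)), multi.reshape(5, -1))
print(fixed.shape)
```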

For your particular application, you can probably keep the main PETS skeleton, but you would need to replace the model architecture with something more appropriate. You can take a look at our PlaNet implementation for an example of a different kind of model that receives multi-dimensional (visual) state data but uses the same planning algorithm as PETS.