araffin / rl-baselines-zoo

A collection of 100+ pre-trained RL agents using Stable Baselines, training and hyperparameter optimization included.
https://stable-baselines.readthedocs.io/
MIT License

HER+SAC on Robotic Environment #77

Closed · peiseng closed this issue 4 years ago

peiseng commented 4 years ago

Hi, I am getting an error while running this command:

python train.py --algo her --env FetchPickAndPlace-v1 --tensorboard-log "C:\Users\pei-seng.tan\Desktop\Deep_RL\rl-baselines-zoo\USM_RL\SAC+HER" --eval-freq 10000 --eval-episodes 10 --save-freq 100000

It happens when the number of timesteps reaches 10000.

[screenshot of the error output]

May I know how to solve it?

araffin commented 4 years ago

Hello, please fill in the issue template completely.

araffin commented 4 years ago

You forgot to give the complete traceback... The error is due to the eval env, which is not properly wrapped. See the warning:

stable_baselines/common/callbacks.py:277: UserWarning: Training and eval env are not of the same type<stable_baselines.her.utils.HERGoalEnvWrapper object at 0x7f68c48c09e8> != <stable_baselines.common.vec_env.dummy_vec_env.DummyVecEnv object at 0x7f68b00f7908>
  "{} != {}".format(self.training_env, self.eval_env))

I would appreciate a PR that solves this issue.
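
For reference, a rough sketch of what "properly wrapped" means here (untested, and it builds the eval env manually instead of letting train.py create it):

import gym
from stable_baselines.her import HERGoalEnvWrapper
from stable_baselines.common.callbacks import EvalCallback

# wrap the eval env the same way HER wraps the training env internally,
# so both sides see the same flattened observation space
eval_env = HERGoalEnvWrapper(gym.make('FetchPickAndPlace-v1'))
eval_callback = EvalCallback(eval_env, eval_freq=10000, n_eval_episodes=10)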

PierreExeter commented 4 years ago

I'm having the same issue with the parking-v0 environment.

python train.py --algo her --env parking-v0 -n 10000

It prints the same warning, followed by this error:

../stable-baselines/stable_baselines/common/callbacks.py:280: UserWarning: Training and eval env are not of the same type<stable_baselines.her.utils.HERGoalEnvWrapper object at 0x7f68828e7be0> != <stable_baselines.common.vec_env.dummy_vec_env.DummyVecEnv object at 0x7f686266ea58>
  "{} != {}".format(self.training_env, self.eval_env))

Traceback (most recent call last):
  File "train.py", line 411, in <module>
    model.learn(n_timesteps, **kwargs)
  File "../stable-baselines/stable_baselines/her/her.py", line 113, in learn
    replay_wrapper=self.replay_wrapper)
  File "../stable-baselines/stable_baselines/sac/sac.py", line 416, in learn
    if callback.on_step() is False:
  File "../stable-baselines/stable_baselines/common/callbacks.py", line 90, in on_step
    return self._on_step()
  File "../stable-baselines/stable_baselines/common/callbacks.py", line 166, in _on_step
    continue_training = callback.on_step() and continue_training
  File "../stable-baselines/stable_baselines/common/callbacks.py", line 90, in on_step
    return self._on_step()
  File "../stable-baselines/stable_baselines/common/callbacks.py", line 298, in _on_step
    return_episode_rewards=True)
  File "../stable-baselines/stable_baselines/common/evaluation.py", line 38, in evaluate_policy
    action, state = model.predict(obs, state=state, deterministic=deterministic)
  File "../stable-baselines/stable_baselines/sac/sac.py", line 527, in predict
    vectorized_env = self._is_vectorized_observation(observation, self.observation_space)
  File "../stable-baselines/stable_baselines/common/base_class.py", line 723, in _is_vectorized_observation
    .format(", ".join(map(str, observation_space.shape))))
ValueError: Error: Unexpected observation shape () for Box environment, please use (18,) or (n_env, 18) for the observation shape.

Wrapping the eval_env with HERGoalEnvWrapper in callbacks.py gets rid of the warning:

from stable_baselines.her import HERGoalEnvWrapper
...
# in stable_baselines/common/callbacks.py, wrap the eval env:
self.eval_env = HERGoalEnvWrapper(self.eval_env)

But then all sorts of issues with the observation dimensions come up, so I'm not sure this is the right way to go about it. For example, after this modification it becomes necessary to flatten the observation (a list of lists) in ddpg.py (or whichever algorithm is used with HER):

# flatten the nested observation (a list of lists) into a 1-D numpy array
observation = np.array([item for sublist in observation for item in sublist])

Alternatively, I need to flatten obs_dict in utils.py:

# if the entries are batched (2-D, shape (n_env, obs_dim)),
# keep only the first environment's observation
if len(obs_dict[KEY_ORDER[0]].shape) == 2:
    for key in KEY_ORDER:
        obs_dict[key] = obs_dict[key][0]
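
For context, the method being modified is HERGoalEnvWrapper.convert_dict_to_obs in stable_baselines/her/utils.py, which (in the version I'm using) essentially boils down to:

def convert_dict_to_obs(self, obs_dict):
    # concatenate the dict entries in KEY_ORDER
    # (observation, achieved_goal, desired_goal) into one flat array
    return np.concatenate([obs_dict[key] for key in KEY_ORDER])

so the check above goes right before the concatenation.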

I haven't solved this, but I'm just commenting on what I tried in case it's of any help.

araffin commented 4 years ago

For example after this modification, it's now necessary to flatten the observation

There is a method for that in the HERGoalEnvWrapper normally. I don't have the time this week to take a deeper look into it. For now, you can deactivate the evaluation by passing --eval-freq -1, unless you solve the issue, in which case I would appreciate a PR ;)
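
Something like HERGoalEnvWrapper.convert_dict_to_obs should do it (untested sketch, where env and obs_dict are placeholders for whatever you have at that point):

from stable_baselines.her import HERGoalEnvWrapper

# convert_dict_to_obs flattens a dict observation into the single
# array that SAC/DDPG expect when used with HER
wrapped_env = HERGoalEnvWrapper(env)
flat_obs = wrapped_env.convert_dict_to_obs(obs_dict)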

PierreExeter commented 4 years ago

OK, I'll give it a try. But I will need to create a PR for Stable Baselines as well, since I'm changing HERGoalEnvWrapper.