DLR-RM / stable-baselines3

PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
https://stable-baselines3.readthedocs.io
MIT License

Incompatibility with gym env despite having stable_baselines3 version 2.x #1951

Closed: KevinHan1209 closed this issue 3 months ago

KevinHan1209 commented 3 months ago

🐛 Bug

I was trying to run the example learn.py from the original gym_pybullet_drones repo and ran into a problem with the HoverAviary env. It still seems to be incompatible with gym envs, even with the various 2.x versions of stable_baselines3 that I tried, which are said to support Gymnasium. I'm not sure what's wrong at this point.

Stable_baselines3 versions checked:

Code example


from stable_baselines3.common.env_checker import check_env
from gym_pybullet_drones.envs.HoverAviary import HoverAviary
from gym_pybullet_drones.utils.enums import ObservationType, ActionType

env = HoverAviary(obs=ObservationType('kin'), act=ActionType('one_d_rpm'))
check_env(env)

Relevant log output / Error message

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[1], line 7
      3 from gym_pybullet_drones.utils.enums import ObservationType, ActionType
      6 env = HoverAviary(obs=ObservationType('kin'), act=ActionType('one_d_rpm'))
----> 7 check_env(env)

File ~/opt/anaconda3/envs/drones/lib/python3.10/site-packages/stable_baselines3/common/env_checker.py:461, in check_env(env, warn, skip_render_check)
    458     return
    460 # ============ Check the returned values ===============
--> 461 _check_returned_values(env, observation_space, action_space)
    463 # ==== Check the render method and the declared render modes ====
    464 if not skip_render_check:

File ~/opt/anaconda3/envs/drones/lib/python3.10/site-packages/stable_baselines3/common/env_checker.py:288, in _check_returned_values(env, observation_space, action_space)
    286             raise AssertionError(f"Error while checking key={key}: " + str(e)) from e
    287 else:
--> 288     _check_obs(obs, observation_space, "reset")
    290 # Sample a random action
    291 action = action_space.sample()

File ~/opt/anaconda3/envs/drones/lib/python3.10/site-packages/stable_baselines3/common/env_checker.py:207, in _check_obs(obs, observation_space, method_name)
    200 if isinstance(obs, np.ndarray):
    201     # check obs dimensions, dtype and bounds
    202     assert observation_space.shape == obs.shape, (
...
    213         lower_bounds, upper_bounds = observation_space.low, observation_space.high

AssertionError: The observation returned by the `reset()` method does not match the data type (cannot cast) of the given observation space Box([[-inf -inf   0. -inf -inf -inf -inf -inf -inf -inf -inf -inf  -1.  -1.
   -1.  -1.  -1.  -1.  -1.  -1.  -1.  -1.  -1.  -1.  -1.  -1.  -1.]], [[inf inf inf inf inf inf inf inf inf inf inf inf  1.  1.  1.  1.  1.  1.
   1.  1.  1.  1.  1.  1.  1.  1.  1.]], (1, 27), float32). Expected: float32, actual dtype: float64
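The "(cannot cast)" wording suggests that the checker runs a safe-cast test between the observation's dtype and the space's dtype, something like NumPy's np.can_cast. That is an assumption about SB3 internals, not a quote from its source. The following minimal NumPy sketch shows why a float64 observation fails against a float32 space while the reverse direction would pass:

```python
import numpy as np

# Declared observation space dtype (float32, per the Box in the error message)
space_dtype = np.dtype(np.float32)

# What reset() actually returned, per the error message: a float64 array of shape (1, 27)
obs = np.zeros((1, 27), dtype=np.float64)

# NumPy's default "safe" casting refuses float64 -> float32 because precision can be lost
print(np.can_cast(obs.dtype, space_dtype))    # False
# The opposite direction is safe, so float32 data in a float64 space would not trip this check
print(np.can_cast(np.float32, np.float64))    # True

# Casting the observation explicitly makes the safe-cast test pass
fixed = obs.astype(np.float32)
print(np.can_cast(fixed.dtype, space_dtype))  # True
```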

System Info

Checklist

araffin commented 3 months ago

Hello, this is not a gym issue. Look at the error message: "The observation returned by the reset() method does not match the data type (cannot cast) of the given observation space ... Expected: float32, actual dtype: float64". The env declares a float32 observation space, but its reset() returns a float64 array.

Also, in this case, even though the env checker raises an error, the env might still work when you train an algorithm on it.
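Here is one plausible reason training can still run, offered as an assumption about SB3 internals. Vectorized env wrappers such as DummyVecEnv appear to keep an observation buffer preallocated with the declared space dtype. NumPy silently downcasts float64 data when it is assigned into a float32 array. The snippet below mimics only that assignment step, and buf_obs and the shapes are illustrative:

```python
import numpy as np

num_envs = 1
obs_shape = (1, 27)  # taken from the Box in the error message

# Buffer preallocated with the *declared* space dtype (float32), similar in spirit
# to the observation buffer a vectorized env wrapper keeps
buf_obs = np.zeros((num_envs, *obs_shape), dtype=np.float32)

# The env hands back float64 data...
obs = np.random.default_rng(0).uniform(-1.0, 1.0, size=obs_shape)

# ...and element assignment casts it to the buffer's dtype without raising
buf_obs[0] = obs

print(obs.dtype, buf_obs.dtype)                 # float64 float32
print(np.allclose(buf_obs[0], obs, atol=1e-6))  # True
```

The downcast only drops precision, so the values stay close. Fixing the dtype in the env is still the cleaner option, because then check_env keeps reporting real problems instead of this one.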