DLR-RM / stable-baselines3

PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
https://stable-baselines3.readthedocs.io
MIT License

env_checker doesn't check if action space is dict #191

Closed: wmmc88 closed this issue 4 years ago

wmmc88 commented 4 years ago

Describe the bug

An environment with a Dict action space, but a non-Dict, non-Tuple observation space, fails the _check_nan(env) check with TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''.

I believe the fix is just to add a check here that the action space is neither a Tuple nor a Dict.

Full Error Output:

tests/test_uwrt_arm_env.py:74 (TestClass.test_gym_api_compliance_for_dqn_wrapper_setup)
self = <test_uwrt_arm_env.TestClass object at 0x7f8a5bd2b5d0>

    def test_gym_api_compliance_for_dqn_wrapper_setup(self):
        env = FlattenObservation(MultiDiscreteToContinuousDictActionWrapper(
            gym.make('uwrt-arm-v0', key_position=self.KEY_POSITION, key_orientation=self.KEY_ORIENTATION,
                     max_steps=self.MAX_STEPS, enable_render=True)))

        # Implicitly closes the environment
>       env_checker.check_env(env=env, warn=True, skip_render_check=False)

test_uwrt_arm_env.py:83: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/home/wmmc88/anaconda3/envs/uwrt_arm_rl/lib/python3.7/site-packages/stable_baselines3/common/env_checker.py:238: in check_env
    _check_nan(env)
/home/wmmc88/anaconda3/envs/uwrt_arm_rl/lib/python3.7/site-packages/stable_baselines3/common/env_checker.py:74: in _check_nan
    _, _, _, _ = vec_env.step(action)
/home/wmmc88/anaconda3/envs/uwrt_arm_rl/lib/python3.7/site-packages/stable_baselines3/common/vec_env/base_vec_env.py:149: in step
    self.step_async(actions)
/home/wmmc88/anaconda3/envs/uwrt_arm_rl/lib/python3.7/site-packages/stable_baselines3/common/vec_env/vec_check_nan.py:29: in step_async
    self._check_val(async_step=True, actions=actions)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <stable_baselines3.common.vec_env.vec_check_nan.VecCheckNan object at 0x7f8a5bce2510>
async_step = True
kwargs = {'actions': array([OrderedDict([('joint_velocity_commands', array([1, 1, 0, 1, 0]))])],
      dtype=object)}
found = [], name = 'actions'
val = array([OrderedDict([('joint_velocity_commands', array([1, 1, 0, 1, 0]))])],
      dtype=object)

    def _check_val(self, *, async_step, **kwargs):
        # if warn and warn once and have warned once: then stop checking
        if not self.raise_exception and self.warn_once and self._user_warned:
            return

        found = []
        for name, val in kwargs.items():
>           has_nan = np.any(np.isnan(val))
E           TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

/home/wmmc88/anaconda3/envs/uwrt_arm_rl/lib/python3.7/site-packages/stable_baselines3/common/vec_env/vec_check_nan.py:58: TypeError
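The failure can be reproduced with NumPy alone. In the traceback, VecCheckNan receives the batched actions as an object-dtype array whose single element is an OrderedDict, and np.isnan has no implementation for object arrays. The sketch below reproduces that TypeError and shows a hypothetical recursive helper, has_nan (not part of Stable Baselines3), that descends into dict and tuple values instead. It is an illustration under those assumptions, not the fix proposed in this issue, which is to skip the check for Dict/Tuple action spaces.

```python
from collections import OrderedDict

import numpy as np


def has_nan(value) -> bool:
    """Hypothetical helper: return True if any float inside `value` is NaN.

    Recurses into dicts (including OrderedDict), tuples, lists and
    object-dtype arrays, none of which np.isnan can handle directly.
    """
    if isinstance(value, dict):
        return any(has_nan(v) for v in value.values())
    if isinstance(value, (tuple, list)):
        return any(has_nan(v) for v in value)
    arr = np.asarray(value)
    if arr.dtype == object:
        # Elements may themselves be dicts or arrays, so check them one by one.
        return any(has_nan(v) for v in arr.flat)
    if np.issubdtype(arr.dtype, np.inexact):
        # Float and complex arrays are the only numeric types that can hold NaN.
        return bool(np.any(np.isnan(arr)))
    # Integer, boolean and string arrays cannot hold NaN.
    return False


# Same structure as in the traceback: a batch of one Dict action,
# stored as an object-dtype array wrapping an OrderedDict.
actions = np.empty(1, dtype=object)
actions[0] = OrderedDict([("joint_velocity_commands", np.array([1, 1, 0, 1, 0]))])

try:
    np.isnan(actions)
except TypeError as err:
    print("np.isnan failed:", err)

print(has_nan(actions))  # False: the integer commands cannot be NaN
```

Skipping the check for Dict/Tuple action spaces is the smaller change; a recursive check like this would keep NaN detection for structured actions at the cost of walking every element in Python.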


Miffyli commented 4 years ago

Good catch! Seems like a bit of a brain-derp in the code right there. A PR to fix this would be welcome! :)

wmmc88 commented 4 years ago

Will do!

Sukhamjot-Singh commented 1 year ago

Hey guys, I know this isn't the right place to ask for help with my own issue, but I couldn't find anything else on the internet.

So I am getting the error TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

My action and observation spaces are both Dicts:

self.observation_space = spaces.Dict({
    "theta": spaces.Box(0, 2 * math.pi, shape=(M, M)),
    "gk_real": spaces.Box(-np.inf, np.inf, shape=(k, M)),
    "rk_real": spaces.Box(-np.inf, np.inf, shape=(k, 1)),
    "v_real": spaces.Box(-np.inf, np.inf, shape=(M,)),
    "gk_imag": spaces.Box(-np.inf, np.inf, shape=(k, M)),
    "rk_imag": spaces.Box(-np.inf, np.inf, shape=(k, 1)),
    "v_imag": spaces.Box(-np.inf, np.inf, shape=(M,)),
})
# print(self.observation_space.sample())
self.action_space = spaces.Dict({
    "theta": spaces.Box(0, 2 * math.pi, shape=(M, M)),
    "pk": spaces.Box(-np.inf, np.inf, shape=(k,)),
})

I don't understand the issue here. Thanks in advance.

qgallouedec commented 1 year ago

Hi @Sukhamjot-Singh, we need way more information about your problem to be able to help you. Fortunately, we have a "custom env" template to help you describe your issue. Make sure to provide code that is as minimal as possible; for example, remove any of your space dict keys whose removal doesn't make the issue go away.

Sukhamjot-Singh commented 1 year ago

Okay I understand. I will post my problem there. Thank you.