DLR-RM / stable-baselines3

PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
https://stable-baselines3.readthedocs.io
MIT License

observation_space does not match reset() observation and The environment is being initialised with render_mode='human' that is not in the possible render_modes ([]) #1992

Closed · XiaobenLi00 closed this 3 weeks ago

XiaobenLi00 commented 3 weeks ago

🐛 Bug

I use SB3 for the MyoChallenge, and the env is myoChallengeBimanual-v0.

When I try to use a VecEnv (both SubprocVecEnv and DummyVecEnv), I get warnings like these:

WARN: The environment is being initialised with render_mode='human' that is not in the possible render_modes ([]).
Warning: Unused kwargs found: {'render_mode': 'human'}
WARN: The obs returned by the `reset()` method was expecting numpy array dtype to be float32, actual type: float64
WARN: The obs returned by the `reset()` method is not within the observation space.

When I use check_env, I also get similar warnings:

Error executing job with overrides: ['env=myoChallengeBimanual-v0', 'job_name=checkpoint.pt']
Traceback (most recent call last):
  File "hydra_sb3_launcher.py", line 36, in configure_jobs
    train_loop(job_data)
  File "/home/lixiaoben/projects/myosuite/myosuite/agents/sb3_job_script.py", line 37, in train_loop
    check_env(env)
  File "/root/miniconda3/envs/myosuite/lib/python3.8/site-packages/stable_baselines3/common/env_checker.py", line 473, in check_env
    _check_returned_values(env, observation_space, action_space)
  File "/root/miniconda3/envs/myosuite/lib/python3.8/site-packages/stable_baselines3/common/env_checker.py", line 300, in _check_returned_values
    _check_obs(obs, observation_space, "reset")
  File "/root/miniconda3/envs/myosuite/lib/python3.8/site-packages/stable_baselines3/common/env_checker.py", line 219, in _check_obs
    assert np.can_cast(obs.dtype, observation_space.dtype), (
AssertionError: The observation returned by the `reset()` method does not match the data type (cannot cast) of the given observation space Box(-10.0, 10.0, (211,), float32). Expected: float32, actual dtype: float64

In summary, I have two problems:

  1. render_mode
  2. reset() method

I did check previous issues; there are similar problems (https://github.com/DLR-RM/stable-baselines3/issues/921#issue-1249240466, https://github.com/DLR-RM/stable-baselines3/issues/1968#issuecomment-2238440835), but they do not seem to have been solved, so I am opening this issue.
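For reference, a more minimal reproduction than the full training script below, assuming myosuite registers its environments so that gymnasium.make can create them, would look roughly like this:

import gymnasium as gym
import myosuite  # noqa: F401 -- assumed to register the MyoChallenge env ids on import
from stable_baselines3.common.env_checker import check_env

# requesting render_mode="human" reproduces the render_mode / unused-kwargs warnings
env = gym.make("myoChallengeBimanual-v0", render_mode="human")
# check_env then raises the float32/float64 assertion shown in the traceback below
check_env(env)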

Code example

from https://github.com/MyoHub/myosuite/blob/main/myosuite/agents/sb3_job_script.py

import os
import json
import time as timer
from stable_baselines3 import PPO, SAC
from stable_baselines3.common.callbacks import CheckpointCallback
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.logger import configure
from stable_baselines3.common.vec_env import VecNormalize
import torch
from omegaconf import OmegaConf

import functools
from in_callbacks import InfoCallback, FallbackCheckpoint, SaveSuccesses, EvalCallback

IS_WnB_enabled = False
try:
    import wandb
    from wandb.integration.sb3 import WandbCallback
    IS_WnB_enabled = True
except ImportError as e:
    pass 

def train_loop(job_data) -> None:

    config = {
            "policy_type": job_data.policy,
            "total_timesteps": job_data.total_timesteps,
            "env_name": job_data.env,
    }
    if IS_WnB_enabled:
        run = wandb.init(
            project="sb3_hand",
            config=config,
            sync_tensorboard=True,  # auto-upload sb3's tensorboard metrics
            monitor_gym=True,  # auto-upload the videos of agents playing the game
            save_code=True,  # optional
        )

    log = configure(f'results_{job_data.env}')
    # Create the vectorized environment and normalize ob
    env = make_vec_env(job_data.env, n_envs=job_data.n_env)
    env = VecNormalize(env, norm_obs=True, norm_reward=False, clip_obs=10.)

    eval_env = make_vec_env(job_data.env, n_envs=job_data.n_eval_env)
    eval_env = VecNormalize(eval_env, norm_obs=True, norm_reward=False, clip_obs=10.)

    algo = job_data.algorithm
    if algo == 'PPO':
        # Load activation function from config
        policy_kwargs = OmegaConf.to_container(job_data.policy_kwargs, resolve=True)

        model = PPO(job_data.policy, env,  verbose=1,
                    learning_rate=job_data.learning_rate, 
                    batch_size=job_data.batch_size, 
                    policy_kwargs=policy_kwargs,
                    gamma=job_data.gamma, **job_data.alg_hyper_params)
    elif algo == 'SAC':
        model = SAC(job_data.policy, env, 
                    learning_rate=job_data.learning_rate, 
                    buffer_size=job_data.buffer_size, 
                    learning_starts=job_data.learning_starts, 
                    batch_size=job_data.batch_size, 
                    tau=job_data.tau, 
                    gamma=job_data.gamma, **job_data.alg_hyper_params)

    if job_data.job_name =="checkpoint.pt":
        foldername = os.path.join(os.path.dirname(os.path.realpath(__file__)), f"baseline_SB3/myoChal24/{job_data.env}")
        file_path = os.path.join(foldername, job_data.job_name)
        if os.path.isfile(file_path):
            print("Loading weights from checkpoint")
            model.policy.load_state_dict(torch.load(file_path))
        else:
            raise FileNotFoundError(f"No file found at the specified path: {file_path}. See https://github.com/MyoHub/myosuite/blob/dev/myosuite/agents/README.md to download one.")
    else:
        print("No checkpoint loaded, training starts.")

    if IS_WnB_enabled:
        callback = [WandbCallback(
                model_save_path=f"models/{run.id}",
                verbose=2,
            )]
    else:
        callback = []

    callback += [EvalCallback(job_data.eval_freq, eval_env)]
    callback += [InfoCallback()]
    callback += [FallbackCheckpoint(job_data.restore_checkpoint_freq)]
    callback += [CheckpointCallback(save_freq=job_data.save_freq, save_path=f'logs/',
                                            name_prefix='rl_models')]

    model.learn(
        total_timesteps=config["total_timesteps"],
        callback=callback,
    )

    model.set_logger(log)

    model.save(f"{job_data.env}_"+algo+"_model")
    env.save(f'{job_data.env}_'+algo+'_env')

    if IS_WnB_enabled:
        run.finish()

Relevant log output / Error message

Warning: Unused kwargs found: {'render_mode': 'human'}
/root/miniconda3/envs/myosuite/lib/python3.8/site-packages/gymnasium/utils/passive_env_checker.py:135: UserWarning: WARN: The obs returned by the `reset()` method was expecting numpy array dtype to be float32, actual type: float64
  logger.warn(
/root/miniconda3/envs/myosuite/lib/python3.8/site-packages/gymnasium/utils/passive_env_checker.py:159: UserWarning: WARN: The obs returned by the `reset()` method is not within the observation space.
  logger.warn(f"{pre} is not within the observation space.")
UserWarning: WARN: The environment is being initialised with render_mode='human' that is not in the possible render_modes ([]).

Error executing job with overrides: ['env=myoChallengeBimanual-v0', 'job_name=checkpoint.pt']
Traceback (most recent call last):
  File "hydra_sb3_launcher.py", line 36, in configure_jobs
    train_loop(job_data)
  File "/home/lixiaoben/projects/myosuite/myosuite/agents/sb3_job_script.py", line 37, in train_loop
    check_env(env)
  File "/root/miniconda3/envs/myosuite/lib/python3.8/site-packages/stable_baselines3/common/env_checker.py", line 473, in check_env
    _check_returned_values(env, observation_space, action_space)
  File "/root/miniconda3/envs/myosuite/lib/python3.8/site-packages/stable_baselines3/common/env_checker.py", line 300, in _check_returned_values
    _check_obs(obs, observation_space, "reset")
  File "/root/miniconda3/envs/myosuite/lib/python3.8/site-packages/stable_baselines3/common/env_checker.py", line 219, in _check_obs
    assert np.can_cast(obs.dtype, observation_space.dtype), (
AssertionError: The observation returned by the `reset()` method does not match the data type (cannot cast) of the given observation space Box(-10.0, 10.0, (211,), float32). Expected: float32, actual dtype: float64

System Info

Checklist

XiaobenLi00 commented 3 weeks ago

I have checked the checklist and updated the information. @araffin

qgallouedec commented 3 weeks ago

As the error message says, the observation returned by the reset() method does not match the data type (cannot cast) of the given observation space Box(-10.0, 10.0, (211,), float32): it is expected to be float32 but is actually float64. You need to update the reset() function accordingly.

(Next time, please use the custom env template.)
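
If the environment itself cannot be changed right away, a possible stop-gap on the training-script side is to cast observations to float32 with a small Gymnasium ObservationWrapper and pass it to make_vec_env through its wrapper_class argument. The wrapper below is only a sketch written for this thread, not part of SB3 or myosuite:

import numpy as np
import gymnasium as gym
import myosuite  # noqa: F401 -- assumed to register the env ids on import
from stable_baselines3.common.env_util import make_vec_env

class Float32ObsWrapper(gym.ObservationWrapper):
    """Cast observations to float32; the declared space here is already a float32 Box per the error above."""

    def observation(self, observation):
        # ObservationWrapper applies this to both reset() and step() observations
        return np.asarray(observation, dtype=np.float32)

# assumes "myoChallengeBimanual-v0" resolves through Gymnasium's registry
env = make_vec_env("myoChallengeBimanual-v0", n_envs=4, wrapper_class=Float32ObsWrapper)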

XiaobenLi00 commented 3 weeks ago

> As the error message says, the observation returned by the reset() method does not match the data type (cannot cast) of the given observation space Box(-10.0, 10.0, (211,), float32): it is expected to be float32 but is actually float64. You need to update the reset() function accordingly.
>
> (Next time, please use the custom env template.)

Thanks a lot for your reply, I will check the corresponding reset() method.

BTW, do you have any idea about the render_mode problem?

araffin commented 3 weeks ago

https://github.com/DLR-RM/stable-baselines3/issues/1968#issuecomment-2295161242
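
Regarding the render_mode warning itself: Gymnasium emits it when an environment is created with render_mode='human' but the environment's metadata declares no render modes at all (that is the empty [] in the message). So either render_mode='human' should not be requested, or the environment class should declare the modes it supports. A rough env-side sketch of the latter; the class name and modes below are illustrative, not myosuite's actual code:

import gymnasium as gym

class BimanualEnvSketch(gym.Env):  # illustrative class name only
    # declaring the supported modes is what avoids "not in the possible render_modes ([])"
    metadata = {"render_modes": ["human", "rgb_array"], "render_fps": 30}

    def __init__(self, render_mode=None):
        super().__init__()
        assert render_mode is None or render_mode in self.metadata["render_modes"]
        self.render_mode = render_mode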

wilhem commented 3 weeks ago

The problem is that in your reset() method you are returning values that are np.float64. You need to cast them to np.float32 before the reset() method returns.

For instance:

...
phi = phi.astype(np.float32)
theta = theta.astype(np.float32)
psi = psi.astype(np.float32)

obs = {"Phi": phi, "Theta": theta, "Psi": psi}

return obs
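
For the environment in this issue the observation space is a flat Box(-10.0, 10.0, (211,), float32) rather than a Dict, so the same idea reduces to casting the single observation array before returning it. A minimal sketch of the env-side fix, reusing the illustrative class from above (the helper name is hypothetical, not myosuite's actual code):

import numpy as np
import gymnasium as gym

class BimanualEnvSketch(gym.Env):  # illustrative only
    ...

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        obs = self._compute_obs()          # hypothetical helper that currently returns float64
        return obs.astype(np.float32), {}  # cast to match Box(-10.0, 10.0, (211,), float32)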