DLR-RM / stable-baselines3

PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
https://stable-baselines3.readthedocs.io
MIT License

robo-gym check env issue #866

Closed isaacncz closed 2 years ago

isaacncz commented 2 years ago

Important Note: We do not do technical support, nor consulting and don't answer personal questions per email. Please post your question on the RL Discord, Reddit or Stack Overflow in that case.

🤖 Custom Gym Environment

Please check your environment first using:

from stable_baselines3.common.env_checker import check_env

env = gym.make('EndEffectorPositioningURSim-v0', ip=target_machine_ip, gui=True)
# It will check your custom environment and output additional warnings if needed
check_env(env)

Describe the bug

check_env fails on this custom environment with the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/home/isaac/robogym_ws/test.ipynb Cell 3' in <cell line: 1>()
----> 1 check_env(env)

File ~/.local/lib/python3.8/site-packages/stable_baselines3/common/env_checker.py:291, in check_env(env, warn, skip_render_check)
    289 # The check only works with numpy arrays
    290 if _is_numpy_array_space(observation_space) and _is_numpy_array_space(action_space):
--> 291     _check_nan(env)

File ~/.local/lib/python3.8/site-packages/stable_baselines3/common/env_checker.py:93, in _check_nan(env)
     91 for _ in range(10):
     92     action = np.array([env.action_space.sample()])
---> 93     _, _, _, _ = vec_env.step(action)

File ~/.local/lib/python3.8/site-packages/stable_baselines3/common/vec_env/base_vec_env.py:162, in VecEnv.step(self, actions)
    155 """
    156 Step the environments with the given action
    157
    158 :param actions: the action
    159 :return: observation, reward, done, information
    160 """
    161 self.step_async(actions)
--> 162 return self.step_wait()

File ~/.local/lib/python3.8/site-packages/stable_baselines3/common/vec_env/vec_check_nan.py:35, in VecCheckNan.step_wait(self)
     34 def step_wait(self) -> VecEnvStepReturn:
---> 35     observations, rewards, news, infos = self.venv.step_wait()
     37     self._check_val(async_step=False, observations=observations, rewards=rewards, news=news)
     39     self._observations = observations

File ~/.local/lib/python3.8/site-packages/stable_baselines3/common/vec_env/dummy_vec_env.py:51, in DummyVecEnv.step_wait(self)
     49         obs = self.envs[env_idx].reset()
     50     self._save_obs(env_idx, obs)
---> 51 return (self._obs_from_buf(), np.copy(self.buf_rews), np.copy(self.buf_dones), deepcopy(self.buf_infos))

File /usr/lib/python3.8/copy.py:146, in deepcopy(x, memo, _nil)
    144 copier = _deepcopy_dispatch.get(cls)
    145 if copier is not None:
--> 146     y = copier(x, memo)
    147 else:
    148     if issubclass(cls, type):

File /usr/lib/python3.8/copy.py:205, in _deepcopy_list(x, memo, deepcopy)
    203 append = y.append
    204 for a in x:
--> 205     append(deepcopy(a, memo))
    206 return y

File /usr/lib/python3.8/copy.py:146, in deepcopy(x, memo, _nil)
    144 copier = _deepcopy_dispatch.get(cls)
    145 if copier is not None:
--> 146     y = copier(x, memo)
    147 else:
    148     if issubclass(cls, type):

File /usr/lib/python3.8/copy.py:230, in _deepcopy_dict(x, memo, deepcopy)
    228 memo[id(x)] = y
    229 for key, value in x.items():
--> 230     y[deepcopy(key, memo)] = deepcopy(value, memo)
    231 return y

File /usr/lib/python3.8/copy.py:161, in deepcopy(x, memo, _nil)
    159 reductor = getattr(x, "__reduce_ex__", None)
    160 if reductor is not None:
--> 161     rv = reductor(4)
    162 else:
    163     reductor = getattr(x, "__reduce__", None)

TypeError: cannot pickle 'google.protobuf.pyext._message.ScalarMapContainer' object

The observation space: Box([ -inf -inf -inf -1.1 -1.1 -1.1 -1.1 -1.1 -1.1 -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -1.01 -1.01 -1.01 -1.01 -1.01 -1.01], [ inf inf inf 1.1 1.1 1.1 1.1 1.1 1.1 inf inf inf inf inf inf inf inf inf inf inf inf 1.01 1.01 1.01 1.01 1.01 1.01], (27,), float32)

The action space: Box([-1. -1. -1. -1. -1.], [1. 1. 1. 1. 1.], (5,), float32)

 Code example

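import gym
import robo_gym  # needed so that the robo-gym environments are registered with gym
from robo_gym.wrappers.exception_handling import ExceptionHandling

import stable_baselines3 as sb3
from stable_baselines3 import SAC, PPO
from stable_baselines3.common.env_checker import check_env

# Same environment as above; target_machine_ip must hold the address of the robot server machine
env = gym.make('EndEffectorPositioningURSim-v0', ip=target_machine_ip, gui=True)
check_env(env)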

Please try to provide a minimal example to reproduce the bug.

I was running the example from the robo-gym documentation: https://github.com/jr-robotics/robo-gym/blob/master/docs/environments.md#end-effector-positioning

For a custom environment, you need to give at least the observation space, action space, reset() and step() methods (see working example below). Error messages and stack traces are also helpful.

Please use the markdown code blocks for both code and stack traces.

import gym
import numpy as np

from stable_baselines3 import A2C
from stable_baselines3.common.env_checker import check_env

class CustomEnv(gym.Env):

  def __init__(self):
    super(CustomEnv, self).__init__()
    self.observation_space = gym.spaces.Box(low=-np.inf, high=np.inf, shape=(14,))
    self.action_space = gym.spaces.Box(low=-1, high=1, shape=(6,))

  def reset(self):
    return self.observation_space.sample()

  def step(self, action):
    obs = self.observation_space.sample()
    reward = 1.0
    done = False
    info = {}
    return obs, reward, done, info

env = CustomEnv()
check_env(env)

model = A2C("MlpPolicy", env, verbose=1).learn(1000)
Traceback (most recent call last): File ...

 System Info

Describe the characteristic of your environment:

You can use sb3.get_system_info() to print relevant packages info:

import stable_baselines3 as sb3
sb3.get_system_info()

OS: Linux-5.13.0-39-generic-x86_64-with-glibc2.29 #44~20.04.1-Ubuntu SMP Thu Mar 24 16:43:35 UTC 2022
Python: 3.8.10
Stable-Baselines3: 1.5.0
PyTorch: 1.11.0+cu113
GPU Enabled: True
Numpy: 1.20.0
Gym: 0.21.0

Miffyli commented 2 years ago

Hey. The problem seems to stem from the custom environment you are using: calling deepcopy on the info buffer it returns raises this error. This cannot really be fixed in stable-baselines3, but you could try creating a wrapper for your environment that returns normal dictionaries and numpy arrays instead of these Google protobuf objects.
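A minimal sketch of the conversion such a wrapper could apply to `info`, using only the standard library. `FakeProtoMap` and the info keys are hypothetical stand-ins (the real protobuf container and the keys robo-gym returns are not shown in this thread); the sketch only assumes the container exposes `items()` like a dict.

```python
import copy


def to_plain(value):
    """Recursively convert dict-like and list-like containers to built-in types."""
    if hasattr(value, "items") and callable(value.items):
        return {key: to_plain(val) for key, val in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_plain(val) for val in value]
    # Scalars, strings and numpy arrays are returned unchanged.
    return value


class FakeProtoMap:
    """Hypothetical stand-in for a protobuf map: dict-like, but not deep-copyable."""

    def __init__(self, data):
        self._data = dict(data)

    def items(self):
        return self._data.items()

    def __reduce_ex__(self, protocol):
        raise TypeError("cannot pickle 'FakeProtoMap' object")


# Hypothetical info dict, as an env's step() might return it.
info = {"status": "running", "state": FakeProtoMap({"x": 1.0, "y": 2.0})}
clean_info = to_plain(info)
copied = copy.deepcopy(clean_info)  # works, unlike copy.deepcopy(info)
```

In an actual environment, this conversion would most likely live in a `gym.Wrapper` subclass whose `step()` returns `to_plain(info)` in place of the original `info`.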

Edit: see answer below ^^

araffin commented 2 years ago

Hello, check_env is made for gym.Env environments, not for already vectorized ones (if you are using Isaac Gym). You should use a VecEnvWrapper to use it with SB3, see https://github.com/DLR-RM/stable-baselines3/issues/772#issuecomment-1048657002

isaacncz commented 2 years ago

The environment was built using the Gym API. Could you explain whether a VecEnvWrapper is required? I would really appreciate your input, as I found that the observation space is returned with return gym.spaces.Box(low=min_obs, high=max_obs, dtype=np.float32)

When I run model = SAC("MultiInputPolicy", env, verbose=1)

the output is:

Using cuda device
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/home/isaac/robogym_ws/test.ipynb Cell 6' in <cell line: 1>()
----> 1 model = SAC("MultiInputPolicy", env, verbose=1)

File ~/.local/lib/python3.8/site-packages/stable_baselines3/sac/sac.py:144, in SAC.__init__(self, policy, env, learning_rate, buffer_size, learning_starts, batch_size, tau, gamma, train_freq, gradient_steps, action_noise, replay_buffer_class, replay_buffer_kwargs, optimize_memory_usage, ent_coef, target_update_interval, target_entropy, use_sde, sde_sample_freq, use_sde_at_warmup, tensorboard_log, create_eval_env, policy_kwargs, verbose, seed, device, _init_setup_model)
    141 self.ent_coef_optimizer = None
    143 if _init_setup_model:
--> 144     self._setup_model()

File ~/.local/lib/python3.8/site-packages/stable_baselines3/sac/sac.py:147, in SAC._setup_model(self)
    146 def _setup_model(self) -> None:
--> 147     super(SAC, self)._setup_model()
    148     self._create_aliases()
    149     # Target entropy is used when learning the entropy coefficient

File ~/.local/lib/python3.8/site-packages/stable_baselines3/common/off_policy_algorithm.py:216, in OffPolicyAlgorithm._setup_model(self)
    205 if self.replay_buffer is None:
    206     self.replay_buffer = self.replay_buffer_class(
    207         self.buffer_size,
    208         self.observation_space,
   (...)
    213         **self.replay_buffer_kwargs,
    214     )
--> 216 self.policy = self.policy_class(  # pytype:disable=not-instantiable
    217     self.observation_space,
    218     self.action_space,
    219     self.lr_schedule,
    220     **self.policy_kwargs,  # pytype:disable=not-instantiable
    221 )
    222 self.policy = self.policy.to(self.device)
    224 # Convert train freq parameter to TrainFreq object

File ~/.local/lib/python3.8/site-packages/stable_baselines3/sac/policies.py:498, in MultiInputPolicy.__init__(self, observation_space, action_space, lr_schedule, net_arch, activation_fn, use_sde, log_std_init, sde_net_arch, use_expln, clip_mean, features_extractor_class, features_extractor_kwargs, normalize_images, optimizer_class, optimizer_kwargs, n_critics, share_features_extractor)
    478 def __init__(
    479     self,
    480     observation_space: gym.spaces.Space,
   (...)
    496     share_features_extractor: bool = True,
    497 ):
--> 498     super(MultiInputPolicy, self).__init__(
    499         observation_space,
    500         action_space,
    501         lr_schedule,
    502         net_arch,
    503         activation_fn,
    504         use_sde,
    505         log_std_init,
    506         sde_net_arch,
    507         use_expln,
    508         clip_mean,
    509         features_extractor_class,
    510         features_extractor_kwargs,
    511         normalize_images,
    512         optimizer_class,
    513         optimizer_kwargs,
    514         n_critics,
    515         share_features_extractor,
    516     )

File ~/.local/lib/python3.8/site-packages/stable_baselines3/sac/policies.py:292, in SACPolicy.__init__(self, observation_space, action_space, lr_schedule, net_arch, activation_fn, use_sde, log_std_init, sde_net_arch, use_expln, clip_mean, features_extractor_class, features_extractor_kwargs, normalize_images, optimizer_class, optimizer_kwargs, n_critics, share_features_extractor)
    289 self.critic, self.critic_target = None, None
    290 self.share_features_extractor = share_features_extractor
--> 292 self._build(lr_schedule)

File ~/.local/lib/python3.8/site-packages/stable_baselines3/sac/policies.py:295, in SACPolicy._build(self, lr_schedule)
    294 def _build(self, lr_schedule: Schedule) -> None:
--> 295     self.actor = self.make_actor()
    296     self.actor.optimizer = self.optimizer_class(self.actor.parameters(), lr=lr_schedule(1), **self.optimizer_kwargs)
    298     if self.share_features_extractor:

File ~/.local/lib/python3.8/site-packages/stable_baselines3/sac/policies.py:348, in SACPolicy.make_actor(self, features_extractor)
    347 def make_actor(self, features_extractor: Optional[BaseFeaturesExtractor] = None) -> Actor:
--> 348     actor_kwargs = self._update_features_extractor(self.actor_kwargs, features_extractor)
    349     return Actor(**actor_kwargs).to(self.device)

File ~/.local/lib/python3.8/site-packages/stable_baselines3/common/policies.py:112, in BaseModel._update_features_extractor(self, net_kwargs, features_extractor)
    109 net_kwargs = net_kwargs.copy()
    110 if features_extractor is None:
    111     # The features extractor is not shared, create a new one
--> 112     features_extractor = self.make_features_extractor()
    113 net_kwargs.update(dict(features_extractor=features_extractor, features_dim=features_extractor.features_dim))
    114 return net_kwargs

File ~/.local/lib/python3.8/site-packages/stable_baselines3/common/policies.py:118, in BaseModel.make_features_extractor(self)
    116 def make_features_extractor(self) -> BaseFeaturesExtractor:
    117     """Helper method to create a features extractor."""
--> 118     return self.features_extractor_class(self.observation_space, **self.features_extractor_kwargs)

File ~/.local/lib/python3.8/site-packages/stable_baselines3/common/torch_layers.py:258, in CombinedExtractor.__init__(self, observation_space, cnn_output_dim)
    255 extractors = {}
    257 total_concat_size = 0
--> 258 for key, subspace in observation_space.spaces.items():
    259     if is_image_space(subspace):
    260         extractors[key] = NatureCNN(subspace, features_dim=cnn_output_dim)

AttributeError: 'Box' object has no attribute 'spaces'

Miffyli commented 2 years ago

You should use MlpPolicy instead of MultiInputPolicy for Box spaces.
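As a rough illustration of that rule, the check below picks a policy alias from the structure of the observation space. The helper name `suggest_policy` and the two stand-in classes are hypothetical, so the sketch runs without gym installed; it relies only on the fact that Dict spaces expose their sub-spaces as a mapping in `.spaces`, while Box has no such attribute (which is what the AttributeError reports). Image observations, which call for CnnPolicy, are not handled here.

```python
from collections.abc import Mapping


def suggest_policy(observation_space):
    """Return an SB3 policy alias based on the structure of the observation space."""
    # Dict spaces keep their sub-spaces in a mapping under `.spaces`; Box does not.
    subspaces = getattr(observation_space, "spaces", None)
    if isinstance(subspaces, Mapping):
        return "MultiInputPolicy"
    return "MlpPolicy"


class FlatSpace:
    """Hypothetical stand-in for a Box space with a flat shape."""

    shape = (27,)


class DictSpace:
    """Hypothetical stand-in for a Dict space with two sub-spaces."""

    spaces = {"position": FlatSpace(), "velocity": FlatSpace()}


flat_choice = suggest_policy(FlatSpace())
dict_choice = suggest_policy(DictSpace())
```

With the real environment, the returned alias would be passed as the first argument to the algorithm, for example SAC(suggest_policy(env.observation_space), env).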

isaacncz commented 2 years ago

Thank you so much for your patience and your reply. I tried MlpPolicy with model = SAC("MlpPolicy", env, verbose=1), which printed:

Using cuda device
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.

However, I still cannot run model.learn(total_timesteps=10000):

TypeError                                 Traceback (most recent call last)
/home/isaac/robogym_ws/test.ipynb Cell 13' in <cell line: 1>()
----> 1 model.learn(total_timesteps=10000)

File ~/.local/lib/python3.8/site-packages/stable_baselines3/sac/sac.py:292, in SAC.learn(self, total_timesteps, callback, log_interval, eval_env, eval_freq, n_eval_episodes, tb_log_name, eval_log_path, reset_num_timesteps)
    279 def learn(
    280     self,
    281     total_timesteps: int,
   (...)
    289     reset_num_timesteps: bool = True,
    290 ) -> OffPolicyAlgorithm:
--> 292     return super(SAC, self).learn(
    293         total_timesteps=total_timesteps,
    294         callback=callback,
    295         log_interval=log_interval,
    296         eval_env=eval_env,
    297         eval_freq=eval_freq,
    298         n_eval_episodes=n_eval_episodes,
    299         tb_log_name=tb_log_name,
    300         eval_log_path=eval_log_path,
    301         reset_num_timesteps=reset_num_timesteps,
    302     )

File ~/.local/lib/python3.8/site-packages/stable_baselines3/common/off_policy_algorithm.py:347, in OffPolicyAlgorithm.learn(self, total_timesteps, callback, log_interval, eval_env, eval_freq, n_eval_episodes, tb_log_name, eval_log_path, reset_num_timesteps)
    344 callback.on_training_start(locals(), globals())
    346 while self.num_timesteps < total_timesteps:
--> 347     rollout = self.collect_rollouts(
    348         self.env,
    349         train_freq=self.train_freq,
    350         action_noise=self.action_noise,
    351         callback=callback,
    352         learning_starts=self.learning_starts,
    353         replay_buffer=self.replay_buffer,
    354         log_interval=log_interval,
    355     )
    357     if rollout.continue_training is False:
    358         break

File ~/.local/lib/python3.8/site-packages/stable_baselines3/common/off_policy_algorithm.py:580, in OffPolicyAlgorithm.collect_rollouts(self, env, callback, train_freq, replay_buffer, action_noise, learning_starts, log_interval)
    577 actions, buffer_actions = self._sample_action(learning_starts, action_noise, env.num_envs)
    579 # Rescale and perform action
--> 580 new_obs, rewards, dones, infos = env.step(actions)
    582 self.num_timesteps += env.num_envs
    583 num_collected_steps += 1

File ~/.local/lib/python3.8/site-packages/stable_baselines3/common/vec_env/base_vec_env.py:162, in VecEnv.step(self, actions)
    155 """
    156 Step the environments with the given action
    157
    158 :param actions: the action
    159 :return: observation, reward, done, information
    160 """
    161 self.step_async(actions)
--> 162 return self.step_wait()

File ~/.local/lib/python3.8/site-packages/stable_baselines3/common/vec_env/dummy_vec_env.py:51, in DummyVecEnv.step_wait(self)
     49         obs = self.envs[env_idx].reset()
     50     self._save_obs(env_idx, obs)
---> 51 return (self._obs_from_buf(), np.copy(self.buf_rews), np.copy(self.buf_dones), deepcopy(self.buf_infos))

File /usr/lib/python3.8/copy.py:146, in deepcopy(x, memo, _nil)
    144 copier = _deepcopy_dispatch.get(cls)
    145 if copier is not None:
--> 146     y = copier(x, memo)
    147 else:
    148     if issubclass(cls, type):

File /usr/lib/python3.8/copy.py:205, in _deepcopy_list(x, memo, deepcopy)
    203 append = y.append
    204 for a in x:
--> 205     append(deepcopy(a, memo))
    206 return y

File /usr/lib/python3.8/copy.py:146, in deepcopy(x, memo, _nil)
    144 copier = _deepcopy_dispatch.get(cls)
    145 if copier is not None:
--> 146     y = copier(x, memo)
    147 else:
    148     if issubclass(cls, type):

File /usr/lib/python3.8/copy.py:230, in _deepcopy_dict(x, memo, deepcopy)
    228 memo[id(x)] = y
    229 for key, value in x.items():
--> 230     y[deepcopy(key, memo)] = deepcopy(value, memo)
    231 return y

File /usr/lib/python3.8/copy.py:161, in deepcopy(x, memo, _nil)
    159 reductor = getattr(x, "__reduce_ex__", None)
    160 if reductor is not None:
--> 161     rv = reductor(4)
    162 else:
    163     reductor = getattr(x, "__reduce__", None)

TypeError: cannot pickle 'google.protobuf.pyext._message.ScalarMapContainer' object

araffin commented 2 years ago

You are passing something in the info dict that is not picklable; please remove it or convert it (the error with the env checker was the same). Closing as we don't do tech support.
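One way to locate the offending entries is to try deep-copying each value of the info dict separately, which mirrors the deepcopy call in DummyVecEnv.step_wait shown in the tracebacks. This is a sketch under assumptions: the helper name `find_uncopyable` is hypothetical, and a `threading.Lock` stands in for the protobuf container because it fails in deepcopy in a similar way.

```python
import copy
import threading


def find_uncopyable(info):
    """Return {key: error description} for every info value that deepcopy rejects."""
    problems = {}
    for key, value in info.items():
        try:
            copy.deepcopy(value)
        except Exception as exc:  # deepcopy typically raises TypeError here
            problems[key] = f"{type(value).__name__}: {exc}"
    return problems


# Hypothetical info dict: the lock plays the role of the non-picklable object.
sample_info = {"reward_terms": [0.1, 0.2], "handle": threading.Lock()}
problems = find_uncopyable(sample_info)
```

Running this on the `info` returned by one `env.step(...)` call should name the keys that need to be removed or converted.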