isaacncz closed this issue 2 years ago.
Hey. It seems like the problem stems from the custom environment you are using. For some reason, calling `deepcopy` on the info buffer it returns raises this error. This cannot really be fixed in stable-baselines3, but you could try creating a wrapper for your environment that returns plain dictionaries and NumPy arrays instead of these Google protobuf objects.
Edit: see answer below ^^
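A minimal sketch of the conversion suggested above, assuming the protobuf map containers in the info dict behave like read-only mappings (i.e. they expose `.items()` like a `dict`). The helper `to_plain` and the stand-in class `_UnpicklableMap` are hypothetical; the stand-in only mimics the failure seen later in this thread (`deepcopy` falling back to `__reduce_ex__` and raising `TypeError`).

```python
import copy
from collections.abc import Mapping
from typing import Any


def to_plain(obj: Any) -> Any:
    """Recursively convert dict-like and list-like objects into plain Python
    containers so that deepcopy (and pickling) of the info dict succeeds."""
    # Basic immutable values are returned unchanged.
    if obj is None or isinstance(obj, (str, bytes, int, float, bool)):
        return obj
    # Dicts and dict-like containers (assumed to include protobuf map
    # containers, which expose .items()) become plain dicts, recursively.
    if isinstance(obj, Mapping) or callable(getattr(obj, "items", None)):
        return {to_plain(key): to_plain(value) for key, value in obj.items()}
    # Lists and tuples become plain lists, recursively.
    if isinstance(obj, (list, tuple)):
        return [to_plain(item) for item in obj]
    # Anything else (e.g. NumPy arrays) is assumed to be copyable already.
    return obj


class _UnpicklableMap:
    """Hypothetical stand-in for a protobuf map: dict-like but not picklable."""

    def __init__(self, data):
        self._data = dict(data)

    def items(self):
        return self._data.items()

    def __reduce_ex__(self, protocol):
        raise TypeError("cannot pickle '_UnpicklableMap' object")


info = {"robot": _UnpicklableMap({"x": 1.0}), "is_success": False}
safe_info = to_plain(info)  # {'robot': {'x': 1.0}, 'is_success': False}
info_copy = copy.deepcopy(safe_info)  # succeeds, unlike copy.deepcopy(info)
```

In a real setup, the idea would be to override `step()` in a `gym.Wrapper` subclass and return `to_plain(info)` instead of the raw `info`, so that `DummyVecEnv` only ever sees plain containers.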
Hello,
the env checker (`check_env`) is made for `gym.Env` environments, not already-vectorized ones (if you are using Isaac Gym).
You should use a `VecEnvWrapper` to use it with SB3; see https://github.com/DLR-RM/stable-baselines3/issues/772#issuecomment-1048657002
The environment was built using the Gym API. Could you provide additional info on whether a `VecEnvWrapper` is required?
I really appreciate your input. I found that the observation space is returned with
`return gym.spaces.Box(low=min_obs, high=max_obs, dtype=np.float32)`
and when I run
`model = SAC("MultiInputPolicy", env, verbose=1)`
the output was:
Using cuda device
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
/home/isaac/robogym_ws/test.ipynb Cell 6' in <cell line: 1>()
----> 1 model = SAC("MultiInputPolicy", env, verbose=1)
File ~/.local/lib/python3.8/site-packages/stable_baselines3/sac/sac.py:144, in SAC.__init__(self, policy, env, learning_rate, buffer_size, learning_starts, batch_size, tau, gamma, train_freq, gradient_steps, action_noise, replay_buffer_class, replay_buffer_kwargs, optimize_memory_usage, ent_coef, target_update_interval, target_entropy, use_sde, sde_sample_freq, use_sde_at_warmup, tensorboard_log, create_eval_env, policy_kwargs, verbose, seed, device, _init_setup_model)
    141 self.ent_coef_optimizer = None
    143 if _init_setup_model:
--> 144 self._setup_model()
File ~/.local/lib/python3.8/site-packages/stable_baselines3/sac/sac.py:147, in SAC._setup_model(self)
    146 def _setup_model(self) -> None:
--> 147 super(SAC, self)._setup_model()
    148 self._create_aliases()
    149 # Target entropy is used when learning the entropy coefficient
File ~/.local/lib/python3.8/site-packages/stable_baselines3/common/off_policy_algorithm.py:216, in OffPolicyAlgorithm._setup_model(self)
    205 if self.replay_buffer is None:
    206 self.replay_buffer = self.replay_buffer_class(
    207 self.buffer_size,
    208 self.observation_space,
(...)
    213 **self.replay_buffer_kwargs,
    214 )
--> 216 self.policy = self.policy_class( # pytype:disable=not-instantiable
    217 self.observation_space,
    218 self.action_space,
    219 self.lr_schedule,
    220 **self.policy_kwargs, # pytype:disable=not-instantiable
    221 )
    222 self.policy = self.policy.to(self.device)
    224 # Convert train freq parameter to TrainFreq object
File ~/.local/lib/python3.8/site-packages/stable_baselines3/sac/policies.py:498, in MultiInputPolicy.__init__(self, observation_space, action_space, lr_schedule, net_arch, activation_fn, use_sde, log_std_init, sde_net_arch, use_expln, clip_mean, features_extractor_class, features_extractor_kwargs, normalize_images, optimizer_class, optimizer_kwargs, n_critics, share_features_extractor)
    478 def __init__(
    479 self,
    480 observation_space: gym.spaces.Space,
(...)
    496 share_features_extractor: bool = True,
    497 ):
--> 498 super(MultiInputPolicy, self).__init__(
    499 observation_space,
    500 action_space,
    501 lr_schedule,
    502 net_arch,
    503 activation_fn,
    504 use_sde,
    505 log_std_init,
    506 sde_net_arch,
    507 use_expln,
    508 clip_mean,
    509 features_extractor_class,
    510 features_extractor_kwargs,
    511 normalize_images,
    512 optimizer_class,
    513 optimizer_kwargs,
    514 n_critics,
    515 share_features_extractor,
    516 )
File ~/.local/lib/python3.8/site-packages/stable_baselines3/sac/policies.py:292, in SACPolicy.__init__(self, observation_space, action_space, lr_schedule, net_arch, activation_fn, use_sde, log_std_init, sde_net_arch, use_expln, clip_mean, features_extractor_class, features_extractor_kwargs, normalize_images, optimizer_class, optimizer_kwargs, n_critics, share_features_extractor)
    289 self.critic, self.critic_target = None, None
    290 self.share_features_extractor = share_features_extractor
--> 292 self._build(lr_schedule)
File ~/.local/lib/python3.8/site-packages/stable_baselines3/sac/policies.py:295, in SACPolicy._build(self, lr_schedule)
    294 def _build(self, lr_schedule: Schedule) -> None:
--> 295 self.actor = self.make_actor()
    296 self.actor.optimizer = self.optimizer_class(self.actor.parameters(), lr=lr_schedule(1), **self.optimizer_kwargs)
    298 if self.share_features_extractor:
File ~/.local/lib/python3.8/site-packages/stable_baselines3/sac/policies.py:348, in SACPolicy.make_actor(self, features_extractor)
    347 def make_actor(self, features_extractor: Optional[BaseFeaturesExtractor] = None) -> Actor:
--> 348 actor_kwargs = self._update_features_extractor(self.actor_kwargs, features_extractor)
    349 return Actor(**actor_kwargs).to(self.device)
File ~/.local/lib/python3.8/site-packages/stable_baselines3/common/policies.py:112, in BaseModel._update_features_extractor(self, net_kwargs, features_extractor)
    109 net_kwargs = net_kwargs.copy()
    110 if features_extractor is None:
    111 # The features extractor is not shared, create a new one
--> 112 features_extractor = self.make_features_extractor()
    113 net_kwargs.update(dict(features_extractor=features_extractor, features_dim=features_extractor.features_dim))
    114 return net_kwargs
File ~/.local/lib/python3.8/site-packages/stable_baselines3/common/policies.py:118, in BaseModel.make_features_extractor(self)
    116 def make_features_extractor(self) -> BaseFeaturesExtractor:
    117 """Helper method to create a features extractor."""
--> 118 return self.features_extractor_class(self.observation_space, **self.features_extractor_kwargs)
File ~/.local/lib/python3.8/site-packages/stable_baselines3/common/torch_layers.py:258, in CombinedExtractor.__init__(self, observation_space, cnn_output_dim)
    255 extractors = {}
    257 total_concat_size = 0
--> 258 for key, subspace in observation_space.spaces.items():
    259 if is_image_space(subspace):
    260 extractors[key] = NatureCNN(subspace, features_dim=cnn_output_dim)
AttributeError: 'Box' object has no attribute 'spaces'
You should use `MlpPolicy` instead of `MultiInputPolicy` for `Box` spaces.
Thank you so much for your patience and reply. I tried `MlpPolicy` with `model = SAC("MlpPolicy", env, verbose=1)`
and got the following output:
Using cuda device
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
However, I still can't run `model.learn(total_timesteps=10000)`:
TypeError Traceback (most recent call last)
/home/isaac/robogym_ws/test.ipynb Cell 13' in <cell line: 1>()
----> 1 model.learn(total_timesteps=10000)
File ~/.local/lib/python3.8/site-packages/stable_baselines3/sac/sac.py:292, in SAC.learn(self, total_timesteps, callback, log_interval, eval_env, eval_freq, n_eval_episodes, tb_log_name, eval_log_path, reset_num_timesteps)
    279 def learn(
    280 self,
    281 total_timesteps: int,
(...)
    289 reset_num_timesteps: bool = True,
    290 ) -> OffPolicyAlgorithm:
--> 292 return super(SAC, self).learn(
    293 total_timesteps=total_timesteps,
    294 callback=callback,
    295 log_interval=log_interval,
    296 eval_env=eval_env,
    297 eval_freq=eval_freq,
    298 n_eval_episodes=n_eval_episodes,
    299 tb_log_name=tb_log_name,
    300 eval_log_path=eval_log_path,
    301 reset_num_timesteps=reset_num_timesteps,
    302 )
File ~/.local/lib/python3.8/site-packages/stable_baselines3/common/off_policy_algorithm.py:347, in OffPolicyAlgorithm.learn(self, total_timesteps, callback, log_interval, eval_env, eval_freq, n_eval_episodes, tb_log_name, eval_log_path, reset_num_timesteps)
    344 callback.on_training_start(locals(), globals())
    346 while self.num_timesteps < total_timesteps:
--> 347 rollout = self.collect_rollouts(
    348 self.env,
    349 train_freq=self.train_freq,
    350 action_noise=self.action_noise,
    351 callback=callback,
    352 learning_starts=self.learning_starts,
    353 replay_buffer=self.replay_buffer,
    354 log_interval=log_interval,
    355 )
    357 if rollout.continue_training is False:
    358 break
File ~/.local/lib/python3.8/site-packages/stable_baselines3/common/off_policy_algorithm.py:580, in OffPolicyAlgorithm.collect_rollouts(self, env, callback, train_freq, replay_buffer, action_noise, learning_starts, log_interval)
    577 actions, buffer_actions = self._sample_action(learning_starts, action_noise, env.num_envs)
    579 # Rescale and perform action
--> 580 new_obs, rewards, dones, infos = env.step(actions)
    582 self.num_timesteps += env.num_envs
    583 num_collected_steps += 1
File ~/.local/lib/python3.8/site-packages/stable_baselines3/common/vec_env/base_vec_env.py:162, in VecEnv.step(self, actions)
    155 """
    156 Step the environments with the given action
    157
    158 :param actions: the action
    159 :return: observation, reward, done, information
    160 """
    161 self.step_async(actions)
--> 162 return self.step_wait()
File ~/.local/lib/python3.8/site-packages/stable_baselines3/common/vec_env/dummy_vec_env.py:51, in DummyVecEnv.step_wait(self)
     49 obs = self.envs[env_idx].reset()
     50 self._save_obs(env_idx, obs)
---> 51 return (self._obs_from_buf(), np.copy(self.buf_rews), np.copy(self.buf_dones), deepcopy(self.buf_infos))
File /usr/lib/python3.8/copy.py:146, in deepcopy(x, memo, _nil)
    144 copier = _deepcopy_dispatch.get(cls)
    145 if copier is not None:
--> 146 y = copier(x, memo)
    147 else:
    148 if issubclass(cls, type):
File /usr/lib/python3.8/copy.py:205, in _deepcopy_list(x, memo, deepcopy)
    203 append = y.append
    204 for a in x:
--> 205 append(deepcopy(a, memo))
    206 return y
File /usr/lib/python3.8/copy.py:146, in deepcopy(x, memo, _nil)
    144 copier = _deepcopy_dispatch.get(cls)
    145 if copier is not None:
--> 146 y = copier(x, memo)
    147 else:
    148 if issubclass(cls, type):
File /usr/lib/python3.8/copy.py:230, in _deepcopy_dict(x, memo, deepcopy)
    228 memo[id(x)] = y
    229 for key, value in x.items():
--> 230 y[deepcopy(key, memo)] = deepcopy(value, memo)
    231 return y
File /usr/lib/python3.8/copy.py:161, in deepcopy(x, memo, _nil)
    159 reductor = getattr(x, "__reduce_ex__", None)
    160 if reductor is not None:
--> 161 rv = reductor(4)
    162 else:
    163 reductor = getattr(x, "__reduce__", None)
TypeError: cannot pickle 'google.protobuf.pyext._message.ScalarMapContainer' object
You are passing something in the info dict that is not picklable; please remove it or convert it (the error with the env checker was the same). Closing, as we don't do tech support.
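To locate the offending entry, a small diagnostic can repeat the exact operation that fails in `DummyVecEnv.step_wait()` (a `deepcopy` of the info dicts) one key at a time. This is a sketch: `find_uncopyable` is a hypothetical helper, and a `threading.Lock` is used only as a stand-in for the protobuf `ScalarMapContainer`, because `deepcopy` also falls back to `__reduce_ex__` for it and raises `TypeError: cannot pickle ...`.

```python
import copy
import threading
from typing import Any, Dict, List


def find_uncopyable(info: Dict[str, Any]) -> List[str]:
    """Return the keys of ``info`` whose values cannot be deep-copied.

    DummyVecEnv.step_wait() calls deepcopy(self.buf_infos), so any key
    reported here is a candidate to remove or convert before returning info.
    """
    bad_keys = []
    for key, value in info.items():
        try:
            copy.deepcopy(value)
        except Exception:  # TypeError in the traceback above; broad on purpose
            bad_keys.append(key)
    return bad_keys


# Demo: a lock stands in for the protobuf ScalarMapContainer.
example_info = {"distance_to_target": 0.12, "lock": threading.Lock()}
print(find_uncopyable(example_info))  # ['lock']
```

Calling it on the `info` returned by a single `env.step(action)` of the unwrapped environment should point at the key(s) holding the protobuf objects.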
Important Note: We do not do technical support, nor consulting, and we don't answer personal questions by email. Please post your question on the RL Discord, Reddit or Stack Overflow in that case.
🤖 Custom Gym Environment
Please check your environment first using `check_env` from `stable_baselines3.common.env_checker`.
 Describe the bug
A clear and concise description of what the bug is. I'm having an issue with `check_env` on this custom environment.
The observation space: Box([ -inf -inf -inf -1.1 -1.1 -1.1 -1.1 -1.1 -1.1 -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -1.01 -1.01 -1.01 -1.01 -1.01 -1.01], [ inf inf inf 1.1 1.1 1.1 1.1 1.1 1.1 inf inf inf inf inf inf inf inf inf inf inf inf 1.01 1.01 1.01 1.01 1.01 1.01], (27,), float32)
The action space: Box([-1. -1. -1. -1. -1.], [1. 1. 1. 1. 1.], (5,), float32)
 Code example
Please try to provide a minimal example to reproduce the bug.
I was running this example: https://github.com/jr-robotics/robo-gym/blob/master/docs/environments.md#end-effector-positioning
For a custom environment, you need to give at least the observation space, action space, `reset()` and `step()` methods (see working example below). Error messages and stack traces are also helpful. Please use the markdown code blocks for both code and stack traces.
 System Info
Describe the characteristics of your environment:
You can use `sb3.get_system_info()` to print relevant packages info:
Additional context
Add any other context about the problem here.
 Checklist