Closed · tudorjnu closed this issue 1 year ago
Hi,
If I understand correctly, your observation space looks something like:
```python
observation_space = Dict({
    "observation": Dict({
        "joint_pos": Box(...),
        "joint_vel": Box(...),
        "camera": Box(...),  # image
    }),
    "desired_goal": Box(...),
    "achieved_goal": Box(...),
})
```
Is this correct? If so, I'm pretty sure that turning it to
```python
observation_space = Dict({
    "joint_pos": Box(...),
    "joint_vel": Box(...),
    "camera": Box(...),  # image
    "desired_goal": Box(...),
    "achieved_goal": Box(...),
})
```
and using the branch from #704 should work. Please keep me posted.
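To make the suggested restructuring concrete, here is a minimal sketch of the flattening at the observation level (the helper name `flatten_goal_obs` is hypothetical, not part of SB3; in practice you would apply the same restructuring to the `observation_space` itself, e.g. inside a `gym.ObservationWrapper`):

```python
def flatten_goal_obs(obs):
    """Lift the keys of the nested "observation" dict to the top level.

    Turns {"observation": {"joint_pos": ..., ...}, "desired_goal": ...,
    "achieved_goal": ...} into the single flat dict layout suggested above.
    """
    flat = dict(obs["observation"])  # copy the inner keys to the top level
    flat["desired_goal"] = obs["desired_goal"]
    flat["achieved_goal"] = obs["achieved_goal"]
    return flat
```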
> Is this correct? If so, I'm pretty sure that turning it to

For using it with HER, you might also need to merge all the observations into one box (@qgallouedec, I'm not sure whether we still use the "observation" key explicitly).
In case of doubt, please use the env checker (it should tell you in that case that SB3 doesn't support nested dicts).
> to merge all the observations into one box

Caution: here you have an image, so merging could be detrimental, as the observation won't be preprocessed as an image.

> I'm not sure if we are using the "observation" explicitly or not now

I'm pretty sure we don't. If my solution works for @tudorjnu, I recommend removing "observation" here.
> so merging could be detrimental as the observation won't be preprocessed as an image.

Unless you merge along the channel axis, no?

> If my solution works for @tudorjnu, I recommend removing "observation" here

True.
> unless you merge along the channel axis, no?

From what I understand, the observations are multimodal. How would you merge a box with shape, let's say, (3, 84, 84) (image) with one with shape (7,) (joint positions)?

> From what I understand, the observations are multimodal. How would you merge a box with shape, let's say, (3, 84, 84) (image) with one with shape (7,) (joint positions)?

True, I misread. Yes, that's what the `MultiInputPolicy` is for.
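To make the shape problem concrete, here is a small NumPy sketch (shapes borrowed from the example above, as an illustration only): a naive merge into a single `Box` is only possible by flattening the image, after which SB3 could no longer detect and preprocess it as an image — which is exactly what `MultiInputPolicy` avoids by keeping the entries separate.

```python
import numpy as np

image = np.zeros((3, 84, 84), dtype=np.float32)  # camera observation (C, H, W)
joints = np.zeros((7,), dtype=np.float32)        # joint positions

# Merging along the channel axis is impossible: the shapes are incompatible.
# The only way to "merge" them into one Box is to flatten everything:
merged = np.concatenate([image.ravel(), joints])  # shape (3*84*84 + 7,)

# ...at which point the (3, 84, 84) image structure is gone, so the data
# would be fed to an MLP instead of a CNN.
```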
Hello all and thank you for the fast responses!
Yes, @qgallouedec, that is correct. My observation space is indeed
```python
observation_space = Dict({
    "observation": Dict({
        "joint_pos": Box(...),
        "joint_vel": Box(...),
        "camera": Box(...),  # image
    }),
    "desired_goal": Box(...),
    "achieved_goal": Box(...),
})
```
so merging as specified above is not a solution.
As a quick update, I have tried the solution
```python
observation_space = Dict({
    "joint_pos": Box(...),
    "joint_vel": Box(...),
    "camera": Box(...),  # image
    "desired_goal": Box(...),
    "achieved_goal": Box(...),
})
```
and I get the following error (I took the image out just to simplify the problem):

```
AssertionError: A goal conditioned env must contain 3 observation keys: `observation`, `desired_goal`, and `achieved_goal`.The current observation contains 4 keys: ['joint_pos', 'joint_vel', 'achieved_goal', 'desired_goal']
```

Running the environment yields a key error:

```
self.obs_shape = get_obs_shape(self.env.observation_space.spaces["observation"])
KeyError: 'observation'
```
Should I wait for the merge? Thank you again! :)
I used the branch `feat/multienv-her-alt` from `git@github.com:qgallouedec/stable-baselines3.git` and I get the following error, although it seems it creates the environment:

```
/her_replay_buffer.py", line 174, in sample
    sampled_idx = np.random.choice(valid_indices, size=batch_size, replace=True)
  File "mtrand.pyx", line 934, in numpy.random.mtrand.RandomState.choice
ValueError: 'a' cannot be empty unless no samples are taken
```
Sorry, I just realised that before I cloned the wrong `main` branch instead of this one.
I think this error is actually not related to the subject of this issue. It is most likely that you are trying to train the model before the completion of the first episode. You should be able to solve this error by increasing the `learning_starts` argument of the model. I thought I had already dealt with this in another issue, but I can't find it...
If the error remains, please open another issue, in your case with the custom env template; it allows us to keep the topics well organized.
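This failure mode can be reproduced in isolation (a sketch, independent of SB3): `np.random.choice` raises exactly this `ValueError` when asked to draw samples from an empty array, which is what happens when the buffer holds no complete episode to sample from yet.

```python
import numpy as np

valid_indices = np.array([], dtype=np.int64)  # no finished episodes stored yet

try:
    np.random.choice(valid_indices, size=256, replace=True)
except ValueError as exc:
    print(exc)  # raises ValueError, as in the traceback above
```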
Thanks! I set `learning_starts` to 600, as my environment has a time limit of 300. The error mentioned is solved; however, I get the following:
```
sac.py", line 245, in train
    target_q_values = replay_data.rewards + (1 - replay_data.dones) * self.gamma * next_q_values
RuntimeError: The size of tensor a (53) must match the size of tensor b (256) at non-singleton dimension 0
```
Can you provide a minimal code to reproduce this? (I recommend using the custom env template)
Sure, no problem. I created this super simple environment that gives me the same error:
```python
import numpy as np
import gym
from gym import spaces

from stable_baselines3 import HerReplayBuffer, SAC


class TestEnv(gym.GoalEnv):
    def __init__(self):
        self.observation_space = spaces.Dict(
            dict(
                an_observation_1=spaces.Box(0, 1, (1,), dtype=np.float32),
                an_observation_2=spaces.Box(0, 1, (1,), dtype=np.float32),
                achieved_goal=spaces.Box(0, 1, (1,), dtype=np.float32),
                desired_goal=spaces.Box(0, 1, (1,), dtype=np.float32),
            )
        )
        self.action_space = spaces.Box(0, 1, (1,), dtype=np.float32)
        self.current_step = 0
        self.ep_length = 10

    def reset(self):
        self.current_step = 0
        state = self._generate_next_state()
        return state

    def step(self, action):
        obs = self._generate_next_state()
        self.current_step += 1
        done = self.current_step >= self.ep_length
        return obs, -1, done, {}

    def _generate_next_state(self):
        state = {}
        for k, space in self.observation_space.spaces.items():
            state[k] = space.sample()
        return state

    def render(self, mode: str = "human") -> None:
        pass

    def compute_reward(self, achieved_goal, desired_goal, info):
        return np.array(-1, np.float32)


env = TestEnv()
model = SAC(
    "MultiInputPolicy",
    env,
    replay_buffer_class=HerReplayBuffer,
    learning_starts=1000,
    verbose=1,
)
model.learn(3000, progress_bar=True)
```
Hello and thanks for the commit!
I just wanted to mention that the top code is still not working. Thanks!
> I just wanted to mention that the top code is still not working. Thanks!

Which one? The one with the nested observation? If you use the env checker, it should explain why it is not working.
This one:
```
sac.py", line 245, in train
    target_q_values = replay_data.rewards + (1 - replay_data.dones) * self.gamma * next_q_values
RuntimeError: The size of tensor a (53) must match the size of tensor b (256) at non-singleton dimension 0
```
Which makes me realize I should write in the other issue. Sorry for that.
Also, the `env_checker` will still output an error for `'observation'` not being in the keys, from here. If the flat dictionary solution above is to be implemented, I reckon the verification can be `>= 3` rather than `== 3`.
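As a sketch of the relaxed check being suggested (hypothetical code, not the actual SB3 source): keep requiring the two goal keys, but allow any number of extra observation keys.

```python
def check_goal_env_keys(obs_keys):
    """Relaxed goal-env assertion: require `achieved_goal` and `desired_goal`
    plus at least one regular observation key, instead of exactly the three
    keys {"observation", "desired_goal", "achieved_goal"}."""
    assert {"achieved_goal", "desired_goal"}.issubset(obs_keys) and len(obs_keys) >= 3, (
        "A goal conditioned env must contain `achieved_goal`, `desired_goal` "
        "and at least one observation key."
    )

# Accepts the flat layout from this issue:
check_goal_env_keys({"joint_pos", "joint_vel", "achieved_goal", "desired_goal"})
```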
❓ Question
Hello,
I am looking to use my custom environment with HER. Currently, it works perfectly with the normal environment and it passes all the checks. The observation space is, however, a dict, as it contains multiple images, joint positions, joint velocities and so on. In order to wrap the environment, I added the required method (`compute_reward`) and I modified the observation space by creating a new `Dict` observation space with the keys `observation`, `achieved_goal` and `desired_goal`. In doing so, my `observation` value is basically another `Dict`. Is there any way to use this with the `MultiInput` kind of policy, or is it something not currently supported? I was also considering making my own custom policy in order to work around the issue. Thank you!

Checklist