Closed. CreativeNick closed this issue 4 months ago.
In `cube_env.py`, I edited the `evaluate` function. New function:
```python
def evaluate(self):
    if self.initialized:
        # Stack the z-coordinates of all right-hand links into one (num_envs, num_links) tensor
        right_hand_link_z = torch.concat(
            [
                link.pose.p[..., 2].unsqueeze(1)
                for link in self.right_hand_link + self.right_hand_tip_link
            ],
            dim=1,
        )
        # Fail if any right-hand link comes within 4 cm of the table surface
        fail_collision_table = (right_hand_link_z < self.table_height + 0.04).any(
            dim=1
        )
        # Fail if the cube drops below z = 1.0
        fail_cube_fall = self.cube.pose.p[:, 2] < 1.0
        fail = fail_collision_table | fail_cube_fall
    else:
        fail = torch.zeros_like(self.cube.pose.p[:, 0], dtype=torch.bool)
    # Adjusted success condition for easier achievement
    success = self.cube.pose.p[:, 2] >= 1.0
    state = {"success": success, "fail": fail}
    # Ensure there's always at least one success or default value
    if not success.any():
        state["success"] = torch.zeros_like(success)
        state["fail"] = torch.ones_like(fail)
    return state
```
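For reference, here is a minimal standalone sketch of the override at the end of that function, using dummy tensors (my own illustration with made-up values, pure PyTorch, no simulator). It makes the behavior explicit: whenever no environment currently satisfies the success condition, every environment gets flagged as failed at once:

```python
import torch

# Dummy cube heights for 4 parallel envs; none satisfies z >= 1.0
cube_z = torch.tensor([0.80, 0.90, 0.70, 0.95])

success = cube_z >= 1.0           # tensor([False, False, False, False])
fail = torch.zeros_like(success)  # assume no collision/fall failures this step

# The override from the edited evaluate():
if not success.any():
    success = torch.zeros_like(success)
    fail = torch.ones_like(fail)  # every env is now marked as failed

print(success)  # tensor([False, False, False, False])
print(fail)     # tensor([True, True, True, True])
```

If `fail` terminates an episode, this would end every episode on the first step where no env has succeeded yet, which could be related to the completely flat eval stats described below.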
I was able to run the command `python ppo.py --no-capture-video --env-id Bimanual_Allegro_Cube`, but when I tried evaluating the final checkpoint, I received the following error:
```
(ms_dev) creativenick@creativenick:~/Desktop/SimToReal/bimanual-sapien$ python ppo.py --evaluate --no-capture-video --checkpoint runs/Bimanual_Allegro_Cube__ppo__1__1717740309/final_ckpt.pt
/home/creativenick/anaconda3/envs/ms_dev/lib/python3.9/site-packages/tyro/_fields.py:343: UserWarning: The field wandb_entity is annotated with type <class 'str'>, but the default value None has type <class 'NoneType'>. We'll try to handle this gracefully, but it may cause unexpected behavior.
  warnings.warn(
/home/creativenick/anaconda3/envs/ms_dev/lib/python3.9/site-packages/tyro/_fields.py:343: UserWarning: The field checkpoint is annotated with type <class 'str'>, but the default value None has type <class 'NoneType'>. We'll try to handle this gracefully, but it may cause unexpected behavior.
  warnings.warn(
Running evaluation
/home/creativenick/anaconda3/envs/ms_dev/lib/python3.9/site-packages/gymnasium/core.py:311: UserWarning: WARN: env.single_observation_space to get variables from other wrappers is deprecated and will be removed in v1.0, to get this variable you can do `env.unwrapped.single_observation_space` for environment variables or `env.get_wrapper_attr('single_observation_space')` that will search the reminding wrappers.
  logger.warn(
/home/creativenick/anaconda3/envs/ms_dev/lib/python3.9/site-packages/gymnasium/core.py:311: UserWarning: WARN: env.single_action_space to get variables from other wrappers is deprecated and will be removed in v1.0, to get this variable you can do `env.unwrapped.single_action_space` for environment variables or `env.get_wrapper_attr('single_action_space')` that will search the reminding wrappers.
  logger.warn(
/home/creativenick/anaconda3/envs/ms_dev/lib/python3.9/site-packages/gymnasium/core.py:311: UserWarning: WARN: env.max_episode_steps to get variables from other wrappers is deprecated and will be removed in v1.0, to get this variable you can do `env.unwrapped.max_episode_steps` for environment variables or `env.get_wrapper_attr('max_episode_steps')` that will search the reminding wrappers.
  logger.warn(
####
args.num_iterations=390 args.num_envs=512 args.num_eval_envs=2
args.minibatch_size=800 args.batch_size=25600 args.update_epochs=4
####
Traceback (most recent call last):
  File "/home/creativenick/Desktop/SimToReal/bimanual-sapien/ppo.py", line 315, in <module>
    agent.load_state_dict(torch.load(args.checkpoint))
  File "/home/creativenick/anaconda3/envs/ms_dev/lib/python3.9/site-packages/torch/nn/modules/module.py", line 2189, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for Agent:
        size mismatch for actor_logstd: copying a param with shape torch.Size([1, 44]) from checkpoint, the shape in current model is torch.Size([1, 8]).
        size mismatch for critic.0.weight: copying a param with shape torch.Size([256, 98]) from checkpoint, the shape in current model is torch.Size([256, 42]).
        size mismatch for actor_mean.0.weight: copying a param with shape torch.Size([256, 98]) from checkpoint, the shape in current model is torch.Size([256, 42]).
        size mismatch for actor_mean.6.weight: copying a param with shape torch.Size([44, 256]) from checkpoint, the shape in current model is torch.Size([8, 256]).
        size mismatch for actor_mean.6.bias: copying a param with shape torch.Size([44]) from checkpoint, the shape in current model is torch.Size([8]).
```
The evaluation success rate (`eval_success_rate`), evaluation fail rate (`eval_fail_rate`), fail rate (`fail_rate`), learning rate (`learning_rate`), and many other stats were all completely flat.
I also attached the folder for this run below: Bimanual_Allegro_Cube__ppo__1__1717740309.zip
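As a quick sanity check (my own hypothetical snippet, assuming `final_ckpt.pt` is a plain state_dict as saved by `torch.save` in `ppo.py`), printing the parameter shapes stored in the checkpoint shows which dimensions it was trained with:

```python
import torch

# Hypothetical diagnostic: list every parameter shape in the saved checkpoint
ckpt = torch.load(
    "runs/Bimanual_Allegro_Cube__ppo__1__1717740309/final_ckpt.pt",
    map_location="cpu",
)
for name, tensor in ckpt.items():
    print(name, tuple(tensor.shape))
```

Per the traceback above, the checkpoint expects a 98-dimensional observation and a 44-dimensional action (`actor_logstd` of shape `[1, 44]`), while the model built at evaluation time expects 42 and 8, so the evaluation run is constructing the environment, and hence the network, differently from the training run.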
The error is raised because the max episode length in the defined environment is larger than `num_steps` in the `ppo.py` script.
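To spell that out (an illustrative sketch, not the actual `ppo.py` code; the variable names and the horizon of 100 are my own assumptions): if `batch_size = num_envs * num_steps` as in the usual CleanRL setup, the printout above implies each rollout is only 25600 / 512 = 50 steps long. If episodes only end on timeout after a longer max episode length, no episode ever finishes inside a rollout, so the list of completed-episode returns stays empty and concatenating it raises the empty-array error:

```python
import numpy as np

num_steps = 50            # rollout length per iteration (batch_size / num_envs)
max_episode_steps = 100   # assumed env horizon, longer than the rollout

completed_returns = []
for step in range(num_steps):
    timeout = (step + 1) == max_episode_steps  # never True within 50 steps
    if timeout:
        completed_returns.append(np.array([0.0]))  # would record the episode return

np.concatenate(completed_returns)
# ValueError: need at least one array to concatenate
```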
**Issue**

When attempting to run `ppo.py` to train the RL model on `cube_env.py` or the `Bimanual_Allegro_Cube` env, I get an empty array error during Epoch 1 of the iteration loop in `ppo.py`, where the loop attempts to concatenate the "returns" array.

**Possible cause / solution**

The "returns" array is the sum of rewards, so the initial array being empty could be due to a bug in the `compute_dense_reward` function. This function is inside all env files, including `cube_env.py`. There could also be a bug in the `evaluate` function (also in `ppo.py`), which checks the current state of the environment and determines whether the robot has successfully performed the assigned task.

**Command ran:**

**Terminal output:**