[Bug/Error] Concatenating EMPTY array "returns" during first epoch

Issue

When attempting to run ppo.py to train the RL model using on cube_env.py or the Bimanual_Allegro_Cube env, I get an empty array error during Epoch 1 of the iteration loop in ppo.py where the loop attempts to concatenate the "returns" array.

Possible cause / solution

The "returns" array is the sum of rewards, the cause of the initial array being empty could be due to a bug in the compute_dense_reward function. This function is inside all env files, including cube_env.py. There could also be a bug in the evaluate function (also in ppo.py), which checks the current state of the environment and determine if the robot has successfully performed the assigned task.

Command ran:

python ppo.py --no-capture-video --env-id Bimanual_Allegro_Cube

Terminal output:

(ms_dev) creativenick@creativenick:~/Desktop/SimToReal/bimanual-sapien$ python ppo.py --no-capture-video --env-id Bimanual_Allegro_Cube
        /home/creativenick/anaconda3/envs/ms_dev/lib/python3.9/site-packages/tyro/_fields.py:343: UserWarning: The field wandb_entity is annotated with type <class 'str'>, but the default value None has type <class 'NoneType'>. We'll try to handle this gracefully, but it may cause unexpected behavior.
  warnings.warn(
/home/creativenick/anaconda3/envs/ms_dev/lib/python3.9/site-packages/tyro/_fields.py:343: UserWarning: The field checkpoint is annotated with type <class 'str'>, but the default value None has type <class 'NoneType'>. We'll try to handle this gracefully, but it may cause unexpected behavior.
  warnings.warn(
Running training
/home/creativenick/anaconda3/envs/ms_dev/lib/python3.9/site-packages/gymnasium/core.py:311: UserWarning: WARN: env.single_observation_space to get variables from other wrappers is deprecated and will be removed in v1.0, to get this variable you can do `env.unwrapped.single_observation_space` for environment variables or `env.get_wrapper_attr('single_observation_space')` that will search the reminding wrappers.
  logger.warn(
/home/creativenick/anaconda3/envs/ms_dev/lib/python3.9/site-packages/gymnasium/core.py:311: UserWarning: WARN: env.single_action_space to get variables from other wrappers is deprecated and will be removed in v1.0, to get this variable you can do `env.unwrapped.single_action_space` for environment variables or `env.get_wrapper_attr('single_action_space')` that will search the reminding wrappers.
  logger.warn(
/home/creativenick/anaconda3/envs/ms_dev/lib/python3.9/site-packages/gymnasium/core.py:311: UserWarning: WARN: env.max_episode_steps to get variables from other wrappers is deprecated and will be removed in v1.0, to get this variable you can do `env.unwrapped.max_episode_steps` for environment variables or `env.get_wrapper_attr('max_episode_steps')` that will search the reminding wrappers.
  logger.warn(
####
args.num_iterations=390 args.num_envs=512 args.num_eval_envs=2
args.minibatch_size=800 args.batch_size=25600 args.update_epochs=4
####
Epoch: 1, global_step=0
Evaluating
Traceback (most recent call last):.09 hand_close: 0.20
  File "/home/creativenick/Desktop/SimToReal/bimanual-sapien/ppo.py", line 355, in <module>
    returns = np.concatenate(returns)
ValueError: need at least one array to concatenate

In cube_env.py, I edited the evaluate function such that:

The threshold for passing a successful condition is lower
If there are no successes, apply default values

New function:

def evaluate(self):
    if self.initialized:
        right_hand_link_z = torch.concat(
            [
                link.pose.p[..., 2].unsqueeze(1)
                for link in self.right_hand_link + self.right_hand_tip_link
            ],
            dim=1,
        )

        fail_collision_table = (right_hand_link_z < self.table_height + 0.04).any(
            dim=1
        )
        fail_cube_fall = self.cube.pose.p[:, 2] < 1.0
        fail = fail_collision_table | fail_cube_fall
    else:
        fail = torch.zeros_like(self.cube.pose.p[:, 0], dtype=torch.bool)

    # Adjusted success condition for easier achievement
    success = self.cube.pose.p[:, 2] >= 1.0
    state = {"success": success, "fail": fail}

    # Ensure there's always at least one success or default value
    if not success.any():
        state["success"] = torch.zeros_like(success)
        state["fail"] = torch.ones_like(fail)

    return state

I was able to run the command python ppo.py --no-capture-video --env-id Bimanual_Allegro_Cube, but when I tried evaluating the final checkpoint, I received the following error:

(ms_dev) creativenick@creativenick:~/Desktop/SimToReal/bimanual-sapien$ python ppo.py --evaluate --no-capture-video --checkpoint runs/Bimanual_Allegro_Cube__ppo__1__1717740309/final_ckpt.pt
/home/creativenick/anaconda3/envs/ms_dev/lib/python3.9/site-packages/tyro/_fields.py:343: UserWarning: The field wandb_entity is annotated with type <class 'str'>, but the default value None has type <class 'NoneType'>. We'll try to handle this gracefully, but it may cause unexpected behavior.
  warnings.warn(
/home/creativenick/anaconda3/envs/ms_dev/lib/python3.9/site-packages/tyro/_fields.py:343: UserWarning: The field checkpoint is annotated with type <class 'str'>, but the default value None has type <class 'NoneType'>. We'll try to handle this gracefully, but it may cause unexpected behavior.
  warnings.warn(
Running evaluation
/home/creativenick/anaconda3/envs/ms_dev/lib/python3.9/site-packages/gymnasium/core.py:311: UserWarning: WARN: env.single_observation_space to get variables from other wrappers is deprecated and will be removed in v1.0, to get this variable you can do `env.unwrapped.single_observation_space` for environment variables or `env.get_wrapper_attr('single_observation_space')` that will search the reminding wrappers.
  logger.warn(
/home/creativenick/anaconda3/envs/ms_dev/lib/python3.9/site-packages/gymnasium/core.py:311: UserWarning: WARN: env.single_action_space to get variables from other wrappers is deprecated and will be removed in v1.0, to get this variable you can do `env.unwrapped.single_action_space` for environment variables or `env.get_wrapper_attr('single_action_space')` that will search the reminding wrappers.
  logger.warn(
/home/creativenick/anaconda3/envs/ms_dev/lib/python3.9/site-packages/gymnasium/core.py:311: UserWarning: WARN: env.max_episode_steps to get variables from other wrappers is deprecated and will be removed in v1.0, to get this variable you can do `env.unwrapped.max_episode_steps` for environment variables or `env.get_wrapper_attr('max_episode_steps')` that will search the reminding wrappers.
  logger.warn(
####
args.num_iterations=390 args.num_envs=512 args.num_eval_envs=2
args.minibatch_size=800 args.batch_size=25600 args.update_epochs=4
####
Traceback (most recent call last):
  File "/home/creativenick/Desktop/SimToReal/bimanual-sapien/ppo.py", line 315, in <module>
    agent.load_state_dict(torch.load(args.checkpoint))
  File "/home/creativenick/anaconda3/envs/ms_dev/lib/python3.9/site-packages/torch/nn/modules/module.py", line 2189, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for Agent:
        size mismatch for actor_logstd: copying a param with shape torch.Size([1, 44]) from checkpoint, the shape in current model is torch.Size([1, 8]).
        size mismatch for critic.0.weight: copying a param with shape torch.Size([256, 98]) from checkpoint, the shape in current model is torch.Size([256, 42]).
        size mismatch for actor_mean.0.weight: copying a param with shape torch.Size([256, 98]) from checkpoint, the shape in current model is torch.Size([256, 42]).
        size mismatch for actor_mean.6.weight: copying a param with shape torch.Size([44, 256]) from checkpoint, the shape in current model is torch.Size([8, 256]).
        size mismatch for actor_mean.6.bias: copying a param with shape torch.Size([44]) from checkpoint, the shape in current model is torch.Size([8]).

The evaluation success rate (eval_success_rate), evaluation fail rate (eval_fail_rate), fail rate (fail_rate), learning rate (learning_rate), and many other stats were all completely flat.

I also attached the folder for this run below: Bimanual_Allegro_Cubeppo1__1717740309.zip

CreativeNick / SimToReal