Checkpoint Loading Issues

btx0424 commented 3 months ago

Hi there. Thanks for building this seminal and exciting project.

I ran into some problems when trying to play with a trained policy. For example, after running execute.py for the task Open_Laptop, the resulting directory looks like this:

Open_Laptop/
  task_Open_Laptop/
    RL_sac/
      yyyy-mm-dd-hh-mm-ss/open_the_laptop_screen/
        best_model/
        ...
        checkpoint_**/
        eval*/

Now I want to play with the policy:

  ...
  policy, _ = load_policy("sac", env_name=task_name, policy_path=policy_path, env_config=env_config)

  env = make_env(env_config)
  obs = env.reset()
  done = False
  while not done:
    action = policy.compute_action(obs)
    obs, reward, done, info = env.step(action)
  env.disconnect()

My question is: what should policy_path be? I have tried .../best_model/, .../checkpoint_**, and .../checkpoint_**/checkpoint-**, and none of them throws an exception. However, the loaded policy behaves almost randomly, nothing like the behavior in the execute.gif produced during training. Is this expected?

Thanks in advance and looking forward to your response.

yufeiwang63 commented 3 months ago

Hi,

Thanks for raising this issue. I have to double-check how to correctly load the policy, since it has been a while. But can you first check this: I think opening the laptop involves two substeps. The first is a motion planning substep that moves the gripper towards the laptop lid, and the second uses RL to open the laptop. Therefore, for the trained RL policy to work, the environment should be initialized to the state where the gripper is already attached to the laptop surface, since this is the initial state the RL policy was trained on. When building the environment, are you setting last_restore_state_file to the path of the state file that stores the last step of the motion planning substep?
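
One way to rule this out before loading the policy is a small sanity check like the sketch below. It assumes the environment config is a plain dict that carries the last_restore_state_file key mentioned above; the helper name check_restore_state is hypothetical and not part of RoboGen.

```python
import os


def check_restore_state(env_config):
    """Return the restore-state path if it is set and points to a file.

    Assumption: env_config is a dict and uses the key
    "last_restore_state_file". The helper itself is hypothetical.
    """
    path = env_config.get("last_restore_state_file")
    if not path:
        # Without this key the RL substep would start from a fresh state,
        # not from the end of the motion planning substep.
        raise ValueError("last_restore_state_file is not set in env_config")
    if not os.path.isfile(path):
        raise FileNotFoundError(f"restore state file not found: {path}")
    return path
```

Calling check_restore_state(env_config) right before make_env(env_config) would fail early if the RL substep is about to start from the wrong initial state.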

btx0424 commented 3 months ago

Yes. I basically modified the reward step of execute.py to load from a policy path instead of training a new policy. So the policy execution starts from the last state of the motion planning step.

yufeiwang63 commented 2 months ago

Hi,

Sorry for the delayed response. I have added a script for loading a pretrained RL policy: https://github.com/Genesis-Embodied-AI/RoboGen/blob/main/run_policy.py. On my side it correctly loads a pretrained policy and reproduces the behavior recorded in execute.gif.

The checkpoint path should be .../best_model/checkpoint_**/checkpoint-**.
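
If it helps, the pattern above can be resolved to a concrete file with a short sketch like the one below. The helper name find_checkpoint_files is hypothetical, and skipping the .tune_metadata sidecar file assumes an RLlib-style checkpoint layout, so adjust it if your directory looks different.

```python
from pathlib import Path


def find_checkpoint_files(run_dir):
    """List files matching best_model/checkpoint_*/checkpoint-* under run_dir.

    Hypothetical helper. run_dir is the directory that contains best_model/.
    """
    matches = Path(run_dir).glob("best_model/checkpoint_*/checkpoint-*")
    # Assumption: metadata sidecars (e.g. checkpoint-100.tune_metadata) sit
    # next to the actual checkpoint file and should not be passed as the path.
    return sorted(
        p for p in matches if p.is_file() and p.suffix != ".tune_metadata"
    )
```

The returned paths are candidates for policy_path; if several are listed, pick the one you want to load explicitly rather than relying on the sort order.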

Let me know if you cannot reproduce the behavior or have any more issues.
