Farama-Foundation / Metaworld

Collections of robotics environments geared towards benchmarking multi-task and meta reinforcement learning
https://metaworld.farama.org/
MIT License

Expert Policy fails in Push-V2 and other tasks, particularly with newer versions of MetaWorld. #514

Closed Dongjiahua closed 1 month ago

Dongjiahua commented 1 month ago

I am trying to use the expert policies to create demonstrations, but I ran into problems starting from the code in docs/usage/basic_usage.md. Specifically, the code works well with the reach task, but it fails as soon as I switch to the push task. The following is my code, run against the latest version of Metaworld:

from metaworld import MT1
from metaworld.policies.sawyer_push_v2_policy import SawyerPushV2Policy as p_push

# from metaworld.policies.sawyer_peg_insertion_side_v2_policy import SawyerPegInsertionSideV2Policy as p
mt1 = MT1('push-v2', seed=42)
env = mt1.train_classes['push-v2']()
env.set_task(mt1.train_tasks[0])
obs, info = env.reset()

policy = p_push()

done = False
i = 0
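while not done:
    a = policy.get_action(obs)
    # Gymnasium-style step: (obs, reward, terminated, truncated, info)
    obs, reward, terminated, truncated, info = env.step(a)
    print(f" Step: {i}, Reward: {reward}, Success: {info['success']}")
    # Stop on success, or when the episode is truncated at the maximum path length
    done = int(info['success']) == 1 or truncated
    i += 1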

The episode never succeeds and is eventually truncated (probably at the maximum episode length). A short snapshot of the output:

[Screenshot: Screen Shot 2024-10-22 at 4 06 05 PM]

However, when I switch back to v0.1.0 (which, as far as I can tell, uses mujoco_py and gym), it works correctly. The code is:

from metaworld import MT1
from metaworld.policies.sawyer_push_v2_policy import SawyerPushV2Policy as p_push

mt1 = MT1('push-v2', seed=42)
env = mt1.train_classes['push-v2']()
env.set_task(mt1.train_tasks[0])
obs = env.reset()

policy = p_push()

done = False
i = 0
while not done:
    a = policy.get_action(obs)
    obs, reward, done, info = env.step(a)
    print(f" Step: {i}, Reward: {reward}, Success: {info['success']}")
    done = int(info['success']) == 1
    i += 1

The snapshot of the output is:

[Screenshot: Screen Shot 2024-10-22 at 4 11 59 PM]

The same failure appears in most other tasks, such as assembly and basketball. I don't understand why the expert policies fail in the newer version of Metaworld. Are there any solutions?
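
For readers comparing the two snippets above: apart from the installed versions, they differ in the shape of what `reset()` and `step()` return. The Gymnasium-style API returns `(obs, info)` from `reset()` and a five-element tuple from `step()`, while the older gym-style API returns only `obs` and a four-element tuple. A minimal sketch of a helper that accepts either `step()` shape is below; `unpack_step` is a hypothetical name, not part of Metaworld, gym, or Gymnasium.

```python
def unpack_step(result):
    """Normalize a step() result to (obs, reward, terminated, truncated, info).

    Hypothetical helper for illustration only.
    Accepts the 5-tuple (Gymnasium style) or the 4-tuple (older gym style).
    """
    if len(result) == 5:
        obs, reward, terminated, truncated, info = result
    elif len(result) == 4:
        # The old API has a single `done` flag and no separate truncation
        # signal, so `truncated` is reported as False here.
        obs, reward, done, info = result
        terminated, truncated = done, False
    else:
        raise ValueError(f"unexpected step() result of length {len(result)}")
    return obs, reward, bool(terminated), bool(truncated), info
```

With this, the same loop body can be written as `obs, reward, terminated, truncated, info = unpack_step(env.step(a))` under either version.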

Kallinteris-Andreas commented 1 month ago

Can you test with different MuJoCo versions?

Dongjiahua commented 1 month ago

Thanks, it works well now! Previously, I hadn't carefully followed the mujoco<3.0.0 requirement, which caused the problem.
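
A quick way to confirm which MuJoCo version is installed is sketched below. It only compares the major version number, which is a rough stand-in for the `mujoco<3.0.0` pin mentioned above and does not handle pre-release version strings; `mujoco_pin_ok` and `installed_mujoco_version` are hypothetical helper names, not part of Metaworld or MuJoCo.

```python
from importlib import metadata


def mujoco_pin_ok(version: str) -> bool:
    """Return True if `version` has a major version below 3 (rough check of mujoco<3.0.0)."""
    major = int(version.split(".")[0])
    return major < 3


def installed_mujoco_version():
    """Return the installed `mujoco` package version string, or None if it is not installed."""
    try:
        return metadata.version("mujoco")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_mujoco_version()
    if version is None:
        print("mujoco is not installed in this environment")
    else:
        print(f"mujoco {version}: pin satisfied = {mujoco_pin_ok(version)}")
```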

KurtDCD commented 1 week ago

Hi @Dongjiahua, could you elaborate on your solution to this problem? I have the same issue, but even after moving back to version 0.1.0 I can't get successful trajectories for the reach, pick-and-place, or push tasks.

Thanks!

reginald-mclean commented 1 week ago

You shouldn't be using version 0.1.0 -- the most recent push in the master branch generates successful trajectories for those tasks.

KurtDCD commented 1 week ago

Hi Reginald, thanks for the quick reply! I wasn't. I installed the latest version and couldn't get the expert policies to work there. I then read here that they worked fine on 0.1.0 and gave that a shot, but had no luck either. Do you have any idea what could be wrong?

reginald-mclean commented 1 week ago

If you use the most up-to-date commit on the master branch it should work -- one of our build tests checks the scripted policies, and the tests are currently passing.

KurtDCD commented 1 week ago

I'm so sorry, I didn't realize I was importing ML1 instead of MT1. Is there any particular reason the scripted policies don't work for those ML1 tasks?

reginald-mclean commented 1 week ago

The scripted policies require the goal to be observable in the state, but if you use the ML1 version of an environment, the goal is zeroed out in the observation. As a result, the policy can't solve the task.
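
A rough way to check which situation you are in is to look at the goal slots of the observation. The sketch below assumes the goal occupies the last three entries of the observation vector; that layout is an assumption about the v2 environments and should be verified against your installed version. `goal_is_hidden` and `goal_dim` are hypothetical names, not Metaworld API.

```python
def goal_is_hidden(obs, goal_dim=3):
    """Return True if the trailing `goal_dim` entries of `obs` are all zero.

    Assumption (not confirmed in this thread): the goal position is stored in
    the last `goal_dim` entries of the observation. Works with lists, tuples,
    or 1-D NumPy arrays.
    """
    goal_slots = obs[-goal_dim:]
    return all(float(x) == 0.0 for x in goal_slots)
```

If this returns True for the observation returned by `reset()`, the scripted policy is probably not seeing the goal. Note that a genuine goal located exactly at the origin would also be reported as hidden, so treat the result as a hint rather than proof.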