Wrong goal state in SawyerReachEnvV2

HeegerGao commented 4 years ago

Hi, I am using metaworld.ML1 to study meta reinforcement learning. I found that in the SawyerReachEnvV2 environment, when I run obs, reward, done, info = env.step(a), info['goal'] is not the correct goal; it seems to be fixed at [-0.1 0.8 0.2].

I read your source code and found that env._state_goal holds the true goal, so I think you should add self.goal = self._state_goal in the step function of SawyerReachEnvV2. I am not sure whether other environments need the same change.
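The proposed fix can be illustrated with a toy environment. This is a minimal, hypothetical sketch, not Metaworld's actual SawyerReachEnvV2: the class ToyReachEnv, its reset/step signatures, the goal sampling range and the reward are all assumptions. Only the attribute names _state_goal and goal, the info['goal'] key, and the default [-0.1 0.8 0.2] come from the report above. It shows how info['goal'] goes stale when a new task goal is written only to _state_goal, and how syncing the two in step fixes it.

```python
import math
import random


class ToyReachEnv:
    """Hypothetical stand-in for a reach environment; NOT the Metaworld class."""

    def __init__(self, sync_goal: bool):
        self.sync_goal = sync_goal
        # Default goal; info['goal'] keeps reporting this if it is never updated.
        self.goal = (-0.1, 0.8, 0.2)
        self._state_goal = self.goal
        self.hand_pos = (0.0, 0.6, 0.2)

    def reset(self, rng: random.Random):
        # A new task goal is sampled, but only into _state_goal (assumed range).
        self._state_goal = tuple(round(rng.uniform(-0.3, 0.3), 3) for _ in range(3))
        return self.hand_pos

    def step(self, action):
        # Move the hand by the action (toy dynamics).
        self.hand_pos = tuple(p + a for p, a in zip(self.hand_pos, action))
        if self.sync_goal:
            # The change suggested in the issue: keep self.goal in sync.
            self.goal = self._state_goal
        # Reward is computed against the true goal either way.
        dist = math.dist(self.hand_pos, self._state_goal)
        info = {"goal": self.goal, "dist": dist}
        return self.hand_pos, -dist, False, info


if __name__ == "__main__":
    rng = random.Random(0)
    for sync in (False, True):
        env = ToyReachEnv(sync_goal=sync)
        env.reset(rng)
        _, _, _, info = env.step((0.0, 0.0, 0.0))
        print("sync" if sync else "stale", info["goal"] == env._state_goal)
```

Without the sync, the reward still tracks the sampled goal while info['goal'] keeps reporting the default, which matches the symptom described above.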

By the way, I think that for goal-conditioned environments, the obs returned from the step function should look like this (the same as in OpenAI Gym):

{
'observation': robot joint state, end_effector pos, obj_pos, vel, ...,
'desired_goal': desired_goal,
'achieved_goal': achieved_goal
}
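The dictionary layout above can be sketched as follows. This is a hypothetical illustration loosely modeled on OpenAI Gym's GoalEnv convention, whose compute_reward takes (achieved_goal, desired_goal, info); it does not use any Metaworld API. The helper names make_goal_obs and compute_reward here, the choice of end-effector position as the achieved goal, and the 0.05 success threshold are assumptions.

```python
import math
from typing import Dict, List, Sequence


def make_goal_obs(robot_state: Sequence[float],
                  achieved_goal: Sequence[float],
                  desired_goal: Sequence[float]) -> Dict[str, List[float]]:
    """Pack flat state and goal vectors into a GoalEnv-style dict (hypothetical helper)."""
    return {
        "observation": list(robot_state),
        "desired_goal": list(desired_goal),
        "achieved_goal": list(achieved_goal),
    }


def compute_reward(achieved_goal: Sequence[float],
                   desired_goal: Sequence[float],
                   info: dict,
                   threshold: float = 0.05) -> float:
    """Sparse reward: 0.0 within the (assumed) threshold of the goal, -1.0 otherwise."""
    return 0.0 if math.dist(achieved_goal, desired_goal) < threshold else -1.0


if __name__ == "__main__":
    # For a reach task, the achieved goal is assumed to be the end-effector position.
    obs = make_goal_obs([0.1, 0.2, 0.3, 0.0],
                        achieved_goal=[0.0, 0.6, 0.2],
                        desired_goal=[-0.1, 0.8, 0.2])
    print(obs)
    print(compute_reward(obs["achieved_goal"], obs["desired_goal"], {}))
```

Keeping achieved_goal separate from desired_goal is what lets goal-relabeling methods such as hindsight experience replay recompute rewards for substituted goals without re-simulating.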

I am also quite interested in how you trained agents on these benchmarks. Could you release the training code and some results for these environments?

krzentner commented 4 years ago

Hello Chongkai,

Thank you for using MetaWorld. To be clear, ML1 is not a goal-conditioned environment. Accessing the goal in the ML1, ML10, and ML45 benchmarks is intentionally impossible, because the meta-learning algorithm must adapt to the hidden goal. Please do not publish results on ML1, ML10, or ML45 that use goal information, because doing so would be disingenuous. The goal is accessible as the last three dimensions of the observation in the MT10 and MT50 benchmarks (and soon MT1, see #165), but we also don't generally consider those to be goal-conditioned, and thus do not return the achieved_goal.
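For the MT benchmarks, reading the goal out of the observation amounts to slicing off its tail. This is a minimal sketch assuming a flat 1-D observation whose last three entries are the goal, as stated above; it does not call Metaworld, the helper name split_mt_observation is hypothetical, and the full observation length depends on the Metaworld version. Per the reply above, it does not apply to ML1, ML10, or ML45, whose observations are not meant to expose the goal.

```python
from typing import Sequence, Tuple

# Assumption from the reply above: in MT10/MT50 the goal occupies the last 3 dims.
GOAL_DIM = 3


def split_mt_observation(obs: Sequence[float]) -> Tuple[Sequence[float], Sequence[float]]:
    """Split a flat MT-style observation into (state, goal) (hypothetical helper).

    Slicing keeps the input's type: lists give lists, 1-D NumPy arrays give arrays.
    """
    if len(obs) < GOAL_DIM:
        raise ValueError(f"observation has {len(obs)} dims, expected at least {GOAL_DIM}")
    return obs[:-GOAL_DIM], obs[-GOAL_DIM:]


if __name__ == "__main__":
    state, goal = split_mt_observation([1.0, 2.0, 3.0, -0.1, 0.8, 0.2])
    print(state, goal)
```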

Unfortunately, we're unable to release the training code originally used for these benchmarks, as it comes from multiple different sources and uses a version of MetaWorld that has never been published. We have been able to reproduce the multi-task and meta-RL results using the Garage RL library, so please consider that library if you are interested. Results after the recent redesign (and for V2 environments) are not currently available, but providing them is a priority for our group.

Hopefully that answers your question, so I'm going to close this issue, but feel free to reopen it if you have further questions.

Best, K.R.

HeegerGao commented 4 years ago

Thanks! This helps me better understand the mechanism of meta-learning.

Farama-Foundation / Metaworld

Wrong goal state in SawyerReachEnvV2 #168