Closed HeegerGao closed 4 years ago
Hello Chongkai,
Thank you for using MetaWorld. To be clear, ML1 is not a goal-conditioned environment. Accessing the goal in the ML1, ML10, and ML45 benchmarks is intentionally impossible, because the meta-learning algorithm must adapt to the hidden goal. Please do not publish results on ML1, ML10, or ML45 that use goal information, because it would be disingenuous. The goal is accessible as the last three dimensions of the observation in the MT10 and MT50 (and soon MT1: #165) benchmarks, but we don't generally consider those to be goal-conditioned either, and thus do not return the `achieved_goal`.
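As a rough sketch of that layout, assuming a flat observation vector whose last three entries hold the goal (`split_goal` is a hypothetical helper for illustration, not part of MetaWorld, and the values are made up):

```python
def split_goal(obs, goal_dim=3):
    """Split a flat observation into (state, goal).

    Assumes the goal occupies the last `goal_dim` entries, as described
    for the MT10 and MT50 benchmarks. Works on lists and NumPy arrays.
    """
    if len(obs) < goal_dim:
        raise ValueError("observation is shorter than the goal dimension")
    return obs[:-goal_dim], obs[-goal_dim:]


# Made-up observation values, for illustration only.
obs = [0.0, 0.6, 0.2, 1.0, -0.1, 0.8, 0.2]
state, goal = split_goal(obs)
print(goal)
```

This only applies to the multi-task benchmarks; in ML1, ML10, and ML45 the goal is deliberately not part of the observation.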
Unfortunately we're unable to release the training code originally used for these benchmarks, as it comes from multiple different sources, and uses a version of MetaWorld that has never been published. We have been able to reproduce the multi-task and meta-RL results using the Garage RL library, so please consider that library if you are interested. Results after the recent redesign (and for V2 environments) are not currently available, but providing them is a priority in our group.
Hopefully that answers your question, so I'm going to close this issue, but feel free to re-open it if you have further questions.
Best, K.R.
Thanks! It helps me to better understand the mechanism of meta learning.
Hi, I am using `metaworld.ML1` to study meta reinforcement learning. I find that in the `SawyerReachEnvV2` environment, when I run `obs, reward, done, info = env.step(a)`, `info['goal']` is not the correct goal; it seems to be fixed to `[-0.1 0.8 0.2]`. I read your source code and found that `env._state_goal` returns the true goal, so I think you should add `self.goal = self._state_goal` in the `step` function of `SawyerReachEnvV2`. I am not sure whether this should be done in other envs.
By the way, I think for goal-conditioned envs, the `obs` returned from the `step` function should be like this: (which is the same as in OpenAI Gym)
And I am also quite interested in how you trained these benchmarks. Can you release the training code and some results for these environments?
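On the observation format mentioned above: Gym's goal-based environments return a dict observation with the keys `observation`, `achieved_goal`, and `desired_goal`. A minimal sketch of that structure, assuming plain Python sequences (`to_goal_obs` is a hypothetical helper and the values are made up; this is not how MetaWorld builds its observations):

```python
def to_goal_obs(observation, achieved_goal, desired_goal):
    """Pack the three parts into a Gym-style goal-conditioned observation dict."""
    return {
        "observation": list(observation),
        "achieved_goal": list(achieved_goal),
        "desired_goal": list(desired_goal),
    }


# Made-up values; treating the end-effector position as the achieved goal
# for a reach task is an assumption for illustration.
goal_obs = to_goal_obs(
    observation=[0.0, 0.6, 0.2, 1.0],
    achieved_goal=[0.0, 0.6, 0.2],
    desired_goal=[-0.1, 0.8, 0.2],
)
print(sorted(goal_obs))
```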