Farama-Foundation / Metaworld

Collections of robotics environments geared towards benchmarking multi-task and meta reinforcement learning
https://metaworld.farama.org/
MIT License

Confused about the success rate #202

Closed: jp18813100494 closed this issue 4 years ago

jp18813100494 commented 4 years ago

I am new to Metaworld! Thank you for putting together such a meaningful and complex project. I have some questions about Metaworld. In the project setting, does the success rate represent success in one step or in one episode? In your project the agent runs until the max path length of an episode, such as 150 steps, so does the success rate represent the average success over those 150 steps, or the success in the last step? Can you give me some pointers about this?

jp18813100494 commented 4 years ago

Another question! In ML45/ML10 we have 45 (or 10) train environments. What is the relationship between the number of environment classes and the number of tasks? If the number of tasks is also 45/10, does that mean every environment has only one task?

ryanjulian commented 4 years ago

@avnishn

avnishn commented 4 years ago

Hi @jp18813100494, thanks for checking out Metaworld!

Let me try to answer your questions in order:

1) > In the project setting, does the success rate represent success in one step or in one episode?

Let me start by saying that Success is equal to 1 if, at any step in an episode, the agent enters a success state of that environment. A successful episode is one in which any step is a successful step. For example, in pick-place-v1 the agent is tasked with picking up a puck and placing it in a randomly placed 3D region that has a 7cm radius. Successful states in this environment are defined by this region, and any way that the puck can end up in this region. If the puck enters this region at any point during an episode, then the Success of that episode is 1, and 0 otherwise.

From there, to compute the Success Rate of the agent on an environment, we collect a certain number of trajectories (roughly 10 to 100) and count how many of them are successful episodes. The Success Rate of the agent on that environment is then the number of successful episodes divided by the total number of episodes.
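In code, that bookkeeping looks roughly like this (a minimal sketch, not Metaworld's own evaluation code: `policy` is a stand-in for whatever maps observations to actions, and it assumes the Gym-era 4-tuple `step` return that Metaworld used at the time, with the per-step success flag reported in `info`):

```python
import numpy as np

def episode_success(env, policy, max_path_length=150):
    """Return 1 if the agent reaches a success state at ANY step of the episode."""
    obs = env.reset()
    succeeded = False
    for _ in range(max_path_length):
        obs, reward, done, info = env.step(policy(obs))
        # Metaworld environments report a binary per-step success flag in `info`
        succeeded = succeeded or bool(info.get('success', 0))
        if done:
            break
    return int(succeeded)

def success_rate(env, policy, n_episodes=100):
    """Fraction of collected episodes that contain at least one successful step."""
    return np.mean([episode_success(env, policy) for _ in range(n_episodes)])
```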

avnishn commented 4 years ago

I'm not quite sure if I understand your second question, so let me know if this answer doesn't satisfy you:

In the benchmark ML45, which is made up of 45 different environments, there are more than 45 tasks: in a single Metaworld environment, we define a task to be one unique initialization of the object position and goal position. In the case of pick-place-v1, this corresponds to one random pair (object_position, goal_position), where object_position is the starting position of the puck and goal_position is the location of the center of the spherical region described above.

Depending on the benchmark you are using, there is either 1 unique task per environment or 50 unique tasks per environment. For ML benchmarks such as ML1, ML10, and ML45, there are 50 unique tasks per environment. For MT benchmarks such as MT1, MT10, and MT50, there is only 1 unique task per environment.
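Concretely, you can count these yourself (a minimal sketch against the benchmark construction API shown in the README; the `train_classes`/`train_tasks` attribute names follow the README and may differ between Metaworld versions):

```python
import metaworld

ml10 = metaworld.ML10()
print(len(ml10.train_classes))  # 10 training environment classes
print(len(ml10.train_tasks))    # 500 tasks: 50 unique tasks per training environment
print(len(ml10.test_classes))   # 5 held-out test environment classes
print(len(ml10.test_tasks))     # 250 tasks: 50 unique tasks per test environment
```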

I'm going to close this issue for now, but feel free to @ me (avnishn) and ask any more questions!

jp18813100494 commented 4 years ago

> For ML benchmarks such as ML1, ML10, and ML45, there are 50 unique tasks per environment.

Thanks for your sincere reply! As you said, for ML benchmarks such as ML1, ML10, and ML45, there are 50 unique tasks per environment. What is the number of training tasks in the ML45 setting? I am clear on the Metaworld setting itself, but I don't know the details of the setup in your algorithms (MAML, RL², PEARL). Can you tell me the total number of train tasks and test tasks over the 50 environments of ML45 in PEARL? Is it 45 and 5, or 45×50 and 5×50?

avnishn commented 4 years ago

Hmm.

So the number of train tasks and test tasks for each benchmark doesn't change based on the algorithm.

ML45 has 45 training environments and 5 test environments. ML10 has 10 training environments and 5 test environments.

Each one of these environments has 50 unique task configurations. This means that in ML45 there are 45×50 = 2250 train tasks and 5×50 = 250 test tasks.

A meta-learning algorithm being evaluated on ML10 or ML45 trains on the 10 or 45 training environments, including each of those environments' 50 unique tasks, and is then evaluated on how well it meta-adapts to the 5 test environments of ML10 or ML45. These test environments also have 50 unique tasks each, and each should be sampled during the many meta-adaptation test rollouts completed throughout an algorithm's evaluation.

In terms of the code, changing an environment's task is as easy as calling the function `set_task`. For examples of how to use it, refer to the examples in the README; a minimal sketch is below.
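A sketch of that usage, following the README (the exact environment name depends on the Metaworld version, and the 4-tuple `step` return assumes the Gym-era API from when this thread was written):

```python
import random
import metaworld

ml1 = metaworld.ML1('pick-place-v1')        # ML1: one environment with 50 train tasks
env = ml1.train_classes['pick-place-v1']()  # instantiate the environment class
task = random.choice(ml1.train_tasks)       # one (object_position, goal_position) configuration
env.set_task(task)                          # switch the environment to that task

obs = env.reset()                           # object and goal are now placed per the chosen task
obs, reward, done, info = env.step(env.action_space.sample())
```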

A question one might have is: how often should `set_task` be called? While I don't have that information from the original Metaworld experiments (I am a maintainer of Metaworld, not one of the original authors), I can safely say that this is up to the implementor of the algorithm.

This may be a bit more than you were looking for in an answer, but I hope that everything you need is here!

CeyaoZhang commented 2 years ago

> For MT benchmarks such as MT1, MT10, and MT50, there is only 1 unique task per environment.

@avnishn Hi, I notice that in the paper (Section 4.3, Evaluation Protocol) it says:

> Multi-Task 10, Multi-Task 50 (MT10, MT50): Learning one multi-task policy that generalizes to 50 tasks belonging to 10 and 50 training environments, for a total of 500, and 2,500 training tasks.

Therefore, it seems that in both MT10 and MT50 we have 10 or 50 environments, each with 50 tasks, which contradicts what you mention above: that there is only one unique task per environment.