Add gym make support for Meta-World envs

Farama-Foundation / Metaworld

Collections of robotics environments geared towards benchmarking multi-task and meta reinforcement learning

https://metaworld.farama.org/

MIT License

Add gym make support for Meta-World envs #498

Open reginald-mclean opened 3 months ago

reginald-mclean commented 3 months ago

This PR is built on top of #499 to add gym.make support for the environments in Meta-World. They are organized as follows:

gym.make('Meta-World/env-name'): this creates a single environment with a single goal
gym.make('Meta-World/MT1-env-name'): this creates a version of the env-name environment with multiple goals but no testing goals. This is typically used to test how well an RL algorithm can acquire skills in a goal-conditioned fashion.
gym.make('Meta-World/ML1-train-env-name') or gym.make('Meta-World/ML1-test-env-name'): These commands make the training environment or testing environment for Meta-Learning problems with a single environment.
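
The naming scheme for the single-environment IDs above can be summarised in a small helper. This is only an illustrative sketch: `single_env_id` is a hypothetical function, not part of Meta-World, and it does nothing more than format the ID strings described in this PR.

```python
from typing import Optional


def single_env_id(env_name: str, kind: str = "single", split: Optional[str] = None) -> str:
    """Format an ID following the scheme described in this PR (hypothetical helper)."""
    if kind == "single":
        # One environment with a single goal.
        return f"Meta-World/{env_name}"
    if kind == "MT1":
        # One environment with multiple goals and no testing goals.
        return f"Meta-World/MT1-{env_name}"
    if kind == "ML1":
        # Meta-learning with one environment: separate train and test IDs.
        if split not in ("train", "test"):
            raise ValueError("ML1 IDs need split='train' or split='test'")
        return f"Meta-World/ML1-{split}-{env_name}"
    raise ValueError(f"unknown kind: {kind!r}")


print(single_env_id("reach-v3"))
print(single_env_id("reach-v3", kind="MT1"))
print(single_env_id("reach-v3", kind="ML1", split="train"))
```

Assuming the registrations from this PR are in place, the resulting string is what would be passed to gym.make.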

And there are also gym.make_vec commands that return multiple environments wrapped in a sync or async wrapper:

gym.make_vec('Meta-World/MT10-sync') or gym.make_vec('Meta-World/MT10-async'): This returns the MT10 set of environments either in sync or async mode.
gym.make_vec('Meta-World/ML10-train-sync') or gym.make_vec('Meta-World/ML10-train-async'): This returns the ML10 set of training environments either in sync or async mode.
gym.make_vec('Meta-World/ML10-test-sync') or gym.make_vec('Meta-World/ML10-test-async'): This returns the ML10 set of testing environments either in sync or async mode.
gym.make_vec('Meta-World/ML45-train-sync') or gym.make_vec('Meta-World/ML45-train-async'): This returns the ML45 set of training environments either in sync or async mode.
gym.make_vec('Meta-World/ML45-test-sync') or gym.make_vec('Meta-World/ML45-test-async'): This returns the ML45 set of testing environments either in sync or async mode.
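
The vector IDs listed above follow a benchmark / split / mode pattern, so the full set can be enumerated mechanically. The sketch below only generates the ID strings named in this list; `vector_env_ids` and `BENCHMARK_SPLITS` are hypothetical names, and nothing here calls Gymnasium or Meta-World.

```python
from itertools import product
from typing import List

# Benchmarks and splits as listed above; MT10 has no train/test split.
BENCHMARK_SPLITS = {
    "MT10": [None],
    "ML10": ["train", "test"],
    "ML45": ["train", "test"],
}
MODES = ["sync", "async"]


def vector_env_ids() -> List[str]:
    """Return every 'Meta-World/<benchmark>[-<split>]-<mode>' ID from the list."""
    ids = []
    for benchmark, splits in BENCHMARK_SPLITS.items():
        for split, mode in product(splits, MODES):
            parts = [benchmark] + ([split] if split else []) + [mode]
            ids.append("Meta-World/" + "-".join(parts))
    return ids


for env_id in vector_env_ids():
    print(env_id)
```

Generating the IDs from a table like this is one way to avoid copy-paste slips between the MT and ML prefixes.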

rainx0r commented 3 months ago

A few things I noticed:

It's not entirely clear to me why _make_single_env() has ML1-related code when _make_single_ml() exists.
For MT envs, they should also have an AutoTerminateOnSuccess wrapper but it should be toggled to the initial state specified by the terminate_on_success flag as done here.
use_one_hot is set to False for all registered envs but it should be True for MT10 and MT50.
The way seed is added to the TaskSelect wrappers could probably cause some weird issues as it would reinitialise the global numpy rng state multiple times during env instantiation, and in general it's not necessary so it should probably be removed. The wrappers use the underlying env's np_random anyway so they don't really need to be seeded. Maybe the seed passed into init_each_env should just be used on the env directly:
```
env = env_cls()
# Compare against None so that seed=0 is still applied
if seed is not None:
    env.seed(seed)
```
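
The seeding concern can be illustrated with a toy example. This is a sketch under stated assumptions, not Meta-World code: `ToyEnv` and `ToyTaskSelect` are hypothetical stand-ins, and Python's stdlib `random` module is swapped in for NumPy's RNG. The env owns a private RNG, the wrapper draws from it without a seed of its own, and the global RNG state is never touched.

```python
import random


class ToyEnv:
    """Hypothetical stand-in for a Meta-World env with its own RNG."""

    def __init__(self):
        # Plays the role of env.np_random: private to this env instance.
        self.rng = random.Random()

    def seed(self, seed):
        self.rng.seed(seed)


class ToyTaskSelect:
    """Hypothetical stand-in for a TaskSelect wrapper; it takes no seed itself."""

    def __init__(self, env, num_tasks):
        self.env = env
        self.num_tasks = num_tasks

    def sample_task(self):
        # Draw from the wrapped env's RNG instead of a global one.
        return self.env.rng.randrange(self.num_tasks)


def build(seed=None, num_tasks=50):
    env = ToyEnv()
    if seed is not None:  # seed=0 is a valid seed
        env.seed(seed)
    return ToyTaskSelect(env, num_tasks)


first = build(seed=0)
second = build(seed=0)
draws_first = [first.sample_task() for _ in range(5)]
draws_second = [second.sample_task() for _ in range(5)]
print(draws_first == draws_second)
```

Two envs built with the same seed produce the same task sequence, and building them leaves the module-level `random` state as it was.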

Also I think for simplicity it should be possible to just have a single definition of init_each_env that is used for both MT and ML that takes in env_cls, tasks, task_select_method, maybe seed, maybe max_episode_steps, maybe use_one_hot / env_id / num_tasks and has branching logic for OneHotWrapper and the task select method. MT envs just provide those one-hot parameters and use all tasks, while ML envs handle the task splitting a bit differently but they otherwise use the same wrappers and logic.
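
A rough sketch of what such a unified function might look like is given below. It is only an illustration of the branching described above: `ToyEnv` is a hypothetical stand-in for an env class, the wrappers are recorded as strings instead of being applied, and the parameter names follow the comment, not the actual PR code.

```python
from typing import Callable, List, Optional


class ToyEnv:
    """Hypothetical stand-in for a Meta-World env class."""

    def __init__(self):
        self.seed_value: Optional[int] = None
        self.wrappers: List[str] = []  # records which wrappers would be applied

    def seed(self, seed: int) -> None:
        self.seed_value = seed


def init_each_env(
    env_cls: Callable[[], ToyEnv],
    tasks: list,
    task_select: str = "random",
    seed: Optional[int] = None,
    max_episode_steps: Optional[int] = None,
    use_one_hot: bool = False,
    env_id: Optional[int] = None,
    num_tasks: Optional[int] = None,
) -> ToyEnv:
    env = env_cls()
    if seed is not None:
        env.seed(seed)  # seed the env directly, not the wrappers
    if max_episode_steps is not None:
        env.wrappers.append(f"TimeLimit({max_episode_steps})")
    if use_one_hot:
        # MT callers pass the one-hot parameters; ML callers leave them unset.
        if env_id is None or num_tasks is None:
            raise ValueError("use_one_hot requires env_id and num_tasks")
        env.wrappers.append(f"OneHot({env_id}/{num_tasks})")
    if task_select == "random":
        env.wrappers.append(f"RandomTaskSelect({len(tasks)})")
    elif task_select == "pseudorandom":
        env.wrappers.append(f"PseudoRandomTaskSelect({len(tasks)})")
    else:
        raise ValueError(f"unknown task_select: {task_select!r}")
    return env


mt_env = init_each_env(ToyEnv, tasks=list(range(50)), seed=0,
                       use_one_hot=True, env_id=3, num_tasks=10)
ml_env = init_each_env(ToyEnv, tasks=list(range(40)), task_select="pseudorandom")
print(mt_env.wrappers)
print(ml_env.wrappers)
```

In this sketch the MT and ML callers differ only in the arguments they pass, which is the simplification being proposed.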

reginald-mclean commented 3 months ago

@rainx0r

It's not entirely clear to me why _make_single_env() has ML1-related code when _make_single_ml() exists.

Good catch, remnant of previous attempt at creating ML envs.

For MT envs, they should also have an AutoTerminateOnSuccess wrapper but it should be toggled to the initial state specified by the terminate_on_success flag as done here.

Will update
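
For readers unfamiliar with the idea, a toggleable terminate-on-success wrapper could look roughly like this. It is a minimal sketch and not the Meta-World wrapper itself: `ToyEnv` and `ToggleableTerminateOnSuccess` are hypothetical, and the only assumption borrowed from Gymnasium is the five-value `step` return with a `success` entry in `info`.

```python
class ToyEnv:
    """Hypothetical env: action 1 counts as success, anything else does not."""

    def step(self, action):
        info = {"success": 1.0 if action == 1 else 0.0}
        # obs, reward, terminated, truncated, info
        return 0.0, 0.0, False, False, info


class ToggleableTerminateOnSuccess:
    """Ends the episode on success, but only while the flag is switched on."""

    def __init__(self, env, terminate_on_success=True):
        self.env = env
        # Initial state comes from the flag passed at construction time.
        self.terminate_on_success = terminate_on_success

    def toggle_terminate_on_success(self, on):
        self.terminate_on_success = on

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        if self.terminate_on_success and info.get("success", 0.0) == 1.0:
            terminated = True
        return obs, reward, terminated, truncated, info


env = ToggleableTerminateOnSuccess(ToyEnv(), terminate_on_success=False)
print(env.step(1)[2])  # flag off: success does not terminate
env.toggle_terminate_on_success(True)
print(env.step(1)[2])  # flag on: success terminates
```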

use_one_hot is set to False for all registered envs but it should be True for MT10 and MT50.

I don't know if I agree with this. We can include the wrapper for completeness and show examples of enabling it, but I think we shouldn't influence users to use the wrapper by default.
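
As a reminder of what the one-hot option changes, the sketch below appends a one-hot task identifier to an observation. `append_one_hot` is a hypothetical helper using plain Python lists; it only illustrates the idea and is not the wrapper from this PR.

```python
from typing import List


def append_one_hot(obs: List[float], task_idx: int, num_tasks: int) -> List[float]:
    """Return obs extended with a one-hot vector identifying the task."""
    if not 0 <= task_idx < num_tasks:
        raise ValueError("task_idx must be in [0, num_tasks)")
    one_hot = [0.0] * num_tasks
    one_hot[task_idx] = 1.0
    return list(obs) + one_hot


# A 2-dimensional observation for task 1 out of 3 tasks.
print(append_one_hot([0.5, -0.2], task_idx=1, num_tasks=3))
```

With the option disabled the observation is left unchanged, so the policy gets no explicit signal about which task it is in.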

Maybe the seed passed into init_each_env should just be used on the env directly

Agreed

Also I think for simplicity it should be possible to just have a single definition of init_each_env

It would be possible but I think it might become a bit of a convoluted function to write/maintain. Unless there's a clean way of merging them, I think keeping them separate makes the most sense for maintenance reasons.

pseudo-rnd-thoughts commented 2 months ago

From experience, having tonnes of registered environments can make life easier, but it also means many more environments to maintain. For future-proofing, an alternative approach is:

We can keep MetaWorld/env-name; this is good. Then for MT1, an alternative is gym.make("MetaWorld/MT1", env_name="env_name", mode="train/test"). This allows flexibility with env_name (to add more environments) and mode, to easily specify whether to train or test.

Similarly for MT10, we can use env_names=[...]

Is there any reason to support only 1 and 10? Could we make it MultiTask, with an env_names argument that accepts any number of environments?

I'm purely spitballing ideas, you don't need to take any of them
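
The idea of one parameterised ID replacing many registered IDs can be sketched with a toy registry. This is not Gymnasium's registry: `register`, `make`, `multi_task_factory`, and the ID "MetaWorld/MultiTask" are all hypothetical, chosen only to mirror the suggestion above.

```python
from typing import Callable, Dict, List

REGISTRY: Dict[str, Callable[..., dict]] = {}


def register(env_id: str, factory: Callable[..., dict]) -> None:
    REGISTRY[env_id] = factory


def make(env_id: str, **kwargs) -> dict:
    # Keyword arguments are forwarded to the factory for that ID.
    return REGISTRY[env_id](**kwargs)


def multi_task_factory(env_names: List[str], mode: str = "train") -> dict:
    """Describe a multi-task setup over any number of environments."""
    if mode not in ("train", "test"):
        raise ValueError("mode must be 'train' or 'test'")
    if not env_names:
        raise ValueError("env_names must not be empty")
    return {"env_names": list(env_names), "mode": mode}


# A single registration covers 1, 10, or any other number of environments.
register("MetaWorld/MultiTask", multi_task_factory)

one = make("MetaWorld/MultiTask", env_names=["reach-v3"])
three = make("MetaWorld/MultiTask", env_names=["reach-v3", "push-v3", "door-open-v3"], mode="test")
print(len(REGISTRY), len(one["env_names"]), len(three["env_names"]))
```

The registry stays at one entry however many environment combinations users request, which is the maintenance benefit being argued for.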

reginald-mclean commented 2 months ago

@pseudo-rnd-thoughts I actually implemented something along the lines of what you suggested but forgot to include it. In addition to the above gym.make_vec commands there is also:

gym.make_vec('Meta-World/custom-mt-envs-sync', envs_list=['env_name1-v3', 'env_name2-v3', ...]) # or custom-mt-envs-async
gym.make_vec('Meta-World/custom-ml-envs-sync', envs_list=['env_name1-v3', 'env_name2-v3', ...])# or custom-ml-envs-async

Both of the above commands give the user control over the environments that they want to use in a multi-task or meta-RL setting, instead of the predefined ones.

I agree, it is a LOT of environments to add, but there are also lots of different use cases for MW environments. Some of them are single environments (i.e. 'Meta-World/reach-v3'), some of them are the smaller MT/ML environments (i.e. 'Meta-World/ML1-train-reach-v3'), and some of them are the pre-defined environment sets (MT10, MT50, ML10, ML45).

pseudo-rnd-thoughts commented 2 months ago

@reginald-mclean In my opinion, I would work to keep the number of environments to a minimum. Personally, I would only have gym.make for the individual environments, i.e., gym.make("Meta-World/reach-v3"), alongside the generic MultiTask and MetaTask single and vector environments. Then finally keep the original IDs such as gym.make("MetaWorld/MT50") for MT10, MT50, ML10, and ML45.

This reduces the mess you need to maintain and gives users more options for what they do. If a user wants a custom multi-task setup with envs x, y, z, they can make it themselves, rather than finding only a close but not equivalent version among the hundreds of environments that could be registered.

Environment parameters are your friend here, minimising what you need to maintain while adding flexibility to users

I say this as I remove over 800 environments from Atari; currently there are over 1000 environments registered for only 100 games. For ALE, there are 14 environments registered for each game, which in my opinion is crazy, and very few people actually use the extra / special registered environments, which can be accessed through parameters.

reginald-mclean commented 3 weeks ago

@pseudo-rnd-thoughts OK, I have thought about it and I definitely agree with your comment about having minimal environments. I've moved the environments into their base environments (i.e. MT1, MT10, MT50) and exposed the various features via arguments. A few extra commits have slipped into this PR; I'm just trying to figure out how to remove them.