I am having a similar problem. If I run these lines following the README:

```python
from metaworld.benchmarks import ML1

env = ML1.get_train_tasks('pick-place-v1')  # Create an environment with task pick_place
tasks = env.sample_tasks(1)  # Sample a task (in this case, a goal variation)
env.set_task(tasks[0])  # Set task

obs = env.reset()  # Reset environment
a = env.action_space.sample()  # Sample an action
obs, reward, done, info = env.step(a)  # Step the environment with the sampled random action
print('Goal in info: {}'.format(info['goal']))

obs = env.reset()  # Reset environment
a = env.action_space.sample()  # Sample an action
obs, reward, done, info = env.step(a)  # Step the environment with the sampled random action
print('Goal in info: {}'.format(info['goal']))
```

the output is:

```
Goal in info: [0.08237526 0.81740662 0.21224671]
Goal in info: [0.02779365 0.80701352 0.05445151]
```
According to the paper, each task in ML1 should have a single goal.
@ryanjulian Is there a way to have each task represent a single goal while varying initial states?
@ryanjulian @tianheyu927 For the MAML experiments, the paper says the number of rollouts per task is 10 for ML1. Did all the envs in a single task have `self.random_init` set to `False`, meaning that all episodes had the same starting state and goal?
@michaelzhiluo @varun-intel Thank you for your questions. I apologize for the delay. The team was out most of last week for the US Thanksgiving holiday. Work-life balance is a core value in this org, and you should generally not expect responses over major holidays.
Before we answer, I should note that the public API surface of Metaworld is limited to the `metaworld.benchmarks` module. Everything else, especially the internal environment APIs, should be considered private and extremely unstable. Limiting the API in this way is one of the compromises we had to make to deliver this benchmark to the community.
I'll let @tianheyu927 and @zhanpenghe comment on the logic used for goals and `random_init`.
@ryanjulian @tianheyu927 @zhanpenghe Thank you for fixing part of the code for `self.random_init`! We have one more question, as we want to reproduce MAML results for ML1 similar to those reported in the appendix of the paper.

The paper states that each task in ML1 is one of 50 random initial object and goal positions. We are wondering about the exact details of what constitutes a task in a MAML setup.

Suppose there are 20 workers and 10 environments per worker (a vectorized setup). In MAML, each worker represents a different task. For the Metaworld implementation in the paper, does each environment within the same worker have the same goal and initial object position? Or is each task a collection of randomized object and goal positions, meaning that the environments within a worker have different goals and initial object positions?

We ask because `self.random_init` set to `False` or `True` produces the former and the latter case, respectively.
@ryanjulian @tianheyu927 @zhanpenghe Just to follow up: we are wondering how a task was defined for the experiments run for ML1. Specifically, we are interested in how each task is defined across vectorized environments.
Your response is greatly appreciated!
Initial conditions are not randomized in ML1. The important snippet is here: https://github.com/rlworkgroup/metaworld/blob/dfdbc7cf495678ee96b360d1e6e199acc141b36c/metaworld/benchmarks/ml1.py#L22, which sets the constructor arg `random_init` to `False` for all environments in the ML1 benchmark.

ML1 varies the goal position, but the diversity your meta-learner is exposed to during meta-training is controlled (it only gets to see 50 unique goals).
Though ML1 measures performance on intra-task variation and ML10/ML45 measure meta-learning performance on inter-task variation, the interfaces are the same. The `set_task` interface is designed to allow for efficient vectorized sampling: if you want to transmit a meta-batch to remote or vectorized samplers, you can construct the environments once and only transmit the task information to each environment.
You can see this in the ML1 example in the README, where we call:

```python
# outer loop, meta-batch sampling
env = ML1.get_train_tasks('pick-place-v1')
# sample a meta-batch
tasks = env.sample_tasks(1)
# configure a single environment to represent a single element of the meta-batch
env.set_task(tasks[0])

# inner loop, single-task sampling
obs = env.reset()
a = env.action_space.sample()
obs, reward, done, info = env.step(a)  # step the environment with the sampled random action
```
Pseudocode for a naive parallelization of meta-batch sampling might look something like this. My example assumes your meta-batch size and your parallelization height (number of environments) are the same.
```python
import pickle

from metaworld.benchmarks import ML1

# setup
env = ML1.get_train_tasks('pick-place-v1')
meta_batch_size = 10
envs = [pickle.loads(pickle.dumps(env)) for _ in range(meta_batch_size)]

for i in range(num_meta_itrs):
    # outer loop, meta-batch sampling
    tasks = env.sample_tasks(meta_batch_size)
    for e, t in zip(envs, tasks):  # parallel-for
        e.set_task(t)

    # inner loop, single-task sampling
    for e in envs:  # parallel-for
        path_length = 0
        done = False
        obs = e.reset()
        while not done and path_length <= max_path_length:
            a = policy.sample(obs)
            obs, reward, done, info = e.step(a)
            path_length += 1
```
Per-step vectorization would be similar, but there's a lot more bookkeeping to deal with different vectors terminating at different steps. Meta-test evaluations of ML1 look similar, but with `get_test_tasks` instead.
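To make that per-step bookkeeping concrete, here is a minimal sketch for a single meta-batch, reusing the `envs`, `policy`, and `max_path_length` placeholders from the pseudocode above (so it is an illustration, not the sampler we actually use); a boolean mask tracks which vectors have already terminated:

```python
import numpy as np

# step all environments in lockstep; finished vectors are masked out
obs = [e.reset() for e in envs]
done = np.zeros(len(envs), dtype=bool)
path_length = 0

while not done.all() and path_length <= max_path_length:
    actions = [policy.sample(o) for o in obs]  # could be batched into one forward pass
    for i, e in enumerate(envs):
        if done[i]:
            continue  # this vector already terminated; stop collecting from it
        obs[i], reward, d, info = e.step(actions[i])
        done[i] = d
    path_length += 1
```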
ML1 selects a new set of train/test goals each time you construct it by calling `ML1.get_train_tasks()` or `ML1.get_test_tasks()`. This can present a problem for multi-process or multi-machine sampling, in which many workers might construct ML1 instances in separate processes, giving them different sets of train/test goals.
Doing this wrong could accidentally expose your meta-learner to far more meta-train configurations than the benchmark allows. I agree that we should probably rethink the API to make this harder to mess up. In the meantime, I recommend that you configure your sampler to stuff the value of `task` into `env_info`, and then verify in your optimization process that your samples don't come from more than 50 unique tasks.
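A hypothetical version of that check might look like the sketch below. The `'task'` key, the presence of a `'goal'` array inside the task, and the `paths`/`env_infos` structure are all assumptions about your own sampler, not Metaworld API:

```python
def check_task_diversity(paths, max_unique=50):
    """Assert that collected samples come from at most `max_unique` goals.

    Assumes each path's 'env_infos' is a list of dicts and that the sampler
    stored the task (containing a 'goal' array) under the 'task' key.
    """
    unique_goals = {
        tuple(info['task']['goal'])
        for path in paths
        for info in path['env_infos']
    }
    assert len(unique_goals) <= max_unique, (
        f'Samples come from {len(unique_goals)} unique goals; workers are '
        'probably constructing their own ML1 instances instead of sharing tasks.')
```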
If you're going to use remote (multi-process or multi-machine) sampling for ML1, you have two options:

1. Sample tasks (`ML1.sample_tasks()`) from a single process and transmit the tasks to your workers, which then call `env.set_task()` (see the sketch below).
2. Construct the benchmark (`ML1.get_train_tasks()`) on a main process and transmit it to worker processes by pickling/unpickling. This preserves the set of train tasks used across machines, which is otherwise chosen anew every time the benchmark is constructed. The same logic applies to `ML1.get_test_tasks()`. You may then sample tasks locally, because each worker is sampling among the same pre-chosen set anyway.

Edit: I realized solution (2) doesn't work with our current pickling implementation (which reconstructs the object anew during unpickling).
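A hypothetical sketch of option (1) using Python multiprocessing. The `rollout_worker` and `collect_rollout` helpers, the pool layout, and the choice of `pick-place-v1` are illustrative assumptions, not part of the Metaworld API:

```python
import multiprocessing as mp

from metaworld.benchmarks import ML1


def collect_rollout(env, max_path_length=150):
    # Minimal random-action rollout, just to keep the sketch self-contained.
    obs = env.reset()
    path = []
    for _ in range(max_path_length):
        a = env.action_space.sample()
        obs, reward, done, info = env.step(a)
        path.append((obs, reward, done, info))
        if done:
            break
    return path


def rollout_worker(task):
    # Each worker constructs its own ML1 instance, but only ever uses the task
    # (and therefore the goal) transmitted from the main process via set_task().
    env = ML1.get_train_tasks('pick-place-v1')
    env.set_task(task)
    return collect_rollout(env)


if __name__ == '__main__':
    meta_batch_size = 10
    main_env = ML1.get_train_tasks('pick-place-v1')
    tasks = main_env.sample_tasks(meta_batch_size)  # sampled once, centrally
    with mp.Pool(meta_batch_size) as pool:
        paths = pool.map(rollout_worker, tasks)
```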
@ahtsan @krzentner @naeioi @yonghyuc @lywong92 @CatherineSue @avnishn
See https://github.com/rlworkgroup/metaworld/pull/40, which will make case (2) from https://github.com/rlworkgroup/metaworld/issues/24#issuecomment-576996005 work.
Thanks for the clarification on ML1! We are able to reproduce similar results for `reach-v1` with MAML. Making all workers (i.e. tasks) and each worker's environments share 50 tasks in total significantly improved reward.

Lastly, just FYI, a small fix: set `args_kwargs[task_name]['kwargs']['random_init'] = False` in ml1.py. That ensures that the initial state and goal position stay constant upon calling `env.reset()`.
@michaelzhiluo if you found a flaw, please open a PR so that everyone can share in your fix :)
@michaelzhiluo, I believe that we've fixed this in our most recent update! We've updated the Metaworld API, so for any future projects please make sure to use this new API and update any ongoing projects :)
Currently, we are trying to use specific environments in ML1 to set a constant goal per task in a MAML setting (with `env.reset()` changing initial positions but keeping the goal constant). However, we are not clear on what a task means in the ML1 setting. Based on the code for one of the environments we are trying to run, it seems like calling `self.set_task` will update `self.goal`. However, when the environment is reset, `self._state_goal` is initially `self.goal` but is then assigned a randomly generated goal, plus a concatenation of initial reacher arm positions, which also appears to be random. When `self.random_init` is `False`, it works as intended, but the starting states are constant.

We are wondering if there is a way to define a task using the Metaworld API such that, for a given task, the goal position is held constant but the initial observation changes when `env.reset()` is called.