facebookresearch / mtrl

Multi Task RL Baselines
MIT License

Impossible to reproduce results of the paper #33

Open JosselinSomervilleRoberts opened 1 year ago

JosselinSomervilleRoberts commented 1 year ago

Problem

Like many other people (see #18 and #19), I cannot reproduce the results of the paper. I used commit af8417bfc82a3e249b4b02156518d775f29eb289 of Meta-World and the same parameters as in the paper and the docs.

Here are the commands I ran:

Multi-task SAC

PYTHONPATH=. python3 -u main.py \
setup=metaworld \
env=metaworld-mt10 \
agent=state_sac \
experiment.num_eval_episodes=1 \
experiment.num_train_steps=2000000 \
setup.seed=1 \
replay_buffer.batch_size=1280 \
agent.multitask.num_envs=10 \
agent.multitask.should_use_disentangled_alpha=True \
agent.encoder.type_to_select=identity \
agent.multitask.should_use_multi_head_policy=False \
agent.multitask.actor_cfg.should_condition_model_on_task_info=False \
agent.multitask.actor_cfg.should_condition_encoder_on_task_info=True \
agent.multitask.actor_cfg.should_concatenate_task_info_with_encoder=True

CARE

PYTHONPATH=. python3 -u main.py \
setup=metaworld \
env=metaworld-mt10 \
agent=state_sac \
experiment.num_eval_episodes=1 \
experiment.num_train_steps=2000000 \
setup.seed=1 \
replay_buffer.batch_size=1280 \
agent.multitask.num_envs=10 \
agent.multitask.should_use_disentangled_alpha=True \
agent.multitask.should_use_task_encoder=True \
agent.encoder.type_to_select=moe \
agent.multitask.should_use_multi_head_policy=False \
agent.encoder.moe.task_id_to_encoder_id_cfg.mode=attention \
agent.encoder.moe.num_experts=4 \
agent.multitask.task_encoder_cfg.model_cfg.pretrained_embedding_cfg.should_use=True

I originally thought that my results were so different because I had run too few seeds, so I ran 10 seeds for each method, as in the paper. It did not help. I then ran 20 seeds (seeds 1 to 20) and saw the same problem. Here is the training success:

[Figure: SAC_vs_CARE, training success of multi-task SAC and CARE]

As you can see, the two methods yield nearly the same results. The variance is also very high compared to the paper.
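For anyone who wants to repeat the comparison, a seed sweep like the one described above could be scripted roughly as follows. This is only a sketch, not the script actually used for the plot: it reuses the multi-task SAC overrides from the command above and changes only `setup.seed`. The `DRY_RUN` and `LOG_DIR` variables and the `run_seed` function are hypothetical helpers introduced here for illustration, not mtrl options. By default the script only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch: launch the multi-task SAC command once per seed (seeds 1 to 20).
# DRY_RUN, LOG_DIR and run_seed are hypothetical helpers, not part of mtrl.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"     # 1 = only print each command, 0 = actually run it
LOG_DIR="${LOG_DIR:-logs}"  # assumed location for per-seed log files

run_seed() {
  local seed="$1"
  # Same overrides as the multi-task SAC command above; only the seed varies.
  local cmd=(python3 -u main.py
    setup=metaworld
    env=metaworld-mt10
    agent=state_sac
    experiment.num_eval_episodes=1
    experiment.num_train_steps=2000000
    "setup.seed=${seed}"
    replay_buffer.batch_size=1280
    agent.multitask.num_envs=10
    agent.multitask.should_use_disentangled_alpha=True
    agent.encoder.type_to_select=identity
    agent.multitask.should_use_multi_head_policy=False
    agent.multitask.actor_cfg.should_condition_model_on_task_info=False
    agent.multitask.actor_cfg.should_condition_encoder_on_task_info=True
    agent.multitask.actor_cfg.should_concatenate_task_info_with_encoder=True)

  if [ "$DRY_RUN" = "1" ]; then
    # Print the command instead of running it.
    echo "PYTHONPATH=. ${cmd[*]}"
  else
    mkdir -p "$LOG_DIR"
    PYTHONPATH=. "${cmd[@]}" > "${LOG_DIR}/sac_seed_${seed}.log" 2>&1
  fi
}

for seed in $(seq 1 20); do
  run_seed "$seed"
done
```

Running it with `DRY_RUN=0` would start the 20 runs one after another. The CARE sweep would look the same with the CARE overrides swapped into the `cmd` array.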

Am I doing something wrong? There are about 20 issues on this repository at the time of writing, and this is the third one about reproducibility. Could you provide the exact commands needed to recreate your results?

Thanks

System information