facebookresearch / mtrl

Multi Task RL Baselines
MIT License
223 stars 28 forks

Inconsistent with the reproduced results of the paper #18

Open metaqiang opened 2 years ago

metaqiang commented 2 years ago

Description

This is what we reproduced: [image: our reproduced training curves]

This is the result in the paper: [image: results reported in the paper]

We don't understand why our results for Soft Modularization and Multi-headed SAC are so much worse than reported.

How to reproduce

The following command-line instructions follow https://mtrl.readthedocs.io/en/latest/pages/tutorials/baseline.html.


cd Code/mtrl-main/
conda activate garage
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/yyq/.mujoco/mujoco200/bin
export PYTHONWARNINGS='ignore:semaphore_tracker:UserWarning'
mkdir -p ./trainlogs

mt10_mtsac

CUDA_VISIBLE_DEVICES=0 nohup python -u main.py setup=metaworld env=metaworld-mt10 agent=state_sac experiment.num_eval_episodes=1 experiment.num_train_steps=2000000 setup.seed=1 replay_buffer.batch_size=1280 agent.multitask.num_envs=10 agent.multitask.should_use_disentangled_alpha=True agent.encoder.type_to_select=identity agent.multitask.should_use_multi_head_policy=False agent.multitask.actor_cfg.should_condition_model_on_task_info=False agent.multitask.actor_cfg.should_condition_encoder_on_task_info=True agent.multitask.actor_cfg.should_concatenate_task_info_with_encoder=True +exp_name=mt10_mtsac_2000000 > trainlogs/mt10_mtsac_sd1.log 2>&1 &

CUDA_VISIBLE_DEVICES=0 nohup python -u main.py setup=metaworld env=metaworld-mt10 agent=state_sac experiment.num_eval_episodes=1 experiment.num_train_steps=2000000 setup.seed=2 replay_buffer.batch_size=1280 agent.multitask.num_envs=10 agent.multitask.should_use_disentangled_alpha=True agent.encoder.type_to_select=identity agent.multitask.should_use_multi_head_policy=False agent.multitask.actor_cfg.should_condition_model_on_task_info=False agent.multitask.actor_cfg.should_condition_encoder_on_task_info=True agent.multitask.actor_cfg.should_concatenate_task_info_with_encoder=True +exp_name=mt10_mtsac_2000000 > trainlogs/mt10_mtsac_sd2.log 2>&1 &

CUDA_VISIBLE_DEVICES=0 nohup python -u main.py setup=metaworld env=metaworld-mt10 agent=state_sac experiment.num_eval_episodes=1 experiment.num_train_steps=2000000 setup.seed=3 replay_buffer.batch_size=1280 agent.multitask.num_envs=10 agent.multitask.should_use_disentangled_alpha=True agent.encoder.type_to_select=identity agent.multitask.should_use_multi_head_policy=False agent.multitask.actor_cfg.should_condition_model_on_task_info=False agent.multitask.actor_cfg.should_condition_encoder_on_task_info=True agent.multitask.actor_cfg.should_concatenate_task_info_with_encoder=True +exp_name=mt10_mtsac_2000000 > trainlogs/mt10_mtsac_sd3.log 2>&1 &

mt10_mtmhsac

CUDA_VISIBLE_DEVICES=1 nohup python -u main.py setup=metaworld env=metaworld-mt10 agent=state_sac experiment.num_eval_episodes=1 experiment.num_train_steps=2000000 setup.seed=1 replay_buffer.batch_size=1280 agent.multitask.num_envs=10 agent.multitask.should_use_disentangled_alpha=True agent.encoder.type_to_select=identity agent.multitask.should_use_multi_head_policy=True agent.multitask.actor_cfg.should_condition_model_on_task_info=False agent.multitask.actor_cfg.should_condition_encoder_on_task_info=False agent.multitask.actor_cfg.should_concatenate_task_info_with_encoder=False +exp_name=mt10_mtmhsac_2000000 > trainlogs/mt10_mtmhsac_sd1.log 2>&1 &

CUDA_VISIBLE_DEVICES=1 nohup python -u main.py setup=metaworld env=metaworld-mt10 agent=state_sac experiment.num_eval_episodes=1 experiment.num_train_steps=2000000 setup.seed=2 replay_buffer.batch_size=1280 agent.multitask.num_envs=10 agent.multitask.should_use_disentangled_alpha=True agent.encoder.type_to_select=identity agent.multitask.should_use_multi_head_policy=True agent.multitask.actor_cfg.should_condition_model_on_task_info=False agent.multitask.actor_cfg.should_condition_encoder_on_task_info=False agent.multitask.actor_cfg.should_concatenate_task_info_with_encoder=False +exp_name=mt10_mtmhsac_2000000 > trainlogs/mt10_mtmhsac_sd2.log 2>&1 &

CUDA_VISIBLE_DEVICES=1 nohup python -u main.py setup=metaworld env=metaworld-mt10 agent=state_sac experiment.num_eval_episodes=1 experiment.num_train_steps=2000000 setup.seed=3 replay_buffer.batch_size=1280 agent.multitask.num_envs=10 agent.multitask.should_use_disentangled_alpha=True agent.encoder.type_to_select=identity agent.multitask.should_use_multi_head_policy=True agent.multitask.actor_cfg.should_condition_model_on_task_info=False agent.multitask.actor_cfg.should_condition_encoder_on_task_info=False agent.multitask.actor_cfg.should_concatenate_task_info_with_encoder=False +exp_name=mt10_mtmhsac_2000000 > trainlogs/mt10_mtmhsac_sd3.log 2>&1 &

mt10_soft_modularization

CUDA_VISIBLE_DEVICES=1 nohup python -u main.py setup=metaworld env=metaworld-mt10 agent=state_sac experiment.num_eval_episodes=1 experiment.num_train_steps=2000000 setup.seed=1 replay_buffer.batch_size=1280 agent.multitask.num_envs=10 agent.multitask.should_use_disentangled_alpha=True agent.multitask.should_use_task_encoder=True agent.encoder.type_to_select=feedforward agent.multitask.actor_cfg.should_condition_model_on_task_info=True agent.multitask.actor_cfg.should_condition_encoder_on_task_info=False agent.multitask.actor_cfg.should_concatenate_task_info_with_encoder=False agent.multitask.actor_cfg.moe_cfg.should_use=True agent.multitask.actor_cfg.moe_cfg.mode=soft_modularization agent.multitask.should_use_multi_head_policy=False agent.encoder.feedforward.hidden_dim=50 agent.encoder.feedforward.num_layers=2 agent.encoder.feedforward.feature_dim=50 agent.actor.num_layers=4 agent.multitask.task_encoder_cfg.model_cfg.pretrained_embedding_cfg.should_use=False +exp_name=mt10_soft_modularization_2000000 > trainlogs/mt10_soft_modularization_sd1.log 2>&1 &

CUDA_VISIBLE_DEVICES=1 nohup python -u main.py setup=metaworld env=metaworld-mt10 agent=state_sac experiment.num_eval_episodes=1 experiment.num_train_steps=2000000 setup.seed=2 replay_buffer.batch_size=1280 agent.multitask.num_envs=10 agent.multitask.should_use_disentangled_alpha=True agent.multitask.should_use_task_encoder=True agent.encoder.type_to_select=feedforward agent.multitask.actor_cfg.should_condition_model_on_task_info=True agent.multitask.actor_cfg.should_condition_encoder_on_task_info=False agent.multitask.actor_cfg.should_concatenate_task_info_with_encoder=False agent.multitask.actor_cfg.moe_cfg.should_use=True agent.multitask.actor_cfg.moe_cfg.mode=soft_modularization agent.multitask.should_use_multi_head_policy=False agent.encoder.feedforward.hidden_dim=50 agent.encoder.feedforward.num_layers=2 agent.encoder.feedforward.feature_dim=50 agent.actor.num_layers=4 agent.multitask.task_encoder_cfg.model_cfg.pretrained_embedding_cfg.should_use=False +exp_name=mt10_soft_modularization_2000000 > trainlogs/mt10_soft_modularization_sd2.log 2>&1 &

CUDA_VISIBLE_DEVICES=1 nohup python -u main.py setup=metaworld env=metaworld-mt10 agent=state_sac experiment.num_eval_episodes=1 experiment.num_train_steps=2000000 setup.seed=3 replay_buffer.batch_size=1280 agent.multitask.num_envs=10 agent.multitask.should_use_disentangled_alpha=True agent.multitask.should_use_task_encoder=True agent.encoder.type_to_select=feedforward agent.multitask.actor_cfg.should_condition_model_on_task_info=True agent.multitask.actor_cfg.should_condition_encoder_on_task_info=False agent.multitask.actor_cfg.should_concatenate_task_info_with_encoder=False agent.multitask.actor_cfg.moe_cfg.should_use=True agent.multitask.actor_cfg.moe_cfg.mode=soft_modularization agent.multitask.should_use_multi_head_policy=False agent.encoder.feedforward.hidden_dim=50 agent.encoder.feedforward.num_layers=2 agent.encoder.feedforward.feature_dim=50 agent.actor.num_layers=4 agent.multitask.task_encoder_cfg.model_cfg.pretrained_embedding_cfg.should_use=False +exp_name=mt10_soft_modularization_2000000 > trainlogs/mt10_soft_modularization_sd3.log 2>&1 &
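Between the three launches of each baseline, only `setup.seed` and the log-file suffix change. A small Python sketch of a helper that builds the per-seed command strings (this helper is hypothetical, not part of mtrl; the flags shown are a subset of those above):

```python
# Build per-seed launch commands for one baseline; only the seed and the
# log-file suffix differ between runs. Illustrative helper, not mtrl code.

COMMON_FLAGS = (
    "setup=metaworld env=metaworld-mt10 agent=state_sac "
    "experiment.num_eval_episodes=1 experiment.num_train_steps=2000000 "
    "replay_buffer.batch_size=1280 agent.multitask.num_envs=10"
)

def build_commands(exp_name, extra_flags, log_prefix, seeds=(1, 2, 3), gpu=0):
    """Return one nohup launch command per seed for the given baseline."""
    cmds = []
    for seed in seeds:
        cmds.append(
            f"CUDA_VISIBLE_DEVICES={gpu} nohup python -u main.py "
            f"{COMMON_FLAGS} setup.seed={seed} {extra_flags} "
            f"+exp_name={exp_name} "
            f"> trainlogs/{log_prefix}_sd{seed}.log 2>&1 &"
        )
    return cmds

# Example: the three MTSAC launches, with the baseline-specific flags abridged.
cmds = build_commands(
    "mt10_mtsac_2000000",
    "agent.multitask.should_use_disentangled_alpha=True "
    "agent.encoder.type_to_select=identity",
    "mt10_mtsac",
)
for c in cmds:
    print(c)
```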

System information

Thank you very much!

aithuuuus commented 2 years ago

You are using 2 million time steps while the paper uses 100k time steps; you should compare against Table 1.

metaqiang commented 2 years ago

> You are using 2 million time steps while the paper uses 100k time steps; you should compare against Table 1.

Our experimental results are still inconsistent with Table 1 in the paper: [image: comparison against Table 1]

shagunsodhani commented 2 years ago

Hi! Let me see if I can understand the issue here. The results in the paper are averaged over 10 seeds, and even with 10 seeds the standard error is quite high. For reference, standard error = standard deviation / sqrt(number of seeds). You ran the experiments with 3 seeds, and the standard error bands you get are quite high as well (especially for Multi-headed SAC). Could you please try running with more seeds? Increasing experiment.num_eval_episodes may also help produce more stable results. Could you also share the metaworld version (git commit) that you are using?
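For concreteness, the mean and standard error across seeds mentioned above would be computed like this (the per-seed success rates below are made-up numbers for illustration, not results from either run):

```python
import math

# Hypothetical final success rates for one baseline, one value per seed.
success_rates = [0.62, 0.48, 0.71]

n = len(success_rates)
mean = sum(success_rates) / n

# Sample standard deviation (dividing by n - 1), then
# standard error = standard deviation / sqrt(number of seeds).
var = sum((x - mean) ** 2 for x in success_rates) / (n - 1)
std_err = math.sqrt(var) / math.sqrt(n)

print(f"mean={mean:.3f} +/- {std_err:.3f} (standard error, n={n})")
```

With only 3 seeds the sqrt(n) in the denominator shrinks the error band very little, which is why runs with few seeds can look far apart from the paper's numbers while still being statistically consistent with them.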

metaqiang commented 2 years ago

Hello, my metaworld version is af8417bfc82a3e249b4b02156518d775f29eb289. Do different versions of metaworld greatly affect the experimental results?

shagunsodhani commented 2 years ago

That is the metaworld version that we tested against. I wanted to check the version because metaworld was under active development at that time.

metaqiang commented 2 years ago

We are using the af8417bfc version of the environment, which is also what we use when running the MTRL code :)