--scenario: defines which environment in the MPE is to be used (default: "cn")
--max-episode-len: maximum length of each episode for the environment (default: 25)
--num-episodes: total number of training episodes (default: 60000)
--num-adversaries: number of adversaries in the environment (default: 0)
--lr: learning rate (default: 1e-2)
--gamma: discount factor (default: 0.95)
--batch-size: batch size (default: 800)
--num-units: number of units in the MLP (default: 128)
--prior-buffer-size: prior network training buffer size
--prior-num-iter: prior network training iterations
--prior-training-rate: prior network training rate
--prior-training-percentile: percentile threshold on KL values used to generate labels for the prior network
--exp-name: name of the experiment, used as the file name to save all results (default: None)
--save-dir: directory where intermediate training results and model will be saved (default: "/tmp/policy/")
--save-rate: model is saved every time this number of episodes has been completed (default: 1000)
--load-dir: directory where training state and model are loaded from (default: "")
--plots-dir: directory where training curves are saved (default: "./learning_curves/")
--restore_all: whether to restore an existing I2C network
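For reference, here is a minimal sketch of how these flags could be registered with Python's argparse. The names and defaults mirror the list above, but this is not the repository's actual parser; train.py may define the options differently, and the prior-network flags have no documented defaults, so the value shown for --prior-training-percentile is only a placeholder.

import argparse

def parse_args():
    parser = argparse.ArgumentParser(description="I2C training options")
    # Environment settings
    parser.add_argument("--scenario", type=str, default="cn",
                        help="which MPE environment to use")
    parser.add_argument("--max-episode-len", type=int, default=25,
                        help="maximum length of each episode")
    parser.add_argument("--num-episodes", type=int, default=60000,
                        help="total number of training episodes")
    parser.add_argument("--num-adversaries", type=int, default=0,
                        help="number of adversaries in the environment")
    # Core training hyperparameters
    parser.add_argument("--lr", type=float, default=1e-2, help="learning rate")
    parser.add_argument("--gamma", type=float, default=0.95, help="discount factor")
    parser.add_argument("--batch-size", type=int, default=800, help="batch size")
    parser.add_argument("--num-units", type=int, default=128,
                        help="number of units in the MLP")
    # Prior network (placeholder default, not documented)
    parser.add_argument("--prior-training-percentile", type=float, default=60.0,
                        help="percentile threshold on KL values used to generate labels")
    # Checkpointing and logging
    parser.add_argument("--exp-name", type=str, default=None, help="experiment name")
    parser.add_argument("--save-dir", type=str, default="/tmp/policy/",
                        help="directory for intermediate results and the model")
    parser.add_argument("--save-rate", type=int, default=1000,
                        help="save the model every this many episodes")
    parser.add_argument("--load-dir", type=str, default="",
                        help="directory to load the training state and model from")
    return parser.parse_args()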
I2C can be learned either end-to-end or in a two-phase manner. This code implements the end-to-end manner, which can take more training time than the two-phase manner.
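To illustrate what --prior-training-percentile does, the following is a rough, hypothetical sketch (not taken from this repository; the function and example values are made up) of how a percentile threshold over a batch of KL values could be turned into binary labels for training the prior network:

import numpy as np

def kl_to_labels(kl_values, percentile=60.0):
    # Pairs whose KL value exceeds the batch percentile get label 1
    # (communication is considered useful); the rest get label 0.
    threshold = np.percentile(kl_values, percentile)
    return (kl_values > threshold).astype(np.int64)

kl_values = np.array([0.02, 0.31, 0.07, 0.55, 0.12, 0.90])  # made-up KL values
labels = kl_to_labels(kl_values, percentile=60.0)
print(labels)  # [0 0 0 1 0 1]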
For Cooperative Navigation:
python3 train.py --scenario 'cn' --prior-training-percentile 60 --lr 1e-2
For Predator Prey:
python3 train.py --scenario 'pp' --prior-training-percentile 40 --lr 1e-3
If you use this code, please cite our paper.
@inproceedings{ding2020learning,
  title={Learning Individually Inferred Communication for Multi-Agent Cooperation},
  author={Ding, Ziluo and Huang, Tiejun and Lu, Zongqing},
  booktitle={NeurIPS},
  year={2020}
}
This code was developed based on the source code of MADDPG by Ryan Lowe.