Safe MARL in Autonomous Driving
This is a PyTorch implementation of Constrained Stackelberg Q-learning (discrete actions) and Constrained Stackelberg MADDPG (continuous actions). Both algorithms incorporate the Stackelberg model into Deep Q-learning and MADDPG, respectively, and use the Lagrangian multiplier method to handle the safety constraints. The highway environments used in our experiments are modified from highway-env.
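To make the two ingredients above concrete, here is a minimal sketch of Stackelberg action selection combined with a Lagrangian penalty. It is not taken from this repository: plain nested lists stand in for the Q-networks, and the function names (`stackelberg_actions`, `penalized_q`, `update_multiplier`) and the parameters `lam`, `cost_limit`, and `lr` are hypothetical names chosen for illustration.

```python
def stackelberg_actions(q_leader, q_follower):
    """Pick (leader_action, follower_action) for a single state.

    q_leader[a_l][a_f] and q_follower[a_l][a_f] are joint-action values.
    The follower best-responds to each leader action; the leader then
    chooses the action whose induced joint action it values most.
    """
    best = None
    for a_l, follower_row in enumerate(q_follower):
        # Follower's best response to leader action a_l.
        a_f = max(range(len(follower_row)), key=follower_row.__getitem__)
        value = q_leader[a_l][a_f]
        if best is None or value > best[0]:
            best = (value, a_l, a_f)
    return best[1], best[2]


def penalized_q(q_reward, q_cost, lam):
    """Lagrangian-penalized values: Q_reward - lam * Q_cost, elementwise."""
    return [
        [r - lam * c for r, c in zip(r_row, c_row)]
        for r_row, c_row in zip(q_reward, q_cost)
    ]


def update_multiplier(lam, avg_cost, cost_limit, lr):
    """Dual ascent step on the multiplier, projected onto lam >= 0.

    The multiplier grows while the average cost exceeds the limit and
    shrinks (down to zero) once the constraint is satisfied.
    """
    return max(0.0, lam + lr * (avg_cost - cost_limit))
```

In this sketch, the leader does not simply take the maximum of its own table: it only considers joint actions the follower would actually choose. Feeding `penalized_q` outputs into `stackelberg_actions` is one way the safety cost could enter the selection, under the assumptions stated above.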
1. Installation
# create conda environment
conda create -n env_name python==3.9
conda activate env_name
pip install -r requirements.txt
2. Quick Start
- Create an experiment folder, for example ./merge_env_result/exp2.
- Define the training config in ./merge_env_result/exp2/config.py.
- Define the environment config in ./merge_env_result/exp2/env_config.py.
- Note: the new highway environment is not supported yet due to a version conflict.
- Start training by running the following command:
python main_bilevel.py --file-path ./merge_env_result/exp2
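The steps above can be checked before launching a run. The following is a small sketch, assuming only the file names listed in the steps (config.py and env_config.py); the helper names `missing_config_files` and `build_command` are hypothetical and are not part of this repository.

```python
from pathlib import Path

# File names taken from the Quick Start steps.
REQUIRED_FILES = ("config.py", "env_config.py")


def missing_config_files(exp_dir):
    """Return the required config files that are absent from exp_dir."""
    exp_dir = Path(exp_dir)
    return [name for name in REQUIRED_FILES if not (exp_dir / name).is_file()]


def build_command(exp_dir):
    """Build the training command from the Quick Start as an argument list."""
    return ["python", "main_bilevel.py", "--file-path", str(exp_dir)]


if __name__ == "__main__":
    exp_dir = "./merge_env_result/exp2"
    missing = missing_config_files(exp_dir)
    if missing:
        print("Missing in", exp_dir, ":", ", ".join(missing))
    else:
        # The list can be passed to subprocess.run to start training.
        print(" ".join(build_command(exp_dir)))
```

Returning the command as a list rather than a single string keeps paths with spaces intact if it is later handed to `subprocess.run`.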
3. Demos
3.1 Safe Highway Environment
3.2 Safe Merge Environment
3.3 Safe Roundabout Environment
3.4 Safe Intersection Environment
3.5 Safe Racetrack Environment
4. Results
4.1 Safe Highway Environment
[Figure: reward and training curve]
4.2 Safe Merge Environment
[Figures: leader reward, follower reward, total reward, and training curve]
4.3 Safe Roundabout Environment
[Figures: leader reward, follower reward, total reward, and training curve]
4.4 Safe Intersection Environment
[Figures: leader reward, follower reward, total reward, and training curve]
4.5 Safe Racetrack Environment
[Figures: leader reward, follower reward, total reward, and training curve]
Citation
If you find this repository useful, please cite the following paper:
@article{zheng2024safe,
title={Safe Multi-Agent Reinforcement Learning with Bilevel Optimization in Autonomous Driving},
author={Zheng, Zhi and Gu, Shangding},
journal={arXiv preprint arXiv:2405.18209},
year={2024}
}