Branching DQN
A PyTorch implementation of Branching DQN, based on https://github.com/seolhokim/BipedalWalker-BranchingDQN.
It learns (almost) optimal movements within about 1000 episodes in the BipedalWalker-v3 environment.
For better performance in BipedalWalker-v3, I use some tricks described in https://zhuanlan.zhihu.com/p/409553262.
Other environments, however, seem to train fine without these tricks. :)
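For reference, the core idea of Branching DQN is a Q-network with a shared state encoder and one output branch per action dimension. Below is a minimal PyTorch sketch of that architecture with dueling-style aggregation; the layer sizes and names are illustrative assumptions, not this repository's exact code.

```python
import torch
import torch.nn as nn

class BranchingQNetwork(nn.Module):
    def __init__(self, state_dim, n_branches, n_bins):
        super().__init__()
        # Shared state representation used by every action branch.
        self.shared = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
        )
        # One state-value stream plus one advantage stream per action dimension.
        self.value = nn.Linear(128, 1)
        self.advantages = nn.ModuleList(
            [nn.Linear(128, n_bins) for _ in range(n_branches)]
        )

    def forward(self, state):
        h = self.shared(state)
        v = self.value(h)  # (batch, 1)
        qs = []
        for head in self.advantages:
            adv = head(h)  # (batch, n_bins)
            # Dueling-style aggregation per branch: Q = V + (A - mean(A)).
            qs.append(v + adv - adv.mean(dim=-1, keepdim=True))
        return torch.stack(qs, dim=1)  # (batch, n_branches, n_bins)

# BipedalWalker-v3 has 24-dim observations and 4 action dimensions; with
# action_scale=25 each dimension is discretized into 25 bins.
net = BranchingQNetwork(state_dim=24, n_branches=4, n_bins=25)
q_values = net(torch.randn(1, 24))
greedy_bins = q_values.argmax(dim=-1)  # (1, 4): independent argmax per branch
```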
Dependencies
python==3.9.10
gym==0.18.3
torch==1.13.1
Other versions may work as well; the list above is just a reference.
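If you want the pinned versions, the two libraries can be installed with pip (assuming a matching Python interpreter is already available):

```
pip install gym==0.18.3 torch==1.13.1
```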
Structure
/data: contains results of training or testing, including graphs and videos
/model: contains pre-trained models
Train
use:
python train.py
- --round | -r : number of training rounds (default: 2000)
- --lr_rate | -l : learning rate (default: 0.0001)
- --batch_size | -b : batch size (default: 64)
- --gamma | -g : discount factor gamma (default: 0.99)
- --action_scale | -a : number of discrete actions each continuous action dimension is split into (default: 25); see the sketch after this list
- --env | -e : environment to train in (default: BipedalWalker-v3)
- --per | -p : use prioritized experience replay (PER)
- --load | -l : model to load from ./model/ (e.g. 25 for [env]_25.pth)
- --no_trick | -nt : disable the tricks mentioned above
- --save_interval | -s : interval (in rounds) between model saves (default: 1000)
- --print_interval | -d : interval (in rounds) between evaluation printouts (default: 200)
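For example:

```
python train.py --env BipedalWalker-v3 --action_scale 25 --round 2000 --per
```

Since --action_scale only fixes the number of bins, each bin index chosen by the network still has to be mapped back to a continuous action. Below is a minimal sketch of one plausible mapping; the helper name and the even [-1, 1] grid are assumptions, and the exact mapping in train.py may differ.

```python
import numpy as np

def bins_to_action(bin_indices, action_scale=25, low=-1.0, high=1.0):
    # Spread action_scale bins evenly over each dimension's [low, high]
    # range ([-1, 1] for BipedalWalker-v3 actions).
    grid = np.linspace(low, high, action_scale)
    return grid[np.asarray(bin_indices)]

print(bins_to_action([0, 12, 24, 12]))  # -> [-1.  0.  1.  0.]
```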
Test
use:
python enjoy.py
- --not_render | -n : disable rendering
- --round | -r : number of evaluation rounds (default: 10)
- --action_scale | -a : discrete action scale; also selects which model to load from ./model/ (default: 25)
- --env | -e : environment to test in (default: BipedalWalker-v3)
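For example, to evaluate the default pre-trained model without rendering:

```
python enjoy.py --env BipedalWalker-v3 --action_scale 25 --round 10 --not_render
```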
Performance
Scores in Training: (reward graphs are saved under /data)
Trained Model: (demo videos are saved under /data)