eugenevinitsky / sequential_social_dilemma_games

Repo for reproduction of sequential social dilemmas
MIT License
380 stars 134 forks

I have some questions #181

Open pyc33351 opened 3 years ago

pyc33351 commented 3 years ago

Hi, I have a question about run_scripts/train_baseline.py. The original paper uses A3C to train the agents, but in this file each agent is assigned a PPO policy network. Which one is actually trained, A3C or PPO? This is my first time using RLlib and Ray, and I don't understand why it is set up this way.
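To make the setup I am asking about concrete, here is a minimal sketch of the pattern as I understand it, assuming the older dict-style RLlib multi-agent config, where each policy entry is a tuple of (policy class, observation space, action space, per-policy config) and a mapping function assigns each agent to a policy. It is plain Python with no Ray import. The names `make_policy_spec`, `build_multiagent_config`, the `agent-i` IDs, and the placeholder strings are hypothetical and not copied from train_baseline.py.

```python
# Hypothetical illustration only; nothing here is taken from
# run_scripts/train_baseline.py, and Ray/RLlib is deliberately not imported.

def make_policy_spec(policy_cls_name, obs_space, act_space):
    # RLlib-style policy spec tuple:
    # (policy class, observation space, action space, per-policy config).
    # Strings stand in for the real class and space objects.
    return (policy_cls_name, obs_space, act_space, {})


def build_multiagent_config(num_agents, policy_cls_name):
    # One separate policy entry per agent, keyed by a policy ID.
    policies = {
        f"agent-{i}": make_policy_spec(policy_cls_name, "obs_space", "act_space")
        for i in range(num_agents)
    }
    return {
        "policies": policies,
        # Each agent ID maps to the policy with the same ID.
        "policy_mapping_fn": lambda agent_id: agent_id,
    }


config = build_multiagent_config(num_agents=2, policy_cls_name="PPO policy")
print(sorted(config["policies"]))
print(config["policy_mapping_fn"]("agent-1"))
```

The sketch only shows how per-agent policy entries are declared. It does not show which trainer the script launches, so it does not settle whether A3C or PPO is the algorithm being trained.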