in run_scripts/train_baseline.py
Hi, origin paper use A3C to train the agent, but I found in the above file that each agent will be assigned a PPO policy network, so which network will be trained?A3C or PPO?i first time use rllib and ray,i didn’t understand why To set up like this
in run_scripts/train_baseline.py Hi, origin paper use A3C to train the agent, but I found in the above file that each agent will be assigned a PPO policy network, so which network will be trained?A3C or PPO?i first time use rllib and ray,i didn’t understand why To set up like this