This pull request implements the Async PPO (Proximal Policy Optimization) algorithm for MuJoCo environments. Async PPO leverages Cogment's microservices architecture to improve training efficiency and stability by running multiple agents in parallel.
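For context, the objective that PPO variants (including APPO) optimize is the clipped surrogate loss. The following is a generic sketch of that loss; the function and variable names are illustrative and not taken from this PR's code:

```python
import numpy as np

def ppo_clip_loss(log_probs_new, log_probs_old, advantages, clip_eps=0.2):
    """Standard PPO clipped surrogate objective, negated for minimization.

    Generic sketch only -- not the exact implementation in this PR.
    """
    # Probability ratio between the current and the behavior policy.
    ratio = np.exp(np.asarray(log_probs_new) - np.asarray(log_probs_old))
    advantages = np.asarray(advantages)
    unclipped = ratio * advantages
    # Clipping the ratio bounds the size of each policy update.
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -np.mean(np.minimum(unclipped, clipped))
```

When the new and old log-probabilities coincide, the ratio is 1 and the loss reduces to the negated mean advantage.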
Changes Made
Implemented APPO for continuous action spaces targeting MuJoCo environments
Adapted the sample producer to emit rollout data (produced after an N-step rollout)
Implemented a rollout buffer to store the data that is then used to update the policy
Adjusted hyperparameters to optimize training performance for the MuJoCo Hopper-v4 environment
Added documentation including an overview of APPO as well as the set of hyperparameters used to run the Hopper environment
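The N-step rollout buffer described above can be sketched roughly as follows. This is a minimal illustration under assumed names (`RolloutBuffer`, `add`, `full`, `clear` are hypothetical), not the actual classes added in this PR:

```python
from dataclasses import dataclass, field

@dataclass
class RolloutBuffer:
    """Minimal sketch of an N-step rollout buffer (hypothetical names).

    Transitions accumulate until `capacity` (the rollout length N) is
    reached; the trainer then consumes them for a policy update and
    clears the buffer.
    """
    capacity: int
    observations: list = field(default_factory=list)
    actions: list = field(default_factory=list)
    rewards: list = field(default_factory=list)
    dones: list = field(default_factory=list)

    def add(self, obs, action, reward, done):
        # Store one environment transition.
        self.observations.append(obs)
        self.actions.append(action)
        self.rewards.append(reward)
        self.dones.append(done)

    def full(self):
        # True once N transitions have been collected.
        return len(self.rewards) >= self.capacity

    def clear(self):
        # Reset after each policy update.
        self.observations.clear()
        self.actions.clear()
        self.rewards.clear()
        self.dones.clear()
```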
Related Issue
Closes #177
Steps to Test
Run the following command on a local machine: `python -m main +experiment=appo/hopper`, or
run the SageMaker notebook in `./cloud/sagemaker_trainer.ipynb`
Notes for Reviewers
@cloderic please have a look at the doc in `./docs/results/appo.md`