This pull request implements the Async PPO (Proximal Policy Optimization) algorithm for MuJoCo environments. Async PPO leverages Cogment's microservices architecture to improve training efficiency and stability by running multiple agents in parallel.
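For context, the objective that PPO variants (including APPO) optimize is the clipped surrogate loss. The following is a generic sketch of that loss; the function and variable names are illustrative and not taken from this PR's code:

```python
import numpy as np

def ppo_clip_loss(log_probs_new, log_probs_old, advantages, clip_eps=0.2):
    """Standard PPO clipped surrogate objective, negated for minimization.

    Generic sketch only -- not the exact implementation in this PR.
    """
    # Probability ratio between the current and the behavior policy.
    ratio = np.exp(np.asarray(log_probs_new) - np.asarray(log_probs_old))
    advantages = np.asarray(advantages)
    unclipped = ratio * advantages
    # Clipping the ratio bounds the size of each policy update.
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -np.mean(np.minimum(unclipped, clipped))
```

When the new and old log-probabilities coincide, the ratio is 1 and the loss reduces to the negated mean advantage.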
Changes Made
Implemented APPO for continuous action spaces targeting MuJoCo environments
Adapted the sample producer to emit rollout data (produced after an N-step rollout)
Implemented a rollout buffer to store the data that is then used to update the policy
Adjusted hyperparameters to optimize training performance for the MuJoCo Hopper-v4 environment
Added documentation including an overview of APPO as well as the set of hyperparameters used to run the Hopper environment
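The N-step rollout buffer described above can be sketched roughly as follows. This is a minimal illustration under assumed names (`RolloutBuffer`, `add`, `full`, `clear` are hypothetical), not the actual classes added in this PR:

```python
from dataclasses import dataclass, field

@dataclass
class RolloutBuffer:
    """Minimal sketch of an N-step rollout buffer (hypothetical names).

    Transitions accumulate until `capacity` (the rollout length N) is
    reached; the trainer then consumes them for a policy update and
    clears the buffer.
    """
    capacity: int
    observations: list = field(default_factory=list)
    actions: list = field(default_factory=list)
    rewards: list = field(default_factory=list)
    dones: list = field(default_factory=list)

    def add(self, obs, action, reward, done):
        # Store one environment transition.
        self.observations.append(obs)
        self.actions.append(action)
        self.rewards.append(reward)
        self.dones.append(done)

    def full(self):
        # True once N transitions have been collected.
        return len(self.rewards) >= self.capacity

    def clear(self):
        # Reset after each policy update.
        self.observations.clear()
        self.actions.clear()
        self.rewards.clear()
        self.dones.clear()
```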
Related Issue
Closes #177
Steps to Test
Run the following command on a local machine: `python -m main +experiment=appo/hopper`, or
run the SageMaker notebook in `./cloud/sagemaker_trainer.ipynb`
Notes for Reviewers
@cloderic please have a look at the doc in `./docs/results/appo.md`