Training a PPO agent requires a lot of boilerplate code (network factories, training functions, inference functions, etc.). We should write some utils to reduce this boilerplate and offer sane defaults.
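A rough sketch of the kind of util being proposed (all names here, e.g. `train_ppo`, `hidden_sizes`, `inference_fn`, are hypothetical and not an existing API): a single entry point with sane defaults that hides the network-factory / training-loop / inference-function wiring.

```python
from typing import Any, Callable, Sequence, Tuple


def train_ppo(
    env: Any,
    *,
    num_timesteps: int = 1_000_000,
    hidden_sizes: Sequence[int] = (64, 64),
    learning_rate: float = 3e-4,
    seed: int = 0,
) -> Tuple[Any, Callable[..., Any]]:
    """Train a PPO agent on `env` and return (params, inference_fn).

    Callers get an inference function back directly instead of wiring up
    network factories and inference wrappers themselves; keyword arguments
    override the defaults when needed.
    """
    # Illustrative interface only -- the actual util would build the
    # networks, run the training loop, and return the trained params
    # plus a ready-to-use inference function.
    raise NotImplementedError("illustrative interface only")
```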
It's probably worth asking whether we want to re-implement PPO and/or whether there are other implementations out there we can use. The Brax PPO is pretty kludgy, and I agree it's hard to use/understand.