This PR refactors the RL code to make it more flexible. The refactor was necessary for adding on-policy routines, which this PR also does by adding PPO along with all C and Fortran wrapper routines. The code still needs to be tested, but I wanted to get the PR started.