Hello,
I wanted to ask whether you think it would be feasible to adapt this stable_baselines-based PPO implementation to DeepMimic. If so, what are the main differences in the learning algorithm?
From reading both papers, it doesn't look like there are major differences in the PPO implementation itself, apart from the reward functions.
Is there any specific reason why you decided to use PPO1 instead of PPO2?
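For context, I was picturing something along the lines of the sketch below (just my assumption of how the swap would look, not your code; the Pendulum-v0 environment is only a stand-in for a DeepMimic-style humanoid environment), mainly to understand whether switching from PPO1 to PPO2 would change anything on the algorithm side:

```python
# Hypothetical sketch: swapping PPO1 for PPO2 in stable_baselines.
# The environment here is a placeholder; a DeepMimic-style motion-imitation
# env with its own reward function would replace it.
import gym
from stable_baselines import PPO1, PPO2
from stable_baselines.common.policies import MlpPolicy
from stable_baselines.common.vec_env import DummyVecEnv

# PPO1: the MPI-based, single-environment implementation
env = gym.make("Pendulum-v0")
model = PPO1(MlpPolicy, env, verbose=1)
model.learn(total_timesteps=100_000)

# PPO2: the vectorized, GPU-friendly implementation
vec_env = DummyVecEnv([lambda: gym.make("Pendulum-v0")])
model2 = PPO2(MlpPolicy, vec_env, verbose=1)
model2.learn(total_timesteps=100_000)
```

As far as I understand, the two mainly differ in how rollouts are collected (MPI workers vs. vectorized environments) rather than in the PPO objective itself, so I'm wondering whether the choice of PPO1 was driven by something specific to this project.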