exalearn / EXARL

Scalable Framework for Reinforcement Learning

Adding Soft Actor-Critic method to the fixBroken branch #265

Closed mikegros closed 11 months ago

mikegros commented 11 months ago

Added the Soft Actor-Critic (SAC) method from Haarnoja et al. (2018) to ExaRL as an alternative to TD3 and DDPG. I have tested it on Pendulum, where it worked very well. The version compatible with newer gym/tensorflow has also been tested on the Humanoid and Hopper environments from MuJoCo and showed behavior consistent with Haarnoja et al. (2018), which suggests the implementation is working, at least on those tasks.

I would really love to see someone try this on the 39-bus powergrid example and see whether it works at all, probably with the following flags: --horizon 1 --actor_lr 0.0002 --critic_lr 0.0004 --sac_alpha 0.05
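
For readers unfamiliar with what the entropy temperature does, here is a minimal sketch of how it enters the critic target in the common twin-critic formulation of SAC. This is not the code from this PR: the function name `soft_td_target` and its arguments are hypothetical, the mapping of `--sac_alpha` to `alpha` is an assumption, and the default `gamma` is purely illustrative.

```python
def soft_td_target(reward, done, next_q1, next_q2, next_log_prob,
                   alpha=0.05, gamma=0.99):
    """Soft Bellman target for one transition (illustrative only).

    alpha is the entropy temperature (assumed to correspond to --sac_alpha).
    next_log_prob is log pi(a'|s') for an action a' sampled from the policy.
    """
    # Take the smaller of the two critic estimates to limit overestimation,
    # then add the entropy bonus: -alpha * log pi(a'|s').
    soft_value = min(next_q1, next_q2) - alpha * next_log_prob
    # No bootstrapping past a terminal state.
    return reward + gamma * (1.0 - float(done)) * soft_value


if __name__ == "__main__":
    print(soft_td_target(1.0, False, 2.0, 3.0, -1.0))
```

A larger `alpha` weights the entropy term more heavily and so rewards more exploratory policies; with `alpha=0` the target reduces to the clipped double-Q target used by TD3.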

mikegros commented 11 months ago

Oh shoot, I forgot to mention that TensorFlow Probability is a requirement when using SAC, specifically tensorflow_probability==0.16 for the version of tensorflow that was being used with this branch.
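
A quick way to confirm the requirement above before launching a run might look like the following sketch. It uses only the standard library; the helper names `in_release_series` and `check_tfp` are hypothetical, and treating the pin as "any 0.16.x release" is an assumption that is looser than pip's exact `==` match.

```python
from importlib import metadata


def in_release_series(installed, series="0.16"):
    """True if `installed` is `series` itself or a patch release of it."""
    return installed == series or installed.startswith(series + ".")


def check_tfp(series="0.16"):
    """Return a short status string rather than raising."""
    try:
        # "tensorflow-probability" is the distribution name on PyPI.
        installed = metadata.version("tensorflow-probability")
    except metadata.PackageNotFoundError:
        return "tensorflow-probability is not installed"
    if in_release_series(installed, series):
        return f"ok: tensorflow-probability {installed}"
    return f"mismatch: found {installed}, expected {series}.x"


if __name__ == "__main__":
    print(check_tfp())
```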