Added the Soft Actor-Critic (SAC) method from Haarnoja et al. (2018) to ExaRL as an alternative to TD3 and DDPG. I have tested it on Pendulum, where it worked very well. A version compatible with newer gym/TensorFlow has also been tested on the MuJoCo Humanoid and Hopper environments, and its behavior was consistent with what Haarnoja et al. (2018) report, which suggests it works, at least to some extent.
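For reviewers less familiar with SAC: the policy is a Gaussian whose samples are squashed through `tanh` to respect action bounds, and the log-probability needs a change-of-variables correction (Haarnoja et al. 2018, appendix on enforcing action bounds). The sketch below is a plain-Python illustration of that step, not the ExaRL code; `sample_squashed_gaussian` is a hypothetical name, and I am assuming (not confirming) that the agent does the equivalent with TensorFlow Probability distributions.

```python
import math
import random

def sample_squashed_gaussian(mu, log_std, rng=random):
    """Hypothetical helper: sample a tanh-squashed Gaussian action and return
    (action, log_prob). mu and log_std are per-dimension lists."""
    action, log_prob = [], 0.0
    for m, ls in zip(mu, log_std):
        std = math.exp(ls)
        # Reparameterized pre-squash sample u = mu + std * eps, eps ~ N(0, 1)
        u = m + std * rng.gauss(0.0, 1.0)
        # Log-density of u under N(m, std^2)
        log_prob += -0.5 * ((u - m) / std) ** 2 - ls - 0.5 * math.log(2 * math.pi)
        a = math.tanh(u)
        # Change of variables for a = tanh(u): subtract log(1 - tanh(u)^2);
        # the small epsilon avoids log(0) when tanh saturates to +/-1
        log_prob -= math.log(1.0 - a * a + 1e-6)
        action.append(a)
    return action, log_prob
```

The reparameterized sample is what lets the actor loss be differentiated through the action in SAC.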
I would really love to see someone try this on the 39-bus power grid example to see whether it works at all, probably with the following flags:
--horizon 1 --actor_lr 0.0002 --critic_lr 0.0004 --sac_alpha 0.05
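To show where `--sac_alpha` enters, here is a minimal sketch of the SAC critic target and actor loss, assuming the common variant with twin Q-networks, no separate value network, and a fixed entropy temperature. I have not checked that ExaRL's agent uses exactly this form; the function names are hypothetical, and mapping `--sac_alpha` to `alpha` here is an assumption.

```python
def soft_td_target(reward, done, q1_next, q2_next, next_log_prob,
                   gamma=0.99, alpha=0.05):
    """Soft Bellman target (hypothetical helper):
    y = r + gamma * (1 - done) * (min(Q1', Q2') - alpha * log pi(a'|s'))."""
    # Clipped double-Q plus entropy bonus; alpha assumed to be --sac_alpha
    soft_value = min(q1_next, q2_next) - alpha * next_log_prob
    return reward + gamma * (1.0 - float(done)) * soft_value


def actor_loss(log_prob, q1, q2, alpha=0.05):
    """Per-sample policy loss (hypothetical helper): alpha * log pi(a|s) - min(Q1, Q2)."""
    return alpha * log_prob - min(q1, q2)
```

A larger `alpha` weights the entropy term more heavily and pushes the policy toward more exploratory behavior; a smaller one makes SAC behave more like a deterministic actor-critic.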
Oh shoot, I forgot to mention that TensorFlow Probability is a requirement when using SAC. Specifically, use tensorflow_probability==0.16 with the version of TensorFlow used on this branch.