exalearn / EXARL

Scalable Framework for Reinforcement Learning

Adding Soft Actor-Critic method to the version of fixBroken that's been updated for newer gym/tensorflow #266

Closed mikegros closed 11 months ago

mikegros commented 11 months ago

Added the Soft Actor-Critic (SAC) method from Haarnoja (2018) to ExaRL as an alternative to TD3 and DDPG. I tested it on Pendulum, where it worked very well. I also tested it on the Humanoid and Hopper environments from MuJoCo; both showed behavior consistent with Haarnoja (2018), which suggests the implementation is at least broadly correct.

There are two versions of the SAC agent. v1 is the version from Haarnoja (2018), which uses a tanh to squash samples from an unbounded normal. v0 instead samples from a truncated normal distribution to handle the action space bounds. Both showed similar performance and behavior, so I'm keeping both available.
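For readers unfamiliar with the two sampling strategies, here is a minimal sketch of the difference for a single scalar action. It is not the code from this PR: it uses only the Python standard library rather than TensorFlow Probability, all function names are hypothetical, and the truncated normal is drawn by simple rejection sampling purely for illustration.

```python
import math
import random

LOG_SQRT_2PI = 0.5 * math.log(2.0 * math.pi)


def normal_log_pdf(x, mean, std):
    """Log-density of N(mean, std^2) at x."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std) - LOG_SQRT_2PI


def normal_cdf(x, mean, std):
    """CDF of N(mean, std^2) at x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def tanh_squashed_log_prob(u, mean, std, low, high, eps=1e-6):
    """Log-density of the squashed action, given the pre-squash sample u.

    Change of variables: subtract log|d tanh(u)/du| = log(1 - tanh(u)^2),
    then subtract log(scale) for the affine map from (-1, 1) to (low, high).
    eps guards against log(0) when tanh saturates.
    """
    scale = 0.5 * (high - low)
    t = math.tanh(u)
    return normal_log_pdf(u, mean, std) - math.log(1.0 - t * t + eps) - math.log(scale)


def sample_tanh_squashed(mean, std, low, high, rng):
    """v1-style (illustrative): unbounded normal sample squashed by tanh."""
    u = rng.gauss(mean, std)
    action = low + 0.5 * (high - low) * (math.tanh(u) + 1.0)
    return action, tanh_squashed_log_prob(u, mean, std, low, high)


def truncated_normal_log_prob(x, mean, std, low, high):
    """Log-density of N(mean, std^2) truncated to [low, high], for x in range."""
    mass = normal_cdf(high, mean, std) - normal_cdf(low, mean, std)
    return normal_log_pdf(x, mean, std) - math.log(mass)


def sample_truncated_normal(mean, std, low, high, rng, max_tries=10_000):
    """v0-style (illustrative): draw from the normal, keep only in-bounds samples."""
    for _ in range(max_tries):
        x = rng.gauss(mean, std)
        if low <= x <= high:
            return x, truncated_normal_log_prob(x, mean, std, low, high)
    raise RuntimeError("no in-bounds sample found; the bounds carry too little mass")


if __name__ == "__main__":
    rng = random.Random(0)
    # Bounds chosen for illustration only.
    print("tanh-squashed:", sample_tanh_squashed(0.0, 1.0, -2.0, 2.0, rng))
    print("truncated:    ", sample_truncated_normal(0.0, 1.0, -2.0, 2.0, rng))
```

Both functions return an in-bounds action together with its log-probability, which is the quantity SAC's entropy term needs. The tanh version requires the Jacobian correction, while the truncated version only renormalizes by the probability mass inside the bounds.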

mikegros commented 11 months ago

I forgot to add that TensorFlow Probability is a requirement when using SAC.

I tested with tensorflow-probability==0.22.1 and tensorflow==2.14.0 for this branch.
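A quick way to compare a local environment against those tested versions is sketched below. The helper name `check_versions` and the `TESTED` mapping are hypothetical and not part of ExaRL; the sketch only relies on the standard library's `importlib.metadata`.

```python
from importlib import metadata

# Versions reported as tested for this branch.
TESTED = {"tensorflow": "2.14.0", "tensorflow-probability": "0.22.1"}


def check_versions(tested=TESTED, get_version=metadata.version):
    """Return {package: (installed_or_None, tested, versions_match)}."""
    report = {}
    for package, wanted in tested.items():
        try:
            installed = get_version(package)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        report[package] = (installed, wanted, installed == wanted)
    return report


if __name__ == "__main__":
    for package, (installed, wanted, ok) in check_versions().items():
        status = "ok" if ok else "differs"
        print(f"{package}: installed={installed} tested={wanted} ({status})")
```

A mismatch does not necessarily mean SAC will fail; it only flags that the combination differs from the one tested here.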