UoA-CARES / cares_reinforcement_learning

CARES Reinforcement Learning Package

Dev/update sac to the paper #112

Closed qiaoting159753 closed 11 months ago

qiaoting159753 commented 11 months ago
  1. Adjust the hyperparameters to match those reported in the papers at [https://arxiv.org/pdf/1812.05905.pdf] and [https://proceedings.mlr.press/v80/haarnoja18b/haarnoja18b.pdf].
  2. Introduce a significant change to the Actor, based on Appendix C of [https://arxiv.org/pdf/1812.05905.pdf]: a tanh transformation is applied to the sampled action so that it stays within bounds. This appears to boost the agent's performance. Empirically, it improves rewards in most environments.
  3. The code follows the [Pytorch Benchmark] implementation: https://github.com/pytorch/benchmark/blob/7de2aeda4a8f62bd8d6777d9ce3f2962ccb6d1d1/torchbenchmark/models/soft_actor_critic/nets.py#L242. This is common practice and is taught in Berkeley's deep RL course, CS 285.
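
For reviewers unfamiliar with the tanh change in item 2, here is a minimal sketch of the idea in plain Python. It is not the code in this PR: the function names (`softplus`, `gaussian_log_prob`, `tanh_log_det_jacobian`, `sample_squashed_action`) are hypothetical and chosen for illustration, and the actor is assumed to output a per-dimension mean and log standard deviation. The sketch samples an unbounded Gaussian action `u`, squashes it with `tanh`, and corrects the log-likelihood by subtracting `log(1 - tanh(u)^2)` per dimension, written in a numerically stable form.

```python
import math
import random


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def gaussian_log_prob(u, mean, log_std):
    # Log-density of a 1-D Gaussian N(mean, exp(log_std)^2) at u.
    std = math.exp(log_std)
    return -0.5 * ((u - mean) / std) ** 2 - log_std - 0.5 * math.log(2.0 * math.pi)


def tanh_log_det_jacobian(u):
    # log(1 - tanh(u)^2), rewritten as 2 * (log 2 - u - softplus(-2u))
    # so it does not underflow when |u| is large.
    return 2.0 * (math.log(2.0) - u - softplus(-2.0 * u))


def sample_squashed_action(means, log_stds, rng):
    # Hypothetical helper: means/log_stds stand in for the actor's outputs.
    actions = []
    log_prob = 0.0
    for mean, log_std in zip(means, log_stds):
        # Reparameterised sample from the unbounded Gaussian.
        u = mean + math.exp(log_std) * rng.gauss(0.0, 1.0)
        # Squash so each action component lies in [-1, 1].
        actions.append(math.tanh(u))
        # Change of variables: subtract the log-determinant of the tanh Jacobian.
        log_prob += gaussian_log_prob(u, mean, log_std) - tanh_log_det_jacobian(u)
    return actions, log_prob


rng = random.Random(0)
actions, log_prob = sample_squashed_action([0.0, 0.5], [0.0, -1.0], rng)
```

The returned `log_prob` is the quantity that would enter the entropy term of the SAC objective. Without the correction, the log-likelihood would describe the unsquashed sample `u` rather than the bounded action actually sent to the environment.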