Dev/update sac to the paper

Adjust the hyperparameters to match those reported in the papers at [https://arxiv.org/pdf/1812.05905.pdf] and [https://proceedings.mlr.press/v80/haarnoja18b/haarnoja18b.pdf].
A significant change has been made to the Actor, based on Appendix C of [https://arxiv.org/pdf/1812.05905.pdf]: a tanh transform is applied to the sampled action, squashing it into a bounded range. This appears to boost the agent's performance; empirically, it improves rewards in most environments.
The code comes from the PyTorch Benchmark implementation at [https://github.com/pytorch/benchmark/blob/7de2aeda4a8f62bd8d6777d9ce3f2962ccb6d1d1/torchbenchmark/models/soft_actor_critic/nets.py#L242]. The approach is common practice and is taught in Berkeley's RL course CS285.
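
For reviewers, the tanh transform from Appendix C can be sketched roughly as follows. This is a minimal standard-library illustration, not the PyTorch code in this PR: the function names (`sample_squashed_action`, `tanh_log_det_jacobian`, and so on) are hypothetical, and actions are plain Python lists rather than tensors. It samples an unbounded Gaussian action `u`, squashes it with `tanh`, and subtracts the change-of-variables term `sum(log(1 - tanh(u_i)^2))` from the Gaussian log-probability.

```python
import math
import random


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def gaussian_log_prob(u, mean, log_std):
    # Log-density of a diagonal Gaussian, summed over action dimensions.
    total = 0.0
    for u_i, m_i, ls_i in zip(u, mean, log_std):
        z = (u_i - m_i) / math.exp(ls_i)
        total += -0.5 * z * z - ls_i - 0.5 * math.log(2.0 * math.pi)
    return total


def tanh_log_det_jacobian(u):
    # sum_i log(1 - tanh(u_i)^2), computed with the identity
    # log(1 - tanh(u)^2) = 2 * (log 2 - u - softplus(-2u)),
    # which avoids taking log of a value that rounds to zero for large |u|.
    return sum(2.0 * (math.log(2.0) - u_i - softplus(-2.0 * u_i)) for u_i in u)


def sample_squashed_action(mean, log_std, rng=random):
    # Draw u ~ N(mean, std), squash to a = tanh(u), and correct the log-prob.
    u = [m + math.exp(ls) * rng.gauss(0.0, 1.0) for m, ls in zip(mean, log_std)]
    action = [math.tanh(u_i) for u_i in u]
    log_prob = gaussian_log_prob(u, mean, log_std) - tanh_log_det_jacobian(u)
    return action, log_prob
```

The returned action always lies in (-1, 1), so it still has to be rescaled to the environment's action bounds; how that rescaling is done here is outside the scope of this sketch.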

UoA-CARES / cares_reinforcement_learning