About Soft Actor Critic Discrete: https://arxiv.org/pdf/1910.07207
A reference implementation may also be useful: https://github.com/p-christ/Deep-Reinforcement-Learning-Algorithms-with-PyTorch/blob/master/agents/actor_critic_agents/SAC_Discrete.py (this repository is mentioned in the SAC Discrete paper).
Our current baseline RL algorithm is DQN (more precisely, DDQN). It relies on an epsilon-greedy policy to guarantee at least some exploration of the environment, which means we also have to schedule the decay of epsilon over the course of training, and designing that schedule can be non-trivial. An alternative is to switch to the soft (maximum-entropy) RL formulation, which controls exploration through the entropy of the output policy instead. One of the algorithms solving the soft RL problem is Soft Actor-Critic (SAC), and since our environment has a discrete action space we are interested in its discrete variant, SAC Discrete. The hypothesis is that SAC Discrete can perform at least as well as the best DDQN result.
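For reference, a minimal sketch of the two discrete-SAC loss targets from the linked paper, written in PyTorch to match the referenced repository. Function and argument names (`discrete_sac_actor_loss`, `discrete_sac_q_target`, `alpha`, etc.) are illustrative, not taken from our codebase; the key point is that with a discrete action space the expectations over actions can be computed exactly from the policy's action probabilities, so no reparameterization trick is needed.

```python
# Hedged sketch of SAC Discrete loss targets (assumed shapes: (batch, n_actions)).
import torch
import torch.nn.functional as F

def discrete_sac_actor_loss(policy_logits, q1, q2, alpha):
    """Actor objective: E_s[ pi(.|s)^T (alpha * log pi(.|s) - min(Q1, Q2)(s, .)) ]."""
    log_probs = F.log_softmax(policy_logits, dim=-1)
    probs = log_probs.exp()
    min_q = torch.min(q1, q2)
    # The expectation over actions is exact because the action space is discrete.
    return (probs * (alpha * log_probs - min_q)).sum(dim=-1).mean()

def discrete_sac_q_target(next_policy_logits, next_q1, next_q2,
                          rewards, dones, gamma, alpha):
    """Soft Bellman target: r + gamma * E_a'[ min Q(s',a') - alpha * log pi(a'|s') ]."""
    next_log_probs = F.log_softmax(next_policy_logits, dim=-1)
    next_probs = next_log_probs.exp()
    next_min_q = torch.min(next_q1, next_q2)
    next_v = (next_probs * (next_min_q - alpha * next_log_probs)).sum(dim=-1)
    return rewards + gamma * (1.0 - dones) * next_v
```

If the hypothesis holds, these two targets would replace the DDQN TD target, and the epsilon schedule would be replaced by the entropy temperature `alpha` (fixed or auto-tuned, as in the paper).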