Cattharine / product_owner_rl


Add new RL algorithm to compare with baseline (SAC Discrete) #63

Open Cattharine opened 1 month ago

Cattharine commented 1 month ago

Our current baseline RL algorithm is DQN (more precisely, DDQN). That algorithm relies on an epsilon-greedy policy to have at least some chance of fully exploring the environment. Using epsilon this way also forces us to schedule how this parameter changes over the course of training, and designing that schedule can be non-trivial. An alternative is to turn to the soft RL formulation, which handles exploration through the entropy of the output policy. One of the algorithms that solves the soft RL problem is Soft Actor-Critic (SAC). Since our environment has a discrete action space, we are interested in SAC Discrete. The hypothesis is that SAC Discrete can play no worse than the best result of DDQN.
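To illustrate the idea, here is a minimal sketch of the SAC Discrete loss computations (not a final design for this repo): the network names, the `1e-8` clamping, and the hyperparameters are assumptions for illustration, following the structure described in Christodoulou (2019). Because the policy outputs a full distribution over the discrete actions, the expectations can be computed exactly instead of sampled, and exploration is driven by the entropy term rather than an epsilon schedule.

```python
# Hypothetical sketch of SAC Discrete losses; q1/q2 and policy are assumed to be
# networks mapping a batch of observations to per-action values / probabilities.
import torch
import torch.nn.functional as F

def critic_loss(q1, q2, q1_target, q2_target, policy, alpha, gamma,
                obs, actions, rewards, next_obs, dones):
    # Target value uses an exact expectation over the discrete action set,
    # so no next-action sampling is needed.
    with torch.no_grad():
        next_probs = policy(next_obs)                       # (B, A) action probabilities
        next_log_probs = torch.log(next_probs + 1e-8)
        min_q_next = torch.min(q1_target(next_obs), q2_target(next_obs))
        next_v = (next_probs * (min_q_next - alpha * next_log_probs)).sum(dim=1)
        target = rewards + gamma * (1.0 - dones) * next_v
    q1_pred = q1(obs).gather(1, actions.unsqueeze(1)).squeeze(1)
    q2_pred = q2(obs).gather(1, actions.unsqueeze(1)).squeeze(1)
    return F.mse_loss(q1_pred, target) + F.mse_loss(q2_pred, target)

def actor_and_alpha_loss(q1, q2, policy, log_alpha, target_entropy, obs):
    probs = policy(obs)
    log_probs = torch.log(probs + 1e-8)
    min_q = torch.min(q1(obs), q2(obs)).detach()
    alpha = log_alpha.exp().detach()
    # The policy is pushed towards high-value actions while keeping entropy high,
    # which replaces the epsilon-greedy exploration schedule used by DDQN.
    policy_loss = (probs * (alpha * log_probs - min_q)).sum(dim=1).mean()
    # The temperature alpha is tuned automatically towards a target entropy.
    entropy = -(probs * log_probs).sum(dim=1).detach()
    alpha_loss = (log_alpha * (entropy - target_entropy)).mean()
    return policy_loss, alpha_loss
```

The `target_entropy` value here is a free hyperparameter; the referenced implementation derives it from the size of the action space, but the exact choice would be part of this task.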

Cattharine commented 1 month ago

About Soft Actor Critic Discrete: https://arxiv.org/pdf/1910.07207

Cattharine commented 1 month ago

It can also be useful to look at this implementation: https://github.com/p-christ/Deep-Reinforcement-Learning-Algorithms-with-PyTorch/blob/master/agents/actor_critic_agents/SAC_Discrete.py (this repository is mentioned in the SAC Discrete paper)