Yep, implementing Recovery RL for discrete action spaces should work fine. Since you can use any off-policy RL algorithm for the recovery policy, you can replace the default SAC recovery policy with a version adapted for discrete actions, as shown here: https://github.com/ku2482/sac-discrete.pytorch. You could also use a DQN agent for the recovery policy. The model-based recovery policy can be adapted to discrete actions in a similarly straightforward way: learn a dynamics model over the discrete actions and then use similar shooting-based techniques to plan.
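In case it helps, here's a rough sketch of what both variants might look like with discrete actions. This is illustrative only: `qrisk`, `recovery_action`, and the `model(obs, actions) -> (next_obs, risk)` interface are assumed names for this sketch, not the actual API of this repo or of sac-discrete.pytorch.

```python
import torch
import torch.nn as nn


class DiscreteRecoveryPolicy(nn.Module):
    """Sketch of a DQN-style recovery policy over discrete actions.

    As in Recovery RL, qrisk(s, a) would be trained to estimate the
    discounted probability of constraint violation; the recovery action
    is then the discrete action minimizing that estimate.
    """

    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.qrisk = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),  # one risk estimate per action
        )

    def recovery_action(self, obs: torch.Tensor) -> torch.Tensor:
        # Pick the discrete action with the lowest predicted risk.
        with torch.no_grad():
            return self.qrisk(obs).argmin(dim=-1)


def shooting_plan(model, obs, n_actions, horizon=5, n_samples=128):
    """Random-shooting planner for the model-based recovery variant.

    `model(obs, actions) -> (next_obs, risk)` is an assumed interface for
    a learned dynamics model that also predicts per-step constraint risk.
    Sample random discrete action sequences, roll them out through the
    model, and return the first action of the lowest-risk sequence.
    """
    seqs = torch.randint(0, n_actions, (n_samples, horizon))
    obs = obs.expand(n_samples, -1)  # replicate state across rollouts
    total_risk = torch.zeros(n_samples)
    for t in range(horizon):
        obs, risk = model(obs, seqs[:, t])
        total_risk += risk
    return seqs[total_risk.argmin(), 0]
```

The planner here just keeps the first action of the safest random rollout; a CEM-style refinement over the discrete sequences would slot in the same way.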
Thank you for your reply @abalakrishna123. I will try it in an environment with a discrete action space. Sorry to bother you, but I have two more questions:
1) Would the Recovery RL technique also work with on-policy algorithms like PPO?
2) Does the model-based recovery policy learn the environment dynamics gradually over episodes? Is that a correct understanding, or does it require prior knowledge of the environment and agent dynamics?
Thanks in advance. Answers to these questions would help me in my research.
Thanks for the explanation. I will try to implement this.
Hello @abalakrishna123 @bthananjeyan
Thanks for sharing this repo. I read your paper, "Recovery RL: Safe Reinforcement Learning with Learned Recovery Zones," and the idea of using separate recovery and task policies is fascinating.
As far as I understand, your experiments all use continuous action spaces. What are your thoughts on applying this strategy to environments with discrete action spaces? Do you think the same approach is possible to implement there?