DLR-RM / stable-baselines3

PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
https://stable-baselines3.readthedocs.io
MIT License
8.85k stars 1.68k forks

[Question] About the output layer of algorithms #1988

Closed abdulkadrtr closed 1 month ago

abdulkadrtr commented 1 month ago

❓ Question

Hello, I am working on a project that uses the SAC and PPO algorithms from stable-baselines3. I have wrapped my environment as a gym environment; for example, my action space is defined as:

```python
self.action_space = spaces.Box(low=np.array([-2, 0]), high=np.array([2, 8]), dtype=np.float32)
```

Do the algorithms adjust their output layer according to the defined action space? Specifically, what is used in the output layer of algorithms such as SAC, PPO, and TD3?


araffin commented 1 month ago

See https://stable-baselines3.readthedocs.io/en/master/guide/rl_tips.html#tips-and-tricks-when-creating-a-custom-environment and the many related/duplicate issues. In general, I would recommend reading more about PPO/SAC (we have links to resources in our docs).
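
The tips linked above recommend a symmetric, normalized action space: as far as I recall, SB3's SAC and TD3 squash their outputs with tanh into [-1, 1], while PPO samples from a Gaussian and SB3 clips the sampled action to the space's bounds before stepping the environment. A minimal sketch of the usual pattern, where the agent acts in [-1, 1] and the environment rescales to the true bounds (here [-2, 2] and [0, 8], matching the question; `rescale_action` is a hypothetical helper, not an SB3 API):

```python
import numpy as np

def rescale_action(action, low, high):
    """Linearly map an action from [-1, 1] to [low, high]."""
    # Clip defensively; SAC/TD3 outputs are already tanh-squashed to [-1, 1].
    action = np.clip(action, -1.0, 1.0)
    return low + 0.5 * (action + 1.0) * (high - low)

low = np.array([-2.0, 0.0])
high = np.array([2.0, 8.0])

print(rescale_action(np.array([-1.0, -1.0]), low, high))  # -> [-2.  0.]
print(rescale_action(np.array([1.0, 1.0]), low, high))    # -> [ 2.  8.]
```

You would call this at the top of your env's `step()` (or use `gymnasium.wrappers.RescaleAction`) while declaring `spaces.Box(low=-1, high=1, shape=(2,))` as the action space the agent sees.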