DLR-RM / stable-baselines3

PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
https://stable-baselines3.readthedocs.io
MIT License
8.85k stars 1.68k forks

[Question] About the output layer of algorithms #1988

Closed abdulkadrtr closed 1 month ago

abdulkadrtr commented 1 month ago

❓ Question

Hello, I am working on a project that uses the SAC and PPO algorithms from stable-baselines3. I have wrapped my environment as a gym environment; for example, my action space is defined as:

```python
self.action_space = spaces.Box(low=np.array([-2, 0]), high=np.array([2, 8]), dtype=np.float32)
```

Do the algorithms adjust their output layer according to the defined action space? Specifically, what is used in the output layer of algorithms such as SAC, PPO, and TD3?


araffin commented 1 month ago

See https://stable-baselines3.readthedocs.io/en/master/guide/rl_tips.html#tips-and-tricks-when-creating-a-custom-environment and the many related/duplicate issues. In general, I would recommend reading more about PPO/SAC (we have links to resources in our docs).
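
The tips linked above recommend a symmetric, normalized action space: as far as I recall, SB3's SAC and TD3 squash their outputs with tanh into [-1, 1], while PPO samples from a Gaussian and SB3 clips the sampled action to the space's bounds before stepping the environment. A minimal sketch of the usual pattern, where the agent acts in [-1, 1] and the environment rescales to the true bounds (here [-2, 2] and [0, 8], matching the question; `rescale_action` is a hypothetical helper, not an SB3 API):

```python
import numpy as np

def rescale_action(action, low, high):
    """Linearly map an action from [-1, 1] to [low, high]."""
    # Clip defensively; SAC/TD3 outputs are already tanh-squashed to [-1, 1].
    action = np.clip(action, -1.0, 1.0)
    return low + 0.5 * (action + 1.0) * (high - low)

low = np.array([-2.0, 0.0])
high = np.array([2.0, 8.0])

print(rescale_action(np.array([-1.0, -1.0]), low, high))  # -> [-2.  0.]
print(rescale_action(np.array([1.0, 1.0]), low, high))    # -> [ 2.  8.]
```

You would call this at the top of your env's `step()` (or use `gymnasium.wrappers.RescaleAction`) while declaring `spaces.Box(low=-1, high=1, shape=(2,))` as the action space the agent sees.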