DLR-RM / stable-baselines3

PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
https://stable-baselines3.readthedocs.io
MIT License

[Question] Discretize continuous actions/observations ? #1887

Closed nrigol closed 3 months ago

nrigol commented 3 months ago

❓ Question

I am running PPO on a custom gymnasium environment where I define the action space as follows:

self.action_space = spaces.Box(low=-1, high=1, shape=(3,), dtype=np.float32)

action[0] and action[1] are continuous. I would like action[2] to be discrete, so I split the domain [-1, 1] into 5 equally spaced chunks.

Because of action clipping, my intuition is that the first and last chunks are favored. Is my intuition correct? Should I build a one-hot encoder or something similar to prevent this issue?
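A minimal sketch of the binning described above (the `to_bin` helper is hypothetical, not part of SB3). It illustrates why the intuition is plausible: a Gaussian policy can emit values outside [-1, 1], and after clipping all of that probability mass collapses onto the boundaries, which belong to the first and last bins.

```python
import numpy as np

def to_bin(x, n_bins=5, low=-1.0, high=1.0):
    """Map a continuous action component to one of n_bins equal chunks.

    Hypothetical helper: clips to [low, high] first, mimicking what
    SB3 does to out-of-range Box actions before they reach the env.
    """
    x = float(np.clip(x, low, high))
    # Interior bin edges: for 5 bins on [-1, 1] these are -0.6, -0.2, 0.2, 0.6
    edges = np.linspace(low, high, n_bins + 1)[1:-1]
    return int(np.digitize(x, edges))

print(to_bin(0.0))   # -> 2 (middle bin)
print(to_bin(-3.0))  # -> 0 (clipped to -1, lands in the first bin)
print(to_bin(3.0))   # -> 4 (clipped to +1, lands in the last bin)
```

Any sampled value below -1 or above +1 maps to bin 0 or bin 4 respectively, so those outer bins receive extra probability mass compared to the interior ones.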


araffin commented 3 months ago

Duplicate of:

- https://github.com/DLR-RM/stable-baselines3/issues/527
- https://github.com/DLR-RM/stable-baselines3/issues/731
- https://github.com/DLR-RM/stable-baselines3/issues/1482
- https://github.com/DLR-RM/stable-baselines3/issues/1094