DLR-RM / stable-baselines3

PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
https://stable-baselines3.readthedocs.io
MIT License

[Question] Discretize continuous actions/observations ? #1887

Closed nrigol closed 3 months ago

nrigol commented 3 months ago

❓ Question

I am running PPO on a custom gymnasium environment where I define the action space as follows:

self.action_space = spaces.Box(low=-1, high=1, shape=(3,), dtype=np.float32)

action[0] and action[1] are continuous. I would like action[2] to be discrete, so I split the domain [-1, 1] into 5 equally spaced chunks.

Because of action clipping, my intuition is that the first and last chunks are favored. Is my intuition correct? Should I build a one-hot encoder or something similar to prevent this issue?
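A minimal sketch of the binning described above (the `to_bin` helper is hypothetical, not part of SB3). It illustrates why the intuition is plausible: a Gaussian policy can emit values outside [-1, 1], and after clipping all of that probability mass collapses onto the boundaries, which belong to the first and last bins.

```python
import numpy as np

def to_bin(x, n_bins=5, low=-1.0, high=1.0):
    """Map a continuous action component to one of n_bins equal chunks.

    Hypothetical helper: clips to [low, high] first, mimicking what
    SB3 does to out-of-range Box actions before they reach the env.
    """
    x = float(np.clip(x, low, high))
    # Interior bin edges: for 5 bins on [-1, 1] these are -0.6, -0.2, 0.2, 0.6
    edges = np.linspace(low, high, n_bins + 1)[1:-1]
    return int(np.digitize(x, edges))

print(to_bin(0.0))   # -> 2 (middle bin)
print(to_bin(-3.0))  # -> 0 (clipped to -1, lands in the first bin)
print(to_bin(3.0))   # -> 4 (clipped to +1, lands in the last bin)
```

Any sampled value below -1 or above +1 maps to bin 0 or bin 4 respectively, so those outer bins receive extra probability mass compared to the interior ones.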


araffin commented 3 months ago

Duplicate of:

- https://github.com/DLR-RM/stable-baselines3/issues/527
- https://github.com/DLR-RM/stable-baselines3/issues/731
- https://github.com/DLR-RM/stable-baselines3/issues/1482
- https://github.com/DLR-RM/stable-baselines3/issues/1094