Denys88 / rl_games

RL implementations
MIT License

Bounds / regularisation loss and action clipping #257

Closed johannespitz closed 12 months ago

johannespitz commented 1 year ago

Hi, in the continuous PPO implementation you have two types of regularization that, as far as I understand, prevent weird effects due to the action clipping that is often necessary when actions are sampled from a normal distribution: https://github.com/Denys88/rl_games/blob/fe95913f5b42dc39869da1924188c7601d3cf133/rl_games/algos_torch/a2c_continuous.py#L164

Are there any publications that introduce them, or do you have empirical data, or at least an intuition for when each type of regularization works best?

I see (#153, #89) that you are not a fan of tanh squashing. Have you experimented with truncated normal distributions (https://en.wikipedia.org/wiki/Truncated_normal_distribution)?

Denys88 commented 12 months ago

Hi @johannespitz. There are two general ideas here: 1) The negative sum of squared actions is often added to the reward to make the robot move more smoothly and reduce energy consumption. In a lot of cases something differentiable is better than something non-differentiable. 2) Depending on the env/robot, some actions only move towards -1 or 1. If I don't do squashing, the actual values returned as mu can diverge a little bit, which is bad in general. You can see this behavior in my IsaacGym fork, where I report the action distribution to TensorBoard.
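
The two ideas above can be sketched as two penalty terms on the policy mean. This is a minimal, framework-free illustration rather than the code linked in the question: the function names, the list-of-lists batch layout, and the `soft_bound=1.1` margin are assumptions made for the example.

```python
def reg_loss(mu):
    """Per-sample sum of squared action means (penalises large actions)."""
    return [sum(m * m for m in row) for row in mu]


def bound_loss(mu, soft_bound=1.1):
    """Per-sample squared overshoot of each mean outside [-soft_bound, soft_bound].

    soft_bound=1.1 is an assumed margin slightly wider than the [-1, 1] clip range.
    """
    losses = []
    for row in mu:
        total = 0.0
        for m in row:
            high = max(m - soft_bound, 0.0)  # positive only above +soft_bound
            low = min(m + soft_bound, 0.0)   # negative only below -soft_bound
            total += high * high + low * low
        losses.append(total)
    return losses


# Batch of two samples with two action dimensions each (made-up values).
mu_batch = [[0.5, -0.5], [1.5, -2.0]]
reg = reg_loss(mu_batch)    # grows everywhere, pulls means towards 0
bnd = bound_loss(mu_batch)  # zero inside the bounds, grows only outside
```

The first term is active for every mean, so it trades off against reward everywhere. The second is zero as long as the mean stays inside the soft bounds, so it only acts on means that have drifted past the clip range. Either term would be scaled by a coefficient and added to the PPO loss.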

I think I tested all possible distributions for IG (IsaacGym), and this one (return mu as is, with logstd independent of the obs vector) was the best. I tested squashed normal, truncated normal, and beta (it was unstable in the long run whatever I tried). But if you want to try something new, it should be relatively easy: you can create your own ModelA2CContinuous class and test your ideas.
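
For readers unfamiliar with that parameterisation, here is a rough sketch of a diagonal Gaussian policy where `mu` is used unsquashed and `log_std` does not depend on the observation. It is an illustration under assumptions, not the rl_games model: the helper names `log_prob` and `act`, the hard-coded `mu`, and the choice to clip only the copy of the action sent to the environment are all assumed for the example.

```python
import math
import random


def log_prob(action, mu, log_std):
    """Log-density of a diagonal Gaussian, summed over action dimensions."""
    total = 0.0
    for a, m, ls in zip(action, mu, log_std):
        z = (a - m) / math.exp(ls)
        total += -0.5 * z * z - ls - 0.5 * math.log(2.0 * math.pi)
    return total


def act(mu, log_std, rng):
    """Sample an unbounded action and a copy clipped to [-1, 1] for the env."""
    raw = [rng.gauss(m, math.exp(ls)) for m, ls in zip(mu, log_std)]
    clipped = [max(-1.0, min(1.0, a)) for a in raw]
    return raw, clipped


rng = random.Random(0)
mu = [0.9, -0.2]       # stand-in for the network output for one observation
log_std = [0.0, -1.0]  # free parameters shared across all observations
raw, clipped = act(mu, log_std, rng)
lp = log_prob(raw, mu, log_std)  # evaluated on the unclipped sample (assumption)
```

Because nothing in the density itself keeps `mu` inside [-1, 1], a mean near the boundary can keep moving outwards without changing the clipped action, which is the drift the bounds penalty is meant to discourage.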

johannespitz commented 12 months ago

Thank you for the detailed answer! If you ever publish something or come across other work that studies the action clipping in detail I'd be very interested. But for now I guess I'll close the issue. Thanks again.