DLR-RM / stable-baselines3

PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
https://stable-baselines3.readthedocs.io
MIT License

[Question] How to avoid SAC getting stuck in local minima #1903

Closed: JaimeParker closed this issue 2 months ago

JaimeParker commented 2 months ago

❓ Question

I've read #76, but my case is a little different.

"if during random exploration it finds the goal, then it will work, otherwise, it will be stuck in a local minima"

But in my case, from the rollout output I can tell that the agent did find the goal, i.e. it accomplished the task, just not in an optimal way, which is what I mean by a local minimum.

Here is a training curve using SAC:

It found the goal at about 10M timesteps, when the reward was about -100. Over the remaining steps the reward showed a weak increasing trend, but it never reached the optimal reward I designed.

In case the total number of timesteps was insufficient, I tried larger budgets several times (usually 3e8). The reward curves all look roughly like the one below:

they always converge quickly and get stuck in a local minimum.

I've tried several methods:

The first method did slow down training, but seemed to have no influence on the final local minimum. The second method did sometimes escape the local minimum, but the effect is random and does not happen on every run.

Details:

"by adding additional noise to the actions of the behavior policy"

I'll give it a go. The difference in my case is that the agent does escape the local minimum occasionally.
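In terms of code, a minimal sketch of adding Gaussian action noise to SAC in stable-baselines3 could look like this (`Pendulum-v1` and the noise scale of 0.3 are only placeholders standing in for my actual environment and tuning):

```python
import numpy as np
import gymnasium as gym

from stable_baselines3 import SAC
from stable_baselines3.common.noise import NormalActionNoise

# Placeholder continuous-control task standing in for the real custom env.
env = gym.make("Pendulum-v1")
n_actions = env.action_space.shape[0]

# Extra Gaussian noise added to the behavior policy's actions during rollouts,
# to encourage exploration when the policy latches onto a sub-optimal solution.
action_noise = NormalActionNoise(mean=np.zeros(n_actions), sigma=0.3 * np.ones(n_actions))

model = SAC("MlpPolicy", env, action_noise=action_noise, verbose=1)
model.learn(total_timesteps=100_000)
```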

There is another option I want to try: adding a quadratic penalty on time to the reward (since my task requires minimum time), which would be:

# in the reward function
time_quadratic = -k * current_step ** 2  # k > 0, quadratic in the current episode step
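For context, here is a rough skeleton of where such a penalty would sit in a custom Gymnasium environment; the class name, observation/action spaces, goal check, and the value of `k` are all placeholders, not my actual setup:

```python
import numpy as np
import gymnasium as gym


class MinimumTimeEnvSketch(gym.Env):
    """Hypothetical skeleton showing where the quadratic time penalty would go."""

    def __init__(self, k: float = 1e-4, max_steps: int = 1000):
        self.k = k
        self.max_steps = max_steps
        self.current_step = 0
        self.observation_space = gym.spaces.Box(-1.0, 1.0, shape=(3,), dtype=np.float32)
        self.action_space = gym.spaces.Box(-1.0, 1.0, shape=(1,), dtype=np.float32)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.current_step = 0
        return np.zeros(3, dtype=np.float32), {}

    def step(self, action):
        self.current_step += 1
        goal_reached = False  # placeholder for the real goal check

        # Task reward (placeholder) plus a penalty that grows quadratically with the
        # episode step, so reaching the goal late is punished more than proportionally.
        reward = (100.0 if goal_reached else 0.0) - self.k * self.current_step ** 2

        terminated = goal_reached
        truncated = self.current_step >= self.max_steps
        return np.zeros(3, dtype=np.float32), reward, terminated, truncated, {}
```

The idea is that the quadratic term makes late completions much more costly than early ones, which should push the policy towards shorter episodes.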

Do you have any experience with getting SAC out of local minima? Thanks.

I also tried PPO; it performed well.

Checklist

JaimeParker commented 2 months ago

Since this is a tech support issue, I'm closing it.