DLR-RM / stable-baselines3

PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
https://stable-baselines3.readthedocs.io

[Question] policy gradient loss and explained variance very small (almost zero) from the training start? #1897

Closed: Ahmed-Radwan094 closed this issue 5 months ago

Ahmed-Radwan094 commented 5 months ago

❓ Question

I implemented a custom environment in Carla (discussed and verified as working in a previous ticket) and I am trying to train a PPO agent in it. I noticed that the policy gradient loss and the explained variance are always very small (almost zero), while the value loss can show very high peaks (up to around 200). The final agent performs badly, close to random sampling. Could you guide me on what the reasons behind this behavior might be and how I can overcome it?
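For context on the metric being discussed: the `explained_variance` value that SB3 logs measures how well the value function's predictions track the empirical returns. The sketch below paraphrases the utility in `stable_baselines3.common.utils`; it is illustrative, not the questioner's code.

```python
import numpy as np

def explained_variance(y_pred: np.ndarray, y_true: np.ndarray) -> float:
    """1 - Var[returns - value predictions] / Var[returns].

    Close to 1: the value function explains the returns well.
    Close to 0 (as reported here): the predictions are no better
    than a constant baseline.
    """
    var_y = np.var(y_true)
    return np.nan if var_y == 0 else float(1 - np.var(y_true - y_pred) / var_y)
```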

Hyperparameters used:

learning_rate: 0.0005
batch_size: 32
n_steps: 64
n_epochs: 8
gamma: 0.99
gae_lambda: 0.95
clip_range: 0.2
normalize_advantage: true
ent_coef: 0.01
vf_coef: 0.5
max_grad_norm: 0.5

Checklist

araffin commented 5 months ago

If code there is, it is minimal and working

Closing because the minimum requirements for seeking help are not met.

This also looks like tech support, which we don't do.

Ahmed-Radwan094 commented 5 months ago

Unfortunately, I cannot share the code. However, thank you for your support on other tickets.