DLR-RM / stable-baselines3

PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
https://stable-baselines3.readthedocs.io

[Question] policy gradient loss and explained variance very small (almost zero) from the training start? #1897

Closed: Ahmed-Radwan094 closed this issue 5 months ago

Ahmed-Radwan094 commented 5 months ago

❓ Question

I implemented a custom environment in Carla (discussed and verified as working in a previous ticket) and I am trying to train a PPO agent in it. I noticed that the policy gradient loss and the explained variance are always very small (almost zero), while the value loss can show very high peaks (up to around 200). The final agent performs badly, close to random sampling. Could you guide me on what the reasons behind this behavior might be and how I can overcome it?
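For context on the metric being discussed: the `explained_variance` value that SB3 logs measures how well the value function's predictions track the empirical returns. The sketch below paraphrases the utility in `stable_baselines3.common.utils`; it is illustrative, not the questioner's code.

```python
import numpy as np

def explained_variance(y_pred: np.ndarray, y_true: np.ndarray) -> float:
    """1 - Var[returns - value predictions] / Var[returns].

    Close to 1: the value function explains the returns well.
    Close to 0 (as reported here): the predictions are no better
    than a constant baseline.
    """
    var_y = np.var(y_true)
    return np.nan if var_y == 0 else float(1 - np.var(y_true - y_pred) / var_y)
```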

Hyperparameters used:

learning_rate: 0.0005
batch_size: 32
n_steps: 64
n_epochs: 8
gamma: 0.99
gae_lambda: 0.95
clip_range: 0.2
normalize_advantage: true
ent_coef: 0.01
vf_coef: 0.5
max_grad_norm: 0.5

Checklist

araffin commented 5 months ago

If code there is, it is minimal and working

Closing because the minimum requirements for seeking help are not met.

This also looks like tech support, which we don't do.

Ahmed-Radwan094 commented 5 months ago

Unfortunately, I cannot share the code. However, thank you for your support on other tickets.