Why does GAIL get lower rewards the more it is trained?

Khrylx / PyTorch-RL

PyTorch implementation of Deep Reinforcement Learning: Policy Gradient methods (TRPO, PPO, A2C) and Generative Adversarial Imitation Learning (GAIL). Fast Fisher vector product TRPO.

MIT License

Why does GAIL get lower rewards the more it is trained? #36

Open ZXAXKL opened 1 year ago

ZXAXKL commented 1 year ago

Hi, thank you for the baseline code; it has helped me a lot. However, I ran into a problem when running it. I first sample trajectories from a trained expert policy and then provide them to GAIL, but in the Ant-v2 and Hopper-v2 environments the reward keeps decreasing as the number of training iterations increases. My setup is mujoco.py=2.0.8 with mujoco200. I would be very grateful if you could take the time to look into this. [Two images attached]
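
For anyone debugging the same symptom, one thing that may be worth checking is whether the sign convention of the GAIL surrogate reward matches the discriminator's labels. The sketch below is not taken from this repository. It assumes a discriminator output in (0, 1) that is read as the probability that a state-action pair came from the policy (expert pairs labeled 0, policy pairs labeled 1). The function name `surrogate_reward` and the `eps` clamp are hypothetical, chosen only for illustration.

```python
import math


def surrogate_reward(d_policy_prob: float, eps: float = 1e-8) -> float:
    """Hypothetical GAIL-style surrogate reward: -log D(s, a).

    Assumption: D(s, a) is the discriminator's estimated probability that
    the pair was generated by the policy (expert = 0, policy = 1).
    """
    # Clamp the probability so that log() never receives exactly 0 or 1.
    d = min(max(d_policy_prob, eps), 1.0 - eps)
    return -math.log(d)


# Under the assumed labeling, a pair the discriminator considers
# expert-like (low D) should earn more reward than a policy-like one.
expert_like = surrogate_reward(0.1)
policy_like = surrogate_reward(0.9)
print(expert_like > policy_like)
```

If the labels were the other way around (expert labeled 1), the same formula would favor pairs the discriminator considers policy-like, which could plausibly make the environment return fall as training proceeds. This is only one possible cause; the quality of the sampled expert trajectories is another thing to verify.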

Yangning-k commented 1 year ago

Hello, I have also encountered this problem. May I ask whether you have solved it? Thank you.