Khrylx / PyTorch-RL

PyTorch implementation of Deep Reinforcement Learning: Policy Gradient methods (TRPO, PPO, A2C) and Generative Adversarial Imitation Learning (GAIL). Fast Fisher vector product TRPO.
MIT License

GAIL fails to train in Ant-v2 environment #28

Open seolhokim opened 3 years ago

seolhokim commented 3 years ago

I trained your PPO first.

python examples/ppo_gym.py --env-name Ant-v2 --save-model-interval 100

After 500 episodes, I saved expert trajectories.

python gail/save_expert_traj.py --model-path assets/learned_models/Ant-v2_ppo.p

Last, I ran GAIL.

python gail/gail_gym.py --env-name Ant-v2 --expert-traj-path assets/expert_traj/Ant-v2_expert_traj.p

I also implemented GAIL and VAIL myself, but they failed to train as well (although Hopper worked well).

Any ideas?

dibbla commented 2 years ago

Hi! Have you worked it out? It seems to be a problem with zfilter.

seolhokim commented 2 years ago

No. I guess you suspect that training failed because of the standardization done by zfilter, right? I checked, but it was not the key to solving the problem in my implementation.
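For readers unfamiliar with the standardization being discussed, here is a minimal sketch of a running z-score filter. It is not the repository's zfilter code: the class name `RunningZFilter`, its `clip` parameter, and the `update` flag are hypothetical names chosen for illustration. It only shows why the same raw observation can be normalized differently depending on whether the statistics are still being updated, which is the kind of mismatch between expert and learner states one might check. Whether that is the cause of the failure here is not established in this thread.

```python
import math


class RunningZFilter:
    """Hypothetical running z-score filter (illustrative, not the repo's class)."""

    def __init__(self, dim, clip=10.0):
        self.n = 0                 # number of observations seen so far
        self.mean = [0.0] * dim    # running mean per dimension
        self.m2 = [0.0] * dim      # running sum of squared deviations
        self.clip = clip           # normalized values are clipped to [-clip, clip]

    def update(self, x):
        # Welford's online update of mean and squared deviations.
        self.n += 1
        for i, xi in enumerate(x):
            delta = xi - self.mean[i]
            self.mean[i] += delta / self.n
            self.m2[i] += delta * (xi - self.mean[i])

    def std(self):
        # With fewer than two samples the variance is undefined; use 1.0.
        if self.n < 2:
            return [1.0] * len(self.mean)
        return [math.sqrt(m / (self.n - 1)) for m in self.m2]

    def __call__(self, x, update=True):
        if update:
            self.update(x)
        out = []
        for xi, mu, sd in zip(x, self.mean, self.std()):
            z = (xi - mu) / (sd + 1e-8)
            out.append(max(-self.clip, min(self.clip, z)))
        return out


# Two filters that have seen different data normalize the same raw
# observation differently.
expert_filter = RunningZFilter(dim=2)
learner_filter = RunningZFilter(dim=2)
for obs in ([1.0, 2.0], [3.0, 6.0]):
    expert_filter(obs)
for obs in ([0.0, 0.0], [10.0, 10.0]):
    learner_filter(obs)

raw = [2.0, 4.0]
print(expert_filter(raw, update=False))
print(learner_filter(raw, update=False))
```

Passing `update=False` freezes the statistics, which is one way to keep the normalization used when saving expert trajectories consistent with the one used later during imitation learning.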

lviano commented 2 years ago

Any solution to this? I am running into the same problem. The code works well for all the other MuJoCo environments I have tried, but not for HalfCheetah.