Khrylx / PyTorch-RL

PyTorch implementation of Deep Reinforcement Learning: Policy Gradient methods (TRPO, PPO, A2C) and Generative Adversarial Imitation Learning (GAIL). Fast Fisher vector product TRPO.

GAIL discriminator loss uses complete expert data in each iteration? #18

Closed · SapanaChaudhary closed this issue 4 years ago

SapanaChaudhary commented 4 years ago

https://github.com/Khrylx/PyTorch-RL/blob/f44b4444c9db5c1562c5d0bc04080c319ba9141a/gail/gail_gym.py#L126

The number of generator data samples seems to be around 2088, while the number of expert samples is 50000. Shouldn't the number of expert samples be the same as the number of generator samples?

Khrylx commented 4 years ago

The BCELoss we use for the discriminator averages the loss over the number of samples in each term, so the size mismatch should be fine.
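A minimal sketch of why this works: `nn.BCELoss` defaults to `reduction='mean'`, so each term is normalized by its own batch size and neither side dominates simply because it has more samples. The tensor shapes below mirror the numbers mentioned in the issue, and the label convention (ones for generator samples, zeros for expert samples) follows the linked `gail_gym.py` line; this is an illustration, not the repo's exact code.

```python
import torch
import torch.nn as nn

discrim_criterion = nn.BCELoss()  # reduction='mean' by default

# Placeholder discriminator outputs (probabilities in [0, 1]).
g_o = torch.rand(2088, 1)   # outputs on generator (policy) samples
e_o = torch.rand(50000, 1)  # outputs on expert samples

# Each BCELoss term is averaged over its own batch, so the 2088-vs-50000
# imbalance does not rescale either term.
discrim_loss = discrim_criterion(g_o, torch.ones(2088, 1)) + \
               discrim_criterion(e_o, torch.zeros(50000, 1))
```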

SapanaChaudhary commented 4 years ago

That is right. So the whole expert dataset is used in each iteration. Is that fair? If I wanted to sample the same number of expert samples (from the complete pool of expert data) as generator samples, how would you suggest I sample them (uniformly at random)?

Khrylx commented 4 years ago

Yes, maybe randomly sample a batch of expert data.
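For reference, a minimal sketch of uniform random sampling of an expert minibatch. It assumes the expert data is a 2-D tensor of (state, action) rows as loaded in `gail_gym.py`; the helper name `sample_expert_batch` is illustrative and not part of the repo.

```python
import torch

def sample_expert_batch(expert_traj: torch.Tensor, batch_size: int) -> torch.Tensor:
    """Uniformly sample `batch_size` rows from the full expert dataset."""
    idx = torch.randperm(expert_traj.shape[0])[:batch_size]
    return expert_traj[idx]

# e.g. match the expert batch size to the number of freshly generated samples:
# expert_batch = sample_expert_batch(expert_traj, g_o.shape[0])
```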

SapanaChaudhary commented 4 years ago

Okay. Thank you.