Khrylx / PyTorch-RL

PyTorch implementation of Deep Reinforcement Learning: Policy Gradient methods (TRPO, PPO, A2C) and Generative Adversarial Imitation Learning (GAIL). Fast Fisher vector product TRPO.
MIT License

Entropy Term for GAIL #9

Closed sandeepnRES closed 5 years ago

sandeepnRES commented 5 years ago

In the paper https://arxiv.org/pdf/1606.03476.pdf (Jonathan Ho's GAIL paper), there is a causal entropy term in the policy network update step. I could not find anything related to that entropy term anywhere in your code. Did you skip it? Wouldn't that completely change the objective function? The whole derivation depends on maximum causal entropy IRL, which just becomes a kind of regularizer at the end.

I think that if the causal entropy term is not included, then instead of maximum entropy IRL you are implicitly doing something else, which may be wrong/suboptimal? Correct me if I'm wrong.
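
For reference, the regularized objective from the paper is (writing H(pi) for the discounted causal entropy):

```latex
\min_{\pi} \; \psi^{*}_{GA}(\rho_{\pi} - \rho_{\pi_E}) - \lambda H(\pi),
\qquad H(\pi) \triangleq \mathbb{E}_{\pi}\left[-\log \pi(a \mid s)\right]
```

so the policy step should carry an extra $-\lambda \nabla_\theta H(\pi_\theta)$ gradient term alongside the discriminator-based reward.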

Khrylx commented 5 years ago

Thank you for pointing this out. In my implementation, I just set the regularizer coefficient lambda to zero. The paper actually compares different values of lambda, including zero. I might try to add this term later, though I find the current performance is okay.
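
For anyone who wants to add the term back, here is a minimal sketch of what it could look like in a PPO-style policy update. This is not the repo's actual code; the `get_log_prob` helper and the `entropy_coef` argument (playing the role of lambda) are assumptions for illustration:

```python
import torch

def ppo_loss_with_entropy(policy_net, states, actions, advantages,
                          fixed_log_probs, clip_eps=0.2, entropy_coef=1e-3):
    """Clipped PPO surrogate loss with a causal entropy bonus."""
    # Log-probabilities of the taken actions under the current policy
    log_probs = policy_net.get_log_prob(states, actions)  # assumed helper
    ratio = torch.exp(log_probs - fixed_log_probs)
    surr1 = ratio * advantages
    surr2 = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    policy_loss = -torch.min(surr1, surr2).mean()
    # Monte Carlo estimate of the causal entropy H(pi) = E_pi[-log pi(a|s)]
    entropy = -log_probs.mean()
    # Subtracting entropy_coef * entropy implements the -lambda * H(pi) term,
    # so minimizing this loss maximizes policy entropy
    return policy_loss - entropy_coef * entropy
```

Setting `entropy_coef=0` recovers what the code currently does.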

sandeepnRES commented 5 years ago

OK, I just noticed the paper's results when the coefficient is zero. Thanks for clearing this up.