ermongroup / MA-AIRL

Multi-Agent Adversarial Inverse Reinforcement Learning, ICML 2019.

Discriminator has an argument state_only=True, which removes action from the input of the reward network. Is it OK? #6

Open pengzhenghao opened 1 year ago

pengzhenghao commented 1 year ago

Do you have any comment on which option is better? Removing the action from the reward function, i.e. replacing g(s, a) with g(s), changes its meaning completely. Is this a reasonable choice?
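To make the question concrete, here is a minimal sketch of what a `state_only` flag does to the reward term. It assumes the standard AIRL discriminator form D(s, a, s') = exp(f) / (exp(f) + π(a|s)) with f = g + γh(s') − h(s). `RewardNet`, `shaping` and `airl_discriminator` are hypothetical names that use linear functions instead of neural networks. This is not the MA-AIRL code base. With `state_only=True`, g ignores the action, so two actions taken in the same state get the same g. The discriminator still sees the action through log π(a|s).

```python
import math
from typing import Sequence


class RewardNet:
    """Hypothetical linear reward term g, with or without the action as input."""

    def __init__(self, state_weights: Sequence[float],
                 action_weights: Sequence[float], state_only: bool = False):
        self.state_weights = list(state_weights)
        self.action_weights = list(action_weights)
        self.state_only = state_only

    def __call__(self, state: Sequence[float], action: Sequence[float]) -> float:
        # The state always contributes to g.
        value = sum(w * x for w, x in zip(self.state_weights, state))
        if not self.state_only:
            # g(s, a): the action is part of the input only when state_only is False.
            value += sum(w * x for w, x in zip(self.action_weights, action))
        return value


def shaping(state: Sequence[float]) -> float:
    # Hypothetical potential h(s); fixed here instead of learned.
    return 0.5 * sum(state)


def airl_discriminator(g: RewardNet, state, action, next_state,
                       log_pi: float, gamma: float = 0.99) -> float:
    # f(s, a, s') = g(s[, a]) + gamma * h(s') - h(s)
    f = g(state, action) + gamma * shaping(next_state) - shaping(state)
    # exp(f) / (exp(f) + pi(a|s)) rewritten as sigmoid(f - log pi(a|s))
    return 1.0 / (1.0 + math.exp(-(f - log_pi)))


# Usage: the same state paired with two different actions.
s, s_next = [1.0, 2.0], [1.5, 2.5]
a1, a2 = [1.0, 0.0], [0.0, 1.0]
g_sa = RewardNet([0.1, 0.2], [0.3, -0.4], state_only=False)
g_s = RewardNet([0.1, 0.2], [0.3, -0.4], state_only=True)
print(g_sa(s, a1), g_sa(s, a2))  # differ: the action enters g(s, a)
print(g_s(s, a1), g_s(s, a2))    # identical: g(s) never reads the action
```

With `state_only=True`, the only remaining action dependence in D is the policy term π(a|s). The learned reward can therefore rank states but cannot directly prefer one action over another in the same state. That is the change in meaning the question points to.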