Closed ghost closed 2 years ago
I'm glad to see someone working with IRL/imitation learning algorithms in environments that don't have reward functions (i.e. the realistic case).
Unfortunately, judging whether you've learned a good solution here is hard. You can't really tell just from looking at summary metrics. At best you can see whether the model is a good fit for the demonstration data, but that won't distinguish between it having learned something sensible and it simply overfitting to the demonstration data.
Your best bet in practice might be to take rollouts from different checkpoints and manually assess how sensible they look; that way you get fresh evaluations that form an effective test set.
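A minimal sketch of the checkpoint-rollout idea. The `ToyEnv` class and the `policy` callable here are stand-ins (hypothetical, not from any library) so the snippet runs on its own — in practice you'd substitute your own Gym-style environment and a policy loaded from each checkpoint:

```python
class ToyEnv:
    """Minimal Gym-style stand-in so the sketch runs without dependencies."""

    def __init__(self, horizon=5):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        done = self.t >= self.horizon
        return self.t, 0.0, done, {}


def collect_rollout(env, policy, max_steps=200):
    """Roll the policy out once; return visited (obs, action) pairs for inspection."""
    obs = env.reset()
    trajectory = []
    for _ in range(max_steps):
        action = policy(obs)
        trajectory.append((obs, action))
        obs, reward, done, info = env.step(action)
        if done:
            break
    return trajectory


# One rollout per checkpointed policy; eyeball these for sensible behaviour.
rollouts = [collect_rollout(ToyEnv(), lambda obs: 0) for _ in range(3)]
```

You would then render or plot each trajectory and check it by eye, rather than trusting any single scalar metric.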
You could also split the demos into a training and test set, and look at discriminator loss on the test set (you'd need to modify the code slightly to compute this).
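The split itself is straightforward; a small sketch (the `split_demos` helper is illustrative, not part of any library — treat each element of `demos` as one demonstration trajectory):

```python
import random


def split_demos(demos, test_frac=0.2, seed=0):
    """Shuffle demonstration trajectories and hold out a test split."""
    rng = random.Random(seed)
    shuffled = demos[:]  # don't mutate the caller's list
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_frac))
    return shuffled[n_test:], shuffled[:n_test]


# Example: 10 trajectories -> 8 for training, 2 held out for the
# discriminator-loss evaluation described above.
train_demos, test_demos = split_demos([f"demo_{i}" for i in range(10)])
```

Splitting at the trajectory level (rather than per-transition) matters here, since transitions within one trajectory are highly correlated.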
To judge convergence alone (but not necessarily convergence to anything good!), metrics like ep_rew_mean, policy loss, etc. should all be stable at convergence.
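One simple way to operationalise "stable" is to check whether the metric's recent values have small spread relative to their mean. This is just a heuristic sketch (the function name and tolerances are my own, not from the codebase):

```python
def is_stable(values, window=10, rel_tol=0.05):
    """Heuristic convergence check: a metric is 'stable' if the standard
    deviation of its last `window` values is small relative to their mean."""
    if len(values) < window:
        return False  # not enough history to judge
    tail = values[-window:]
    mean = sum(tail) / window
    std = (sum((v - mean) ** 2 for v in tail) / window) ** 0.5
    return std <= rel_tol * (abs(mean) + 1e-8)
```

You could run this over the logged ep_rew_mean and policy-loss series from each checkpoint; remember it only detects a plateau, not whether the plateau is any good.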
Closing due to inactivity.
Hello!
Thanks so much for sharing the code!
I am new to inverse reinforcement learning. I am currently trying to apply AIRL and GAIL to a custom environment without knowing anything about the reward function. Are there any metrics other than reward that can be used to judge convergence?
Thanks ;).