Div99 / IQ-Learn

(NeurIPS '21 Spotlight) IQ-Learn: Inverse Q-Learning for Imitation
https://div99.github.io/IQ-Learn/

How to judge the convergence #3

Closed ghost closed 2 years ago

ghost commented 2 years ago

Hello!

Thanks so much for sharing the code!

I am new to inverse reinforcement learning. I am trying to apply the code to a custom environment where nothing is known about the reward function. Are there any metrics, other than rewards, that can be used to judge convergence?

Thanks ;).

Div99 commented 2 years ago

Hi, you can look at the training losses or some evaluation videos if you don't have any reward metrics. I'm also willing to help if you can share any specifics (on GitHub or by email).

ghost commented 2 years ago

Thanks for your quick reply!

Currently I am able to run the code in my environment. The losses at one time step are shown below; I only recognize a couple of them, such as 'value_loss'. Can you please help me figure out what they represent and how to use them to judge convergence?

```python
{'v0': -9.3360013961792,
 'softq_loss': 0.21358530223369598,
 'value_loss': -0.32929036021232605,
 'regularize_loss': 0.02841954678297043,
 'total_loss': -0.08728551119565964,
 'loss/actor': 8.292854309082031,
 'actor_loss/target_entropy': -1,
 'actor_loss/entropy': 0.4056042432785034,
 'alpha_loss/loss': 0.27993878722190857,
 'alpha_loss/value': 0.19909673026414973}
```

Div99 commented 2 years ago

You should look at the total_loss metric. It should go down and converge to some very small value.
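
For concreteness, here is a minimal sketch (illustrative only, not part of the repo) of one way to monitor convergence of total_loss: keep a rolling window of recent values and treat training as converged once the smoothed loss stops changing. The window size and tolerance are arbitrary choices you would tune for your environment.

```python
from collections import deque
import numpy as np

# Rolling window of recent total_loss values (window size is an arbitrary choice)
window = deque(maxlen=1000)

def total_loss_converged(losses, tol=1e-3):
    """losses is the per-step dict shown above; tol is an illustrative threshold."""
    window.append(losses['total_loss'])
    if len(window) < window.maxlen:
        return False
    # Compare the mean of the older half of the window to the newer half;
    # if they are nearly equal, the smoothed loss has flattened out.
    half = window.maxlen // 2
    old = np.mean(list(window)[:half])
    new = np.mean(list(window)[half:])
    return abs(new - old) < tol
```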

ghost commented 2 years ago

Thanks again!

I just have one last question: when using test_iq.py, how should I get the losses on a testing set to judge whether the model has been "over-trained"?

Div99 commented 2 years ago

Hmm, maybe you can create a validation split and keep track of the IQ-Learn loss, or an MSE loss between the policy's actions and the expert's, to measure overfitting.
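
For example, a minimal sketch of an action-MSE on a held-out split of the expert data (it assumes the agent exposes a `choose_action(state, sample=False)` method; check the actual API of the agent you are using):

```python
import numpy as np

def validation_action_mse(agent, val_pairs):
    """val_pairs: held-out (state, expert_action) pairs split off from the demos."""
    errors = []
    for state, expert_action in val_pairs:
        # Deterministic action from the current policy (method name is an assumption)
        policy_action = agent.choose_action(state, sample=False)
        errors.append(np.mean((np.asarray(policy_action) - np.asarray(expert_action)) ** 2))
    return float(np.mean(errors))
```

If this validation MSE starts rising while the training total_loss keeps falling, that is a reasonable signal the policy is overfitting the demonstrations.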

ghost commented 2 years ago

Thanks for answering my questions ;)!

I'll close this issue.

ghost commented 2 years ago

Hello!

I have to reopen this issue since I have spent a lot of time tuning the hyper-parameters. Could you please list the most important ones or provide some suggestions?

Div99 commented 2 years ago

The most important ones are init_temp, critic_lr, and policy_lr. I also recommend using the IQ loss with the χ² divergence and online replay sampling (method.chi=True, method.loss=value).
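
To make that concrete, here is a rough sketch of a small sweep over those hyperparameters via command-line overrides to train_iq.py. The exact config keys (e.g. agent.init_temp, agent.critic_lr, agent.actor_lr) are assumptions based on the names above; check them against the conf/ directory of the repo before running.

```python
import itertools
import subprocess

# Illustrative search grids; useful ranges depend on your environment
init_temps = [1e-3, 1e-2, 1e-1]
critic_lrs = [3e-4, 3e-5]
policy_lrs = [3e-4, 3e-5]

for temp, c_lr, p_lr in itertools.product(init_temps, critic_lrs, policy_lrs):
    subprocess.run([
        "python", "train_iq.py",
        "agent=sac",
        "method.chi=True", "method.loss=value",  # chi^2 divergence + value loss, as suggested above
        f"agent.init_temp={temp}",               # config key names are assumptions
        f"agent.critic_lr={c_lr}",
        f"agent.actor_lr={p_lr}",
    ], check=True)
```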

ghost commented 2 years ago

Hi!

I am wondering if the total_loss metric is actually the J(π, Q) of Eq. (10) in your IQ-Learn paper for a continuous problem.

Thanks.

azafar1991 commented 8 months ago

Hello, were you able to get the reward function? I am trying to use it without online training.