Closed ghost closed 2 years ago
Hi, you can look at the training losses or some evaluation videos if you don't have any reward metrics. I'm also willing to help if you can share any specifics (on GitHub or by email).
Thanks for your quick reply!
Currently I am able to run the code in my environment. The losses at one time step are shown below; I only recognize a couple of them, such as 'value_loss'. Can you please help me figure out what they represent and how to use them to judge convergence?
{'v0': -9.3360013961792, 'softq_loss': 0.21358530223369598, 'value_loss': -0.32929036021232605, 'regularize_loss': 0.02841954678297043, 'total_loss': -0.08728551119565964, 'loss/actor': 8.292854309082031, 'actor_loss/target_entropy': -1, 'actor_loss/entropy': 0.4056042432785034, 'alpha_loss/loss': 0.27993878722190857, 'alpha_loss/value': 0.19909673026414973}
You should look at the total_loss metric. It should go down and converge to some very small value.
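One lightweight way to operationalize "converges to some very small value" is to compare moving averages of total_loss over two adjacent windows. This is a generic sketch, not code from the IQ-Learn repo; the function name, window size, and tolerance are my own choices:

```python
import statistics

def has_converged(losses, window=50, tol=1e-3):
    """Heuristic convergence check for a training-loss curve.

    Returns True when the moving average of the most recent `window`
    values differs from the average of the preceding window by less
    than `tol`, i.e. the curve has flattened out.
    """
    if len(losses) < 2 * window:
        return False  # not enough history to compare two windows
    prev = statistics.mean(losses[-2 * window:-window])
    recent = statistics.mean(losses[-window:])
    return abs(recent - prev) < tol
```

In practice you would log total_loss every step and call this periodically; the right `tol` depends on the scale of your losses, so treat these defaults as placeholders.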
Thanks again!
I just have one last question: when I am using test_iq.py, how should I get the losses for the testing set to judge whether the model has been "over-trained"?
Hmm, maybe you can create a validation split and keep track of the iq_learn loss, or an MSE loss between the policy's actions and the expert's, to measure overfitting.
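For the action-MSE idea, a minimal sketch of what that validation metric could look like; the `policy_fn` interface is an assumption for illustration, and IQ-Learn's actual agent API may differ:

```python
import numpy as np

def policy_action_mse(policy_fn, expert_states, expert_actions):
    """Mean squared error between the learned policy's actions and the
    expert's actions on a held-out validation split.

    `policy_fn` is assumed to map a single state to a deterministic
    action of the same shape as the expert action (hypothetical
    interface -- adapt to the real agent's act() method).
    """
    pred = np.asarray([policy_fn(s) for s in expert_states])
    return float(np.mean((pred - np.asarray(expert_actions)) ** 2))
```

Tracked over training, a validation MSE that starts rising while the training loss keeps falling is the usual sign of overfitting.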
Thanks for answering my questions ;)!
I'll close this issue.
Hello!
I have to reopen this issue since I have spent a lot of time tuning those hyperparameters. Can you please list the most important ones or provide some suggestions?
The most important ones are init_temp, critic_lr, and policy_lr. I also recommend using the IQ loss with the χ² divergence and online replay sampling (method.chi=True, method.loss=value).
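Assuming the repository uses Hydra-style command-line overrides, the suggestion above might translate into an invocation like the following; the script name and env key are guesses, and only method.chi=True and method.loss=value come from this thread:

```shell
# Hypothetical invocation -- adjust the script name and config keys
# to the actual config tree in the repo.
python train_iq.py env=my_custom_env \
    method.chi=True method.loss=value
```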
Hi!
I am wondering if the total_loss metric corresponds to the J(π, Q) of Eq. (10) in your IQ-Learn paper for the continuous case.
Thanks.
Hello, were you able to recover the reward function? I am trying to use it without online training.
Hello!
Thanks so much for sharing the code!
I am new to inverse reinforcement learning. I am trying to apply the code to a custom environment where nothing is known about the reward function. Are there any metrics other than rewards that can be used to judge convergence?
Thanks ;).