Thanks for your amazing work!
I have two questions about the hyperparameters in the experimental settings.
1. In Section 4.3 of the paper, the initial lr is 7e4. Is that a typo? If not, I'm really confused about why the lr is so large.
2. The κ there is on the order of 1e4, which would make wi about 1, since C(xi) may be on the order of 1e-1. I wonder why κ is set so large. If every wi is around 1.0, it looks like "Loss weighting" is almost the same as the normal method. Am I right, or have I missed something?
Thank you in advance for your reply! BTW, I'm one of your fans on Bilibili. Your explanation of the paper was very clear and helpful to me. Thank you for your great work!