dli1 closed this issue 2 years ago.
Hi @dli1, thanks for pointing this out!
Yes, I think you are right; it looks like a typo. I will try to correct it, at least in the arXiv version. It may be too late to correct the NAACL version.
Great, now it's clear! Thanks for your confirmation.
Is the minus sign "-" in the MarginMSE loss function in Equation (1) of the GPL paper a typo?
There should be no minus sign "-", because the model should minimize MSE(delta_teacher, delta_student), not maximize it. I checked the released GPL code, and its loss function has no minus sign "-".
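To illustrate why the sign matters, here is a minimal plain-Python sketch, not the released GPL implementation. It assumes delta = score(query, positive) - score(query, negative) for both teacher and student, and that MarginMSE is the mean of (delta_teacher - delta_student)^2. The function names `margin_mse` and `train_margin` are hypothetical. The toy training loop treats the student margin as a single trainable number, so you can see what gradient descent does with and without the leading minus sign.

```python
def margin_mse(student_pos, student_neg, teacher_pos, teacher_neg):
    """MarginMSE as assumed here: mean of (delta_teacher - delta_student)^2.

    Each argument is a list of scores, one per (query, passage) pair in the batch.
    """
    n = len(student_pos)
    total = 0.0
    for s_p, s_n, t_p, t_n in zip(student_pos, student_neg, teacher_pos, teacher_neg):
        delta_student = s_p - s_n  # student margin between positive and negative
        delta_teacher = t_p - t_n  # teacher (pseudo-label) margin
        total += (delta_teacher - delta_student) ** 2
    return total / n


def train_margin(teacher_margin, sign=1.0, lr=0.1, steps=50):
    """Toy gradient descent on sign * (teacher_margin - d)^2.

    Hypothetical illustration: d stands in for the student's margin directly.
    sign=1.0 is the loss without the minus sign; sign=-1.0 adds the minus sign.
    """
    d = 0.0
    for _ in range(steps):
        # Derivative of sign * (teacher_margin - d)**2 with respect to d
        grad = sign * -2.0 * (teacher_margin - d)
        d -= lr * grad
    return d


print(train_margin(3.0))             # converges close to the teacher margin 3.0
print(train_margin(3.0, sign=-1.0))  # diverges away from the teacher margin
```

Without the minus sign, minimizing the loss pulls the student margin toward the teacher margin. With the minus sign, minimizing the loss is the same as maximizing the MSE, which pushes the student margin away from the teacher's.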