Hello. Thank you for your hard work!
On page 5 of your paper, you state:
'The two loss formulations are not, however, equivalent. Because log is a concave function, Jensen’s Inequality [23] implies that L_in ≤ L_out. One would thus expect L_out to be the superior supervised loss function.'
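For context, here is the Jensen step as I understand it, using the L^sup_in and L^sup_out definitions from the paper (apologies if I am mis-transcribing the notation). Since log is concave, for each anchor i:

\log\left( \frac{1}{|P(i)|} \sum_{p \in P(i)} \frac{\exp(z_i \cdot z_p / \tau)}{\sum_{a \in A(i)} \exp(z_i \cdot z_a / \tau)} \right) \;\ge\; \frac{1}{|P(i)|} \sum_{p \in P(i)} \log \frac{\exp(z_i \cdot z_p / \tau)}{\sum_{a \in A(i)} \exp(z_i \cdot z_a / \tau)}

Negating both sides and summing over all anchors i then gives L_in ≤ L_out, so I follow how the inequality itself arises.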
This might be a silly question, but I am wondering why L_in ≤ L_out indicates that L_out is the superior loss function. On page 6 of your paper, you showed that L_out is more stable for training, and I understood that. But I can't connect that observation with L_out being the superior loss function.