HKUNLP / reparam-discrete-diffusion

Reparameterized Discrete Diffusion Models for Text Generation
Apache License 2.0

label smoothing mistake #1

Open

youngsheen commented 1 year ago

When computing the label smoothing loss, the logit_loss is only multiplied by weight_t but is missing the 1/(t+1) factor.

LZhengisme commented 1 year ago

Hi, thanks for your interest!

Technically, both 1/(t+1) and weight_t are associated only with the diffusion ELBO objective, not with the label smoothing loss. It is therefore reasonable to use an arbitrary weighting for the label smoothing loss (which serves as an auxiliary regularization objective) to scale its effect. We ran various ablations in our preliminary experiments and found that multiplying the label smoothing loss by weight_t alone yields the best performance on translation tasks.

However, it could be true that this choice may not be optimal in all cases and that carefully tuning the weighting in a task-specific manner may lead to better performance.
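To make the distinction concrete, here is a minimal sketch of the weighting scheme described above. The function and argument names are illustrative, not the repo's actual code: the ELBO term carries both weight_t and the 1/(t+1) factor, while the label smoothing term is scaled by weight_t only.

```python
def combined_loss(elbo_loss: float, ls_loss: float, weight_t: float, t: int) -> float:
    """Hypothetical sketch of the loss weighting discussed in this thread.

    elbo_loss: per-token diffusion ELBO loss at timestep t
    ls_loss:   label smoothing (auxiliary) loss
    weight_t:  timestep-dependent weight shared by both terms
    t:         current diffusion timestep (0-indexed)
    """
    # The diffusion ELBO term is multiplied by both weight_t and 1/(t+1).
    diffusion_term = weight_t * (1.0 / (t + 1)) * elbo_loss
    # The label smoothing term is an auxiliary regularizer; per the ablations
    # described above, it is scaled by weight_t only (no 1/(t+1) factor).
    label_smoothing_term = weight_t * ls_loss
    return diffusion_term + label_smoothing_term
```

Other scalings of the auxiliary term are equally valid in principle; this particular choice simply reflects what worked best in the ablations mentioned above.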

Hope this clears things up xD