Closed by Mowenyii 4 months ago
Since log(sigmoid(0.0)) != 0, the loss will not be 0 at the first step.
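A quick numeric sketch of this point, using the standard DPO-style loss -log σ(β(Δ_w − Δ_l)) on the log-ratio differences (the `dpo_loss` helper and symbol names here are illustrative, not taken from this repo's code):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dpo_loss(delta_w, delta_l, beta=1.0):
    # -log sigmoid(beta * (winner log-ratio - loser log-ratio))
    return -math.log(sigmoid(beta * (delta_w - delta_l)))

# At initialization the policy equals the reference model, so both
# log-ratios are 0 -- but the loss is log(2), not 0.
loss_at_init = dpo_loss(0.0, 0.0)
print(loss_at_init)  # 0.6931... = log(2)

# The gradient of -log sigmoid(z) w.r.t. z is -(1 - sigmoid(z)),
# which at z = 0 has magnitude 0.5: nonzero, so the model still
# receives an update direction at the first step.
grad_magnitude_at_init = 1.0 - sigmoid(0.0)
print(grad_magnitude_at_init)  # 0.5
```

So even when the two models start identical, the loss value is log(2) and its gradient is nonzero, which is why training still makes progress.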
Yes, it is. The original loss function (DPO in the LLM setting) is well designed there because sampling happens at the decoding step: even though the input and the ref/target model are the same, the outputs will still differ. But in this diffusion setting that is not the case. Even though the loss is non-zero, it intuitively feels weird. Yes, I understand the formulation is mathematically correct.
Thank you for the great work~
I want to know how this loss updates the model, because if epis_theta is initialized from epis_ref, the loss will be 0.