SalesforceAIResearch / DiffusionDPO

Code for "Diffusion Model Alignment Using Direct Preference Optimization"
https://arxiv.org/abs/2311.12908
Apache License 2.0

I have some questions. #7

Closed Mowenyii closed 4 months ago

Mowenyii commented 5 months ago

Thank you for the great work~

I want to know how this loss updates the model, because if eps_theta is initialized from eps_ref, the loss will be 0. [image: screenshot of the loss equation]
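For reference (the screenshot is not reproduced here), the loss I am asking about is, up to notation, the final Diffusion-DPO objective from the paper:

```latex
L(\theta) = -\,\mathbb{E}\Big[
  \log\sigma\Big(
    -\beta T\,\omega(\lambda_t)\big(
      \|\epsilon^w - \epsilon_\theta(x_t^w, t)\|_2^2 - \|\epsilon^w - \epsilon_{\mathrm{ref}}(x_t^w, t)\|_2^2
      - \big(\|\epsilon^l - \epsilon_\theta(x_t^l, t)\|_2^2 - \|\epsilon^l - \epsilon_{\mathrm{ref}}(x_t^l, t)\|_2^2\big)
    \big)
  \Big)
\Big]
```

where the expectation is over preference pairs $(x_0^w, x_0^l)$, timesteps $t$, and the corresponding noised latents. If $\epsilon_\theta = \epsilon_{\mathrm{ref}}$, the whole term inside $\sigma(\cdot)$ is 0.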

RockeyCoss commented 4 months ago

Since log(sigmoid(0.0)) != 0, the loss will not be 0 at the first step.
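A quick toy check (not code from this repo; a linear layer stands in for the UNet) showing that when eps_theta starts as a copy of eps_ref the inner term is exactly 0, the loss is log(2) ≈ 0.693, and the gradient is still non-zero:

```python
# Minimal sketch: Diffusion-DPO loss value and gradient at initialization.
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Stand-in "models": a single linear layer predicting noise (hypothetical, for illustration).
ref_model = torch.nn.Linear(16, 16)
theta_model = torch.nn.Linear(16, 16)
theta_model.load_state_dict(ref_model.state_dict())  # eps_theta initialized from eps_ref

beta = 5000.0  # any positive value shows the same effect

# Noisy latents and target noise for a preferred (w) and dispreferred (l) sample.
x_w, x_l = torch.randn(4, 16), torch.randn(4, 16)
eps_w, eps_l = torch.randn(4, 16), torch.randn(4, 16)

def mse(pred, target):
    # per-sample squared error
    return F.mse_loss(pred, target, reduction="none").mean(dim=1)

with torch.no_grad():  # the reference model is frozen
    ref_w, ref_l = mse(ref_model(x_w), eps_w), mse(ref_model(x_l), eps_l)
model_w, model_l = mse(theta_model(x_w), eps_w), mse(theta_model(x_l), eps_l)

# Inner term: (policy - ref) error on the winner minus (policy - ref) error on the loser.
inner = (model_w - ref_w) - (model_l - ref_l)   # exactly 0 at initialization
loss = -F.logsigmoid(-beta * inner).mean()      # = -log(sigmoid(0)) = log(2), not 0

loss.backward()
print(f"inner term: {inner.mean().item():.4f}")                 # 0.0000
print(f"loss: {loss.item():.4f}")                               # 0.6931
print(f"grad norm: {theta_model.weight.grad.norm().item():.4f}")  # non-zero -> the model updates
```

So even though the inner term cancels at the first step, the gradient through the sigmoid does not, and eps_theta immediately starts moving away from eps_ref.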

yangluo23 commented 1 month ago

> Since log(sigmoid(0.0)) != 0, the loss will not be 0 at the first step.

Yes, that's true. The original loss (DPO in the LLM setting) is well designed because there is sampling at the decoding step: even when the input and the reference/target models are the same, their outputs will not be identical. But in this diffusion setting that is not the case. Even though the loss is non-zero, it intuitively feels weird. Yes, I understand the formulation is mathematically correct.
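For what it's worth, the first-step update is still meaningful even though the inner term is 0. Writing $u$ for the inner difference, $\frac{d}{du}\big[-\log\sigma(-\beta T\omega(\lambda_t)\,u)\big] = \beta T\omega(\lambda_t)\,\sigma(\beta T\omega(\lambda_t)\,u)$, which equals $\beta T\omega(\lambda_t)/2$ at $u = 0$. So (my derivation, following the loss written above, with the reference terms dropping out since they do not depend on $\theta$):

```latex
\nabla_\theta L \,\Big|_{\theta = \theta_{\mathrm{ref}}}
  = \frac{\beta T \omega(\lambda_t)}{2}\,
    \nabla_\theta\Big(
      \|\epsilon^w - \epsilon_\theta(x_t^w, t)\|_2^2
      - \|\epsilon^l - \epsilon_\theta(x_t^l, t)\|_2^2
    \Big)
```

i.e., the very first update already pushes $\epsilon_\theta$ to denoise the preferred image better and the dispreferred one worse, even though the loss value itself is just $\log 2$.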