The loss becomes almost 0 after 1 epoch of training.

SalesforceAIResearch / DiffusionDPO

Code for "Diffusion Model Alignment Using Direct Preference Optimization"

https://arxiv.org/abs/2311.12908

Apache License 2.0

272 stars, 24 forks

Open. ChenDRAG opened this issue 8 months ago

ChenDRAG commented 8 months ago

Hi, when reproducing your experiments, I found that the training loss drops to almost 0 after 1 epoch of training.

Could you give any thoughts on why this might happen?

ShristiDasBiswas commented 6 months ago

Were you able to solve this issue?

Mowenyii commented 6 months ago

Could updating accelerate to the latest version resolve this issue?
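
For anyone debugging this, it may help to first check whether a near-zero loss is a saturated objective rather than a crash. Below is a minimal sketch, assuming the loss has the general form from the paper, `-log sigmoid(-0.5 * beta * (model_diff - ref_diff))`. The function names, the 0.5 scale factor, the error values, and `beta = 5000` are illustrative assumptions, not values read from the repository. With a large `beta`, even a small improvement of the model over the reference on the preferred image pushes the loss very close to 0.

```python
import math


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def dpo_style_loss(model_err_w: float, model_err_l: float,
                   ref_err_w: float, ref_err_l: float,
                   beta: float) -> float:
    """Hypothetical per-sample loss: -log(sigmoid(inside)).

    *_err_w / *_err_l are denoising errors on the preferred (w) and
    rejected (l) images for the trained model and the frozen reference.
    """
    model_diff = model_err_w - model_err_l
    ref_diff = ref_err_w - ref_err_l
    inside = -0.5 * beta * (model_diff - ref_diff)
    # -log(sigmoid(x)) == softplus(-x)
    return softplus(-inside)


beta = 5000.0  # assumed value, for illustration only

# At initialization the model equals the reference, so the loss is log(2).
loss_init = dpo_style_loss(0.10, 0.10, 0.10, 0.10, beta)

# Model error on the preferred image is only 0.01 lower than the reference:
# inside = 25, and the loss is already saturated near 0.
loss_saturated = dpo_style_loss(0.09, 0.10, 0.10, 0.10, beta)

print(loss_init, loss_saturated)
```

If the logged loss looks like `loss_saturated`, logging `model_diff - ref_diff` directly would show whether the margin is simply large or whether the values have become non-finite.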