Closed WJMacro closed 4 months ago
Hello! Thank you for your work; I have some technical issues I'd like to discuss with you. I noticed that you mentioned encountering a problem where the model repeatedly generated the same token after DPO training. We are experiencing a similar issue in our current experiments. Could you share how you resolved this issue?
Recently, I have also tried mixing the DPO loss with an SFT loss, which may help produce a more stable training result.
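For concreteness, here is a minimal sketch of what I mean by mixing the two losses, assuming the standard sigmoid-form DPO objective on sequence log-probabilities; the function names and the `sft_weight` coefficient are just illustrative, not from any particular codebase:

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Sigmoid-form DPO loss on sequence log-probs:
    -log sigmoid(beta * ((pi_c - ref_c) - (pi_r - ref_r)))."""
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

def mixed_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected,
               beta=0.1, sft_weight=0.1):
    """DPO loss plus a weighted SFT (NLL) term on the chosen response,
    which anchors the policy to keep assigning probability to good text."""
    sft_term = -pi_chosen  # NLL of the chosen response under the policy
    dpo_term = dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta)
    return dpo_term + sft_weight * sft_term
```

The intuition is that pure DPO can satisfy its objective by pushing the rejected response down rather than keeping the chosen one likely, and the extra NLL term counteracts that drift, which in our case seemed related to the repeated-token degeneration.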