univa-JASON closed 3 weeks ago
We wrote the training code based on the RLAIF-V project. The code implements its own DPO trainer.
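For readers who have not seen the objective before, here is a minimal sketch of the standard DPO loss for a single preference pair. This is not the code from this repo or from RLAIF-V; the function name `dpo_loss`, its argument names, and the default `beta=0.1` are assumptions made for illustration. The inputs are summed log-probabilities of the chosen and rejected responses under the policy and under the frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair.

    Hypothetical helper for illustration, not the repo's implementation.
    """
    # How much more the policy favors each response than the reference does.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    z = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(z)), written in a numerically stable form.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


# Policy identical to the reference: loss equals log(2).
print(dpo_loss(-1.0, -2.0, -1.0, -2.0))
```

The loss drops below log(2) once the policy favors the chosen response more strongly than the reference does, and rises above it in the opposite case.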
Thank you for your answer! I have one more question: can I apply a WSD scheduler in this repo's fine-tuning code?
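To make the question concrete, this is roughly what I mean by WSD (warmup-stable-decay): a linear warmup, a constant plateau, then a decay to a small floor. The sketch below is only an assumption about the schedule shape; the function name `wsd_multiplier`, the linear decay, and the `min_ratio` floor are my own choices, not anything taken from this repo.

```python
def wsd_multiplier(step, warmup_steps, stable_steps, decay_steps, min_ratio=0.1):
    """Learning-rate multiplier for a warmup-stable-decay schedule.

    Hypothetical helper: returns a factor to multiply the base learning rate by.
    """
    if step < warmup_steps:
        # Linear warmup from 0 up to 1.
        return step / max(1, warmup_steps)
    if step < warmup_steps + stable_steps:
        # Stable phase: keep the base learning rate.
        return 1.0
    # Linear decay from 1 down to min_ratio, then hold at min_ratio.
    progress = (step - warmup_steps - stable_steps) / max(1, decay_steps)
    progress = min(1.0, progress)
    return 1.0 - (1.0 - min_ratio) * progress


# Print the multiplier at a few points of a 40-step schedule.
for s in (0, 5, 10, 29, 35, 40):
    print(s, wsd_multiplier(s, warmup_steps=10, stable_steps=20, decay_steps=10))
```

A function like this could be passed as `lr_lambda` to PyTorch's `torch.optim.lr_scheduler.LambdaLR`, for example via `functools.partial`. Whether the fine-tuning script here lets me plug in a custom scheduler is exactly what I am asking.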
How did you proceed with DPO training? Did you use CPMTrainer or the HF DPOTrainer? Does CPMTrainer support DPO fine-tuning?