An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & Mixtral)
Training DPO with Deepseek-lite fails with "expected mat1 and mat2 to have the same type, but got: float != c10::BFloat16" #306
Open
victorShawFan opened 1 month ago