TUDB-Labs / mLoRA

An Efficient "Factory" to Build Multiple LoRA Adapters
Apache License 2.0

[feature] support dpo trainer #211

Closed: yezhengmao1 closed this 5 months ago

yezhengmao1 commented 5 months ago

@waitfor-night, can you check this code for the DPO algorithm?

yezhengmao1 commented 5 months ago

@waitfor-night, I added a feature to load a trained LoRA adapter and freeze it during DPO training. Can you check and test it?
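
For context, here is a minimal sketch of the standard DPO loss for a single preference pair, written in plain Python rather than taken from mLoRA's code. The function name `dpo_loss` and its parameters are hypothetical. The reference log-probabilities are assumed to come from the frozen adapter, so they are plain constants here and would receive no gradient in a real trainer.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair of sequence log-probabilities.

    Hypothetical helper for illustration only, not mLoRA's API.
    """
    # Log-ratio of the trainable policy to the frozen reference, per response.
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp

    # Scaled margin by which the policy prefers the chosen response
    # more strongly than the reference does.
    logits = beta * (chosen_ratio - rejected_ratio)

    # -log(sigmoid(logits)) computed in a numerically stable way.
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))
```

When the policy and the reference agree exactly, the margin is zero and the loss is `log(2)`. The loss falls below that value as the policy shifts probability toward the chosen response relative to the reference.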

merlintang commented 5 months ago

leave comments