Vance0124 / Token-level-Direct-Preference-Optimization

Reference implementation for Token-level Direct Preference Optimization (TDPO)
Apache License 2.0
110 stars, 12 forks

Can you train DPO directly using open-source base models? #6

Open tcxia opened 2 months ago