Vance0124/Token-level-Direct-Preference-Optimization
Reference implementation for Token-level Direct Preference Optimization (TDPO)
Apache License 2.0
110 stars, 12 forks
Can you train DPO directly using open-source base models?
#6 (Open), opened by tcxia 2 months ago