Vance0124 / Token-level-Direct-Preference-Optimization

Reference implementation for Token-level Direct Preference Optimization (TDPO)
Apache License 2.0
110 stars, 12 forks