eric-mitchell / direct-preference-optimization

Reference implementation for DPO (Direct Preference Optimization)
Apache License 2.0
2.18k stars, 180 forks

How to reproduce the IMDB sentiment generation results? #54

Open junkangwu opened 1 year ago

junkangwu commented 1 year ago

Hi, nice work. I'm interested in the performance of DPO and am trying to reproduce it, in particular the IMDB sentiment generation experiment. Could you share the detailed procedure or steps? Thanks a lot! Best