GanjinZero / RRHF

[NIPS2023] RRHF & Wombat

Wombat and RRHF #40

Open Guochry opened 11 months ago

Guochry commented 11 months ago

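I'd like to ask: are there any experiments comparing different reward sources? That is, do you compare whether an open-source reward model or ChatGPT works better as the source of reward scores?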

GanjinZero commented 11 months ago

We haven't run experiments with an open-source RM; when we did this work, there were no good open-source RMs available.

Guochry commented 11 months ago

What about Dahoas/gptj-rm-static, doesn't that count?

GanjinZero commented 11 months ago

It can only be used on hh, and its accuracy is not high.

Guochry commented 11 months ago

Okay, thank you!