Closed timbmg closed 3 months ago
I'll look into this further, @timbmg. There are some specific points below, and some open questions.
Stanford Human Preferences (SHP), with a subset created by taking 1 sample per prompt with a score ratio above 1.5 and a total number of Reddit votes above 10.
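For reference, the subset construction described above can be sketched roughly as below. This is only a hypothetical illustration: the field names (`post_id`, `score_ratio`, `score_A`, `score_B`) follow my understanding of the SHP dataset schema, and the "1 sample per prompt" selection is assumed to be a random pick among qualifying pairs.

```python
import random

def build_subset(examples, ratio_threshold=1.5, min_votes=10, seed=0):
    """Keep pairs with score ratio above the threshold and more than
    `min_votes` total Reddit votes, then take one example per prompt."""
    rng = random.Random(seed)
    by_prompt = {}
    for ex in examples:
        if ex["score_ratio"] <= ratio_threshold:
            continue  # score ratio must be above 1.5
        if ex["score_A"] + ex["score_B"] <= min_votes:
            continue  # total votes must be above 10
        by_prompt.setdefault(ex["post_id"], []).append(ex)
    # one (randomly chosen) qualifying example per prompt
    return [rng.choice(group) for group in by_prompt.values()]
```

If the paper's eval subset was built like this, it would differ from the full SHP test split that the self-reported numbers are computed on, which could explain part of the accuracy gap.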
Hi, thanks for this great work, it's really interesting and helpful!
I was a bit surprised by the stanfordnlp/SteamSHP-flan-t5-xl and stanfordnlp/SteamSHP-flan-t5-large performance on the SHP dataset in Table 12, since their self-reported accuracies are 0.7278 and 0.7203, respectively. Do you know the reason for this difference?
(AFAIK, their reported average also includes the performance on HH-RLHF helpful-base, but I don't think that should drag the performance down that much?)
Conversely, the HH-RLHF helpful scores in Table 12 are much lower than the ones reported on Hugging Face (0.731 vs. 0.633 and 0.731 vs. 0.629).
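For anyone trying to reproduce these numbers, one possible source of discrepancy is the input formatting and label convention used at evaluation time. A minimal sketch below: the prompt template follows my reading of the SteamSHP model card, and the assumption that `labels == 1` means response A is preferred follows the SHP schema; both should be double-checked before trusting any accuracy computed this way.

```python
def make_steamshp_input(post: str, response_a: str, response_b: str) -> str:
    # Template as described on the SteamSHP model card (assumption: exact
    # spacing/newlines matter for the model's A/B prediction).
    return (
        f"POST: {post}\n\n"
        f" RESPONSE A: {response_a}\n\n"
        f" RESPONSE B: {response_b}\n\n"
        " Which response is better? RESPONSE"
    )

def pairwise_accuracy(predictions, labels):
    """predictions: 'A'/'B' strings from the model;
    labels: 1 if response A is preferred, 0 otherwise (assumed SHP convention)."""
    correct = sum(
        (pred == "A") == (label == 1)
        for pred, label in zip(predictions, labels)
    )
    return correct / len(labels)
```

If the evaluation in Table 12 used a different template or swapped the A/B label convention, that alone could move the accuracy substantially in either direction.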