allenai / reward-bench

RewardBench: the first evaluation tool for reward models.
https://huggingface.co/spaces/allenai/reward-bench
Apache License 2.0

New Gemma-7b DPO Model #143

Open ajseo95 opened 2 weeks ago

ajseo95 commented 2 weeks ago

Hi, thanks for your impactful work. :)

Recently, my coauthors and I submitted a paper, and we found that our model, Gemma-MMPO, achieves state-of-the-art results among 7B DPO models: it ranks first when evaluated across all subsets including the prior sets, and second when the prior sets are excluded. We are considering uploading our paper to arXiv soon. :)

Could you please add our latest Gemma-MMPO model to RewardBench? Our model is uploaded at Ahjeong/MMPO_Gemma_7b.

Here are some important details for evaluating our model:

Thank you so much! :)

natolambert commented 1 week ago

Hey @ajseo95, I have an initial implementation of this coming, but can you share what the reference model is? I'm going to use Gemma 7b to start but that's a guess.

If the chat template is in the tokenizer we default to that.
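To illustrate the fallback described above, here is a minimal sketch; it is not RewardBench's actual code. `resolve_chat_template` and `FALLBACK_TEMPLATE` are hypothetical names. The only thing assumed from Hugging Face tokenizers is the `chat_template` attribute, which is `None` when no template is saved. `SimpleNamespace` objects stand in for real tokenizers so the snippet runs without downloading a model.

```python
from types import SimpleNamespace

# Hypothetical fallback template, used only when the tokenizer ships none.
FALLBACK_TEMPLATE = "{% for m in messages %}{{ m['role'] }}: {{ m['content'] }}\n{% endfor %}"


def resolve_chat_template(tokenizer, fallback=FALLBACK_TEMPLATE):
    """Prefer the template saved in the tokenizer, otherwise use the fallback."""
    template = getattr(tokenizer, "chat_template", None)
    return template if template else fallback


# Stand-ins for real tokenizers (assumption: only `chat_template` matters here).
with_template = SimpleNamespace(chat_template="<template saved with the model>")
without_template = SimpleNamespace(chat_template=None)

print(resolve_chat_template(with_template) == "<template saved with the model>")  # True
print(resolve_chat_template(without_template) == FALLBACK_TEMPLATE)  # True
```

With a real tokenizer, the resolved string could then be passed as the `chat_template` argument of `tokenizer.apply_chat_template`.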

natolambert commented 1 week ago

Looks like the reference model I used wasn't right; the score was much lower than you reported.
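For context on why the reference model matters: a DPO model is scored through its implicit reward, beta * (log pi(y|x) - log pi_ref(y|x)), so the same policy can rank a pair differently under a different reference. The sketch below assumes that standard DPO formulation; the names `PairLogProbs`, `implicit_reward`, and `prefers_chosen`, the value `beta=0.1`, and all log-probabilities are made up for illustration and are not taken from RewardBench or from the model in this thread.

```python
from dataclasses import dataclass


@dataclass
class PairLogProbs:
    """Summed token log-probabilities for one chosen/rejected pair."""
    policy_chosen: float
    policy_rejected: float
    ref_chosen: float
    ref_rejected: float


def implicit_reward(policy_logp: float, ref_logp: float, beta: float = 0.1) -> float:
    """DPO implicit reward: beta * (log pi(y|x) - log pi_ref(y|x))."""
    return beta * (policy_logp - ref_logp)


def prefers_chosen(pair: PairLogProbs, beta: float = 0.1) -> bool:
    """True if the chosen completion gets the higher implicit reward."""
    chosen = implicit_reward(pair.policy_chosen, pair.ref_chosen, beta)
    rejected = implicit_reward(pair.policy_rejected, pair.ref_rejected, beta)
    return chosen > rejected


# Same policy log-probs, two different (illustrative) reference models.
matched_ref = PairLogProbs(-10.0, -12.0, ref_chosen=-11.0, ref_rejected=-11.0)
mismatched_ref = PairLogProbs(-10.0, -12.0, ref_chosen=-8.0, ref_rejected=-13.0)

print(prefers_chosen(matched_ref))     # True
print(prefers_chosen(mismatched_ref))  # False
```

Because beta is positive, it scales both rewards equally and does not change which completion wins; only the choice of reference model flips the outcome here.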

ajseo95 commented 1 week ago

First of all, thank you for your nice comment. I'm sorry that I forgot to share the reference model. This is our reference model: kykim0/gemma-7b-ultrachat-sft. Thank you so much for your help! :)

ajseo95 commented 1 week ago

And as you said, our template is saved in tokenizer.chat_template. Thank you so much for your help 🙂

ajseo95 commented 1 week ago

I'm so sorry, but after checking our model's performance on the leaderboard, I realized that I shared the wrong checkpoint. Could you please update it with this model instead? Ahjeong/MMPO_Gemma_7b_gamma1.1_epoch3. The ref_model and tokenizer.chat_template are the same as what I shared before. I'm so sorry for the inconvenience, and thank you so much in advance!

ajseo95 commented 3 days ago

I apologize, but I realized that I had set the model checkpoint access to private. I have changed it to public now, and this is our final checkpoint: Ahjeong/MMPO_Gemma_7b_gamma1.1_epoch3. Thank you in advance! :)