Open ajseo95 opened 2 weeks ago
Hey @ajseo95, I have an initial implementation of this coming, but can you share what the reference model is? I'm going to use Gemma 7b to start but that's a guess.
If the chat template is in the tokenizer we default to that.
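For context on what "defaulting to the tokenizer's chat template" means in practice: `tokenizer.chat_template` stores a template that turns a list of chat messages into a single prompt string via `tokenizer.apply_chat_template`. The sketch below imitates that rendering in plain Python using Gemma-style turn markers; the exact markers are an assumption for illustration, not RewardBench's or this model's actual template.

```python
# Illustrative sketch only: mimics what tokenizer.apply_chat_template does
# when a chat template is stored in tokenizer.chat_template.
# The <start_of_turn>/<end_of_turn> markers are an assumed Gemma-style format.

def apply_gemma_style_template(messages):
    """Render a list of {"role": ..., "content": ...} dicts into one prompt
    string, ending with a generation prompt for the model's next turn."""
    out = []
    for m in messages:
        # Gemma-style templates typically use "model" for the assistant role.
        role = "model" if m["role"] == "assistant" else "user"
        out.append(f"<start_of_turn>{role}\n{m['content']}<end_of_turn>\n")
    out.append("<start_of_turn>model\n")  # cue the model to respond
    return "".join(out)

prompt = apply_gemma_style_template([{"role": "user", "content": "Hello!"}])
```

With a real checkpoint, the equivalent call would be `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)`, which uses whatever template the model authors saved.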
Looks like the reference model I used wasn't right; the score was much lower than you reported.
First of all, thank you for your nice comment.
I'm sorry that I forgot to share the reference model. This is our reference model: kykim0/gemma-7b-ultrachat-sft
And as you said, our template is saved in tokenizer.chat_template. Thank you so much for your help! 🙂
I'm so sorry, but after checking our model's performance on the leaderboard, I realized that I shared the wrong checkpoint.
Could you please update it with this model instead? Ahjeong/MMPO_Gemma_7b_gamma1.1_epoch3
The ref_model and tokenizer.chat_template are the same as what I shared before.
I'm so sorry for the inconvenience and thank you so much in advance!
I apologize, but I realized that I had set the model checkpoint access to private. I have changed it to public now, and this is our final checkpoint: Ahjeong/MMPO_Gemma_7b_gamma1.1_epoch3. Thank you in advance! :)
Hi, thanks for your impactful work. :)
Recently, my coauthors and I submitted a paper, and we found that our model, Gemma-MMPO, achieves state-of-the-art results among 7B DPO models (first place when evaluated across all subsets including prior sets, and second place when excluding prior sets). We plan to upload our paper to arXiv soon. :)
Can you please include our latest Gemma-MMPO in the reward bench? Our model is uploaded at Ahjeong/MMPO_Gemma_7b. Here are some important details for evaluating our model:
Thank you so much! :)