Open ajseo95 opened 2 weeks ago
Hey @ajseo95, I have an initial implementation of this coming, but can you share what the reference model is? I'm going to use Gemma 7b to start but that's a guess.
If the chat template is in the tokenizer we default to that.
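For context on what "defaulting to the tokenizer's chat template" means in practice: `tokenizer.chat_template` stores a template that turns a list of chat messages into a single prompt string via `tokenizer.apply_chat_template`. The sketch below imitates that rendering in plain Python using Gemma-style turn markers; the exact markers are an assumption for illustration, not RewardBench's or this model's actual template.

```python
# Illustrative sketch only: mimics what tokenizer.apply_chat_template does
# when a chat template is stored in tokenizer.chat_template.
# The <start_of_turn>/<end_of_turn> markers are an assumed Gemma-style format.

def apply_gemma_style_template(messages):
    """Render a list of {"role": ..., "content": ...} dicts into one prompt
    string, ending with a generation prompt for the model's next turn."""
    out = []
    for m in messages:
        # Gemma-style templates typically use "model" for the assistant role.
        role = "model" if m["role"] == "assistant" else "user"
        out.append(f"<start_of_turn>{role}\n{m['content']}<end_of_turn>\n")
    out.append("<start_of_turn>model\n")  # cue the model to respond
    return "".join(out)

prompt = apply_gemma_style_template([{"role": "user", "content": "Hello!"}])
```

With a real checkpoint, the equivalent call would be `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)`, which uses whatever template the model authors saved.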
Looks like the reference model I used wasn't right; the score was much lower than you reported.
First of all, thank you for your nice comment.
I'm sorry that I forgot to share the reference model. This is our reference model: kykim0/gemma-7b-ultrachat-sft
And as you said, our template is saved in tokenizer.chat_template. Thank you so much for your help! 🙂
I'm so sorry, but after checking our model's performance on the leaderboard, I realized that I shared the wrong checkpoint.
Could you please update it with this model instead? Ahjeong/MMPO_Gemma_7b_gamma1.1_epoch3
The ref_model and tokenizer.chat_template are the same as what I shared before.
I'm so sorry for the inconvenience and thank you so much in advance!
I apologize, but I realized that I had set the model checkpoint access to private. I have changed it to public now, and this is our final checkpoint: Ahjeong/MMPO_Gemma_7b_gamma1.1_epoch3. Thank you in advance! :)
Hi, thanks for your impactful work. :)
Recently, my coauthors and I submitted a paper, and we found that our model, Gemma-MMPO, achieves state-of-the-art results among 7B DPO models (first place when evaluated across all subsets including prior sets, and second place when excluding prior sets). We plan to upload our paper to arXiv soon. :)
Can you please include our latest Gemma-MMPO in the reward bench? Our model is uploaded at Ahjeong/MMPO_Gemma_7b. Here are some important details for evaluating our model:
Thank you so much! :)