MLGroupJLU / LLM-eval-survey

The official GitHub page for the survey paper "A Survey on Evaluation of Large Language Models".
https://arxiv.org/abs/2307.03109

Add Llama 2 as model evaluated? #15

Open tiansiyuan opened 1 year ago

jindongwang commented 1 year ago

Could you please be more specific? Where should we add this model?

tiansiyuan commented 1 year ago

In the paper, LLaMA is mentioned twice, both times on page 6.

The first mention is from a paper (Saparov et al., 2023), so it can stay as-is.

The second one,

"Moreover, LLaMA-65B is the most robust open-source LLMs to date, which per- forms closely to code-davinci-002."

could be replaced by

"Moreover, LLAMA 2 70B is the most robust open-source LLMs to date, which performs very closely to GPT-3.5 and PaLM. But there is still a large gap in performance between LLAMA 2 70B and GPT-4 and PaLM-2-L.(Touvron et al., 2023)"

As code-davinci-002 is a code-generation model derived from GPT-3, I don't think it is appropriate to compare it with a pretrained model such as LLaMA. Just for your consideration.

Also, I'd suggest adding the following paper as a reference:

Llama 2: Open Foundation and Fine-Tuned Chat Models

jindongwang commented 1 year ago

Thanks for the detailed suggestion! We'll update the paper accordingly.