abacaj / code-eval

Run evaluation on LLMs using the HumanEval benchmark
MIT License
362 stars · 34 forks

Is llama2-7B-chat weaker than llama2-7B? #13

Open sunyuhan19981208 opened 8 months ago

sunyuhan19981208 commented 8 months ago

I got only 9.7% for llama2-7B-chat on HumanEval using your script:

{'pass@1': 0.0975609756097561}
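
For reference, HumanEval has 164 problems, so when one sample is generated per problem, pass@1 is simply the fraction of problems whose completion passes all unit tests. The reported score is consistent with 16 of the 164 problems passing; a minimal sketch of that arithmetic (the pass count is inferred from the score, not from the actual run):

```python
# pass@1 with one sample per problem is the fraction of problems
# whose generated solution passes all of its unit tests.
num_problems = 164  # size of the HumanEval benchmark
num_passed = 16     # inferred count consistent with the reported score
pass_at_1 = num_passed / num_problems
print({"pass@1": pass_at_1})  # {'pass@1': 0.0975609756097561}
```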
abacaj commented 8 months ago

Hi, I think you will have to make sure the prompt template is correct.
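
For context, llama2-7B-chat is fine-tuned with the Llama-2 chat format (`[INST] ... [/INST]`, optionally with a `<<SYS>>` system block), so raw HumanEval prompts generally need to be wrapped in that format before generation. Below is a minimal sketch of such a wrapper; the `[INST]`/`<<SYS>>` markup is the documented Llama-2 chat format, but the system prompt and instruction wording are placeholders, not the repo's actual template:

```python
def build_llama2_chat_prompt(problem_prompt: str) -> str:
    """Wrap a raw HumanEval prompt in the Llama-2 chat format.

    The [INST]/<<SYS>> markup follows the documented Llama-2 chat
    format; the system prompt and instruction text are illustrative.
    """
    system = "You are a helpful coding assistant."  # placeholder system prompt
    return (
        "[INST] <<SYS>>\n"
        f"{system}\n"
        "<</SYS>>\n\n"
        "Complete the following Python function:\n\n"
        f"{problem_prompt} [/INST]"
    )
```

Note that the BOS token (`<s>`) is omitted here, since most tokenizers prepend it automatically; whether the raw completion body or a chat-style wrapped prompt scores better is exactly the kind of difference that can explain a gap between the chat and base models on HumanEval.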