bigcode-project / bigcodebench

BigCodeBench: Benchmarking Code Generation Towards AGI
https://bigcode-bench.github.io/
Apache License 2.0

🤗 [REQUEST] - Gemma 2 #18

Closed. ethanc8 closed this issue 2 months ago.

ethanc8 commented 3 months ago

Model introduction

This is Google's newest open model family, consisting of LLMs distilled from an undisclosed larger model. On DevQualityEval, the 27B model performs very well among models that are not specifically marketed for code completion, even surpassing all Gemini-1.5 models. Chatbot Arena users also prefer the 27B-it model over Llama-3-70B-Instruct.

Model URL

https://huggingface.co/collections/google/gemma-2-release-667d6600fd5220e7b967f315

Additional instructions (Optional)

Many older versions of llama.cpp, transformers, and other libraries are known to have bugs when running inference with the Gemma 2 models, especially the 27B model. I don't know whether these issues have been resolved yet.
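As a rough sanity check, something like the following should exercise the model on a recent transformers release. This is only a minimal sketch, assuming transformers >= 4.42 (the first release with Gemma 2 support) and the eager attention implementation that was recommended for Gemma 2 at launch; none of these details come from this thread.

```python
# Minimal sketch: run gemma-2-27b-it with a recent transformers release.
# Assumption: transformers >= 4.42 and attn_implementation="eager", since some
# fused attention kernels did not apply Gemma 2's logit soft-capping.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2-27b-it"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    attn_implementation="eager",  # assumption: avoids kernels that skip soft-capping
)

messages = [{"role": "user", "content": "Write a Python function that reverses a string."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```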

Author

No


terryyz commented 3 months ago

Gemma-2-9b-it is already on the leaderboard. However, the 27B model has issues related to logit soft-capping, which significantly degrade generation quality. Although vLLM released v0.5.1 to address this, I still couldn't achieve optimal performance with the 27B model, though it is slightly better than with vLLM v0.5.0.
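For reference, a minimal sketch of what running the 27B model through vLLM might look like; selecting the FlashInfer attention backend (the workaround vLLM documented for Gemma 2's soft-capping around v0.5.1) and the tensor_parallel_size value are assumptions, not settings confirmed in this thread.

```python
# Minimal sketch: offline generation with vLLM for gemma-2-27b-it.
# Assumption: the FlashInfer backend is required for soft-capping in vLLM ~v0.5.1.
import os
os.environ["VLLM_ATTENTION_BACKEND"] = "FLASHINFER"  # set before the engine is created

from vllm import LLM, SamplingParams

llm = LLM(
    model="google/gemma-2-27b-it",
    dtype="bfloat16",
    tensor_parallel_size=2,  # assumption: adjust to the available GPUs
)
params = SamplingParams(temperature=0.0, max_tokens=512)

outputs = llm.generate(["Write a Python function that reverses a string."], params)
print(outputs[0].outputs[0].text)
```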

Currently, I am using the NVIDIA API for generation, which is what DevQualityEval uses as well. Based on my observations on BigCodeBench, its performance is comparable to Gemini-1.5-Flash-API-0514 and Llama-3-70B-Instruct, but it doesn't surpass Gemini-1.5-Pro-API-0514.
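For reference, a minimal sketch of a call against NVIDIA's OpenAI-compatible endpoint; the base URL and model identifier are assumptions based on NVIDIA's public catalog, not details confirmed in this thread.

```python
# Minimal sketch: query gemma-2-27b-it via NVIDIA's OpenAI-compatible API.
# Assumptions: base URL https://integrate.api.nvidia.com/v1 and model id
# "google/gemma-2-27b-it"; replace the placeholder key with a real one.
from openai import OpenAI

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",
    api_key="<NVIDIA_API_KEY>",
)

completion = client.chat.completions.create(
    model="google/gemma-2-27b-it",
    messages=[{"role": "user", "content": "Write a Python function that reverses a string."}],
    temperature=0.0,
    max_tokens=512,
)
print(completion.choices[0].message.content)
```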

The leaderboard will be updated soon, with other models included.

ethanc8 commented 3 months ago

OK, thanks!

terryyz commented 3 months ago

FYI, I'm trying the 27B model with vLLM again in a new environment. I hope the quality will be similar to the NVIDIA API.

terryyz commented 2 months ago

Results w/ vLLM will be updated shortly.