evalplus / evalplus

Rigorous evaluation of LLM-synthesized code - NeurIPS 2023
https://evalplus.github.io
Apache License 2.0

🤗 [REQUEST] - IBM Granite Code Models #209

Open ethanc8 opened 3 months ago

ethanc8 commented 3 months ago

Model introduction

The models are code generation models created by IBM. They include both base models and instruct models. They are new models that have been trained from scratch, except that the 34B version is based on the 20B version.

Model URL

https://huggingface.co/collections/ibm-granite/granite-code-models-6624c5cec322e4c148c8b330

Additional information (Optional)

No response

Decontamination

They have not specified any decontamination information. Information about the training data can be found at https://arxiv.org/pdf/2405.04324#page=4.

Author

No

Data

No

Security

Integrity

ethanc8 commented 3 months ago

Here are the reported MBPP and MBPP+ scores: [image]

They have not reported HumanEval+ scores, but they have reported HumanEvalSynthesize and MultiPL-E scores at https://arxiv.org/pdf/2405.04324#page=10.

ethanc8 commented 3 months ago

They did not specify which version of MBPP+ they're using here, so they might be using v0.1.0.

Rubiel1 commented 1 month ago

Hello, I am also interested in the evaluation of the Granite models.