bigcode-project / bigcode-evaluation-harness

A framework for the evaluation of autoregressive code generation language models.

Reproduction of HELM results #110

Closed · tangzhy closed this issue 1 year ago

tangzhy commented 1 year ago

How do you evaluate BigCode models on the HELM benchmark?

Do you use their crfm-helm tools directly?

If so, can you release your bash commands so the community can rigorously reproduce your results?

loubnabnl commented 1 year ago

Sorry, we don't have the commands; members of the HELM team ran the evaluation themselves. I believe they used the default settings, the same as for other code models on reasoning tasks. Maybe you can open an issue on their repo.
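
For anyone landing here looking for a starting point, below is a minimal sketch of what a default crfm-helm invocation might look like. The run entry string, suite name, and model identifier are assumptions for illustration, not the commands the HELM team actually ran, and flag names have changed across crfm-helm versions, so check `helm-run --help` for your installed release:

```bash
# Hypothetical sketch, not the HELM team's actual commands.
# Flag names may differ between crfm-helm versions; verify with `helm-run --help`.
pip install crfm-helm

# Run HELM's code scenario (HumanEval) against a BigCode model with default settings.
# The model identifier "bigcode/santacoder" is an assumption for illustration.
helm-run \
  --run-entries "code:dataset=humaneval,model=bigcode/santacoder" \
  --suite bigcode-repro \
  --max-eval-instances 164

# Aggregate per-run results into HELM's summary tables.
helm-summarize --suite bigcode-repro
```

The exact scenario string and model registration would need to match whatever the HELM team used, which, per the answer above, is a question best raised on the crfm-helm repo.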