bigcode-project / bigcode-evaluation-harness

A framework for the evaluation of autoregressive code generation language models.
Apache License 2.0

Endpoints Integration to evaluate closed source Models. #179

Anindyadeep closed this 1 month ago

Anindyadeep commented 6 months ago

This PR adds support for evaluating closed-source models that are served through API endpoints. Resolves issue #161.
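To illustrate the general idea (not the code in this PR), here is a minimal sketch of how generations could be collected from a remote model. It assumes the API call is wrapped in a function that takes a prompt and returns the completion text; `CompletionFn`, `generate_samples`, and `n_samples` are hypothetical names for this example and are not part of the harness.

```python
from typing import Callable, List

# Hypothetical type: any function that sends one prompt to a remote
# endpoint and returns the completion text. A real client would wrap
# the provider's HTTP API here.
CompletionFn = Callable[[str], str]


def generate_samples(
    prompts: List[str], complete: CompletionFn, n_samples: int = 1
) -> List[List[str]]:
    """Query the endpoint n_samples times per prompt.

    Returns one list of completions per prompt, in prompt order.
    """
    generations = []
    for prompt in prompts:
        generations.append([complete(prompt) for _ in range(n_samples)])
    return generations


if __name__ == "__main__":
    # Stand-in for a real API call so the sketch runs offline.
    def fake_complete(prompt: str) -> str:
        return prompt + "    return 1\n"

    print(generate_samples(["def f():\n"], fake_complete, n_samples=2))
```

Passing the API call in as a function keeps the sampling loop independent of any particular provider and makes it easy to test with a stub.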

Anindyadeep commented 6 months ago

This is the result on instruct-humaneval:


{
  "instruct-humaneval": {
    "pass@1": 0.651219512195122,
    "pass@10": 0.7535606619462241
  }
}
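For context on how pass@k numbers like these are commonly obtained: the usual approach is the unbiased estimator 1 - C(n-c, k) / C(n, k), where n is the number of samples generated for a problem and c is the number that pass the tests, averaged over all problems. The thread does not say how many samples were used for the numbers above, so the sketch below is only an illustration; `pass_at_k` and `mean_pass_at_k` are hypothetical helper names.

```python
from math import comb
from typing import List, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: samples generated, c: samples that passed, k: budget.
    """
    if k > n:
        raise ValueError("k must not exceed the number of samples n")
    if n - c < k:
        # Fewer than k failing samples: every size-k subset has a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: List[Tuple[int, int]], k: int) -> float:
    """Average pass@k over problems given (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Two made-up problems with 10 samples each: 5 and 10 passing.
    print(mean_pass_at_k([(10, 5), (10, 10)], k=1))
```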
Anindyadeep commented 5 months ago

Hey @loubnabnl, is it possible to review this PR?

Thanks

Anindyadeep commented 1 month ago

Hi @loubnabnl, apologies for not keeping up with this PR. Due to bandwidth constraints, I might not be able to contribute to it further, and a lot has changed since it was opened (in terms of metrics and models), so I am closing this PR.

If I open a new one in the future, I will definitely address the comments you left. Thanks