bigcode-project / bigcode-evaluation-harness

A framework for the evaluation of autoregressive code generation language models.
Apache License 2.0

investigate discrepancy in odex implementation #68

Open loubnabnl opened 1 year ago

loubnabnl commented 1 year ago

This PR adds the Odex benchmark to the evaluation harness. However, there is a discrepancy in pass@1 for codegen-2B-mono (37% vs 41%) between the two implementations, as explained in this comment.
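
For reference when comparing the two numbers, here is a minimal sketch of the standard unbiased pass@k estimator (for k = 1 it reduces to the fraction of correct samples per problem, averaged over problems). It assumes both implementations score this way, which is not confirmed here. The function names are hypothetical and are not the harness's API.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: number of generated samples, c: number that pass the tests.
    Computes 1 - C(n - c, k) / C(n, k).
    """
    if k > n:
        raise ValueError("k must not exceed the number of samples n")
    if n - c < k:
        # Every size-k subset contains at least one passing sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems; results holds (n, c) pairs (hypothetical format)."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

If both implementations agree on this scoring step, a gap like the one reported would more likely come from differences upstream of it, such as prompts, generation settings, post-processing of completions, or test execution. That is a guess and would need to be checked against the two code paths.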