Status: Open. loubnabnl opened this PR 1 year ago.
This PR adds the Odex benchmark to the evaluation harness. However, pass@1 for codegen-2B-mono differs between the two implementations (37% vs. 41%), as explained in this comment.
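When comparing the two pass@1 numbers, it can help to check how each implementation estimates the metric. For example, one may use n sampled completions per problem and the other a single greedy completion. Below is a minimal sketch of the commonly used unbiased pass@k estimator (from the Codex paper), assuming each problem has n generated samples of which c pass the tests. The function names `pass_at_k` and `mean_pass_at_k` are hypothetical and are not taken from either harness.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples generated, c of them correct.

    Equals 1 - C(n - c, k) / C(n, k), the probability that at least one of
    k samples drawn without replacement from the n is correct.
    """
    if n - c < k:
        # Every size-k subset contains at least one correct sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems; `results` holds one (n, c) pair per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Hypothetical counts for three problems, 10 samples each.
    print(mean_pass_at_k([(10, 4), (10, 0), (10, 10)], k=1))
```

For k = 1 the estimator reduces to c / n for each problem. A harness that samples at a nonzero temperature and one that scores a single greedy completion can therefore report different pass@1 values for the same model, even when both are implemented correctly.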