Closed zorazrw closed 8 months ago
The current scores of this implementation for codegen-2B-mono are:
{
"odex-en": {
"pass@1": 0.3679726651480638,
"pass@2": 0.40785644553949135,
"pass@5": 0.44750070323076746,
"pass@10": 0.4676729704105087
},
"config": {
"model": "Salesforce/codegen-2B-mono",
"temperature": 0.2,
"n_samples": 50
}
}
This is higher than the number reported in the ODEX paper because the original implementation does not strip the prompts (see issue https://github.com/zorazrw/odex/issues/5). However, when the prompt is stripped in that implementation, the pass@1 in ~greedy mode (temperature 1e-6) is:
Overall Pass@K Scores:
[pass@1] 0.4100 (439)
This implementation, with the same near-greedy settings, gives a pass@1 of:
{
"odex-en": {
"pass@1": 0.3712984054669704
},
"config": {
"model": "Salesforce/codegen-2B-mono",
"temperature": 1e-06,
"n_samples": 1
}
}
So there is still a gap of about 0.04 in pass@1 (0.4100 vs. 0.3713) to be investigated.
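For reference when comparing these numbers, here is a minimal sketch of the unbiased pass@k estimator commonly used for this metric (1 - C(n-c, k) / C(n, k), as introduced with HumanEval). Whether both implementations compute pass@k exactly this way is an assumption, and the helper names `pass_at_k` and `mean_pass_at_k` are hypothetical, not taken from either code base.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for a single problem.

    n: number of samples generated for the problem
    c: number of those samples that pass the tests
    k: the k in pass@k (must not exceed n)
    """
    if k > n:
        raise ValueError("k must not exceed the number of samples n")
    if n - c < k:
        # Fewer than k failing samples: every size-k subset contains a pass.
        return 1.0
    # Probability that a random size-k subset contains at least one pass.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(correct_counts: list[int], n: int, k: int) -> float:
    """Average the per-problem estimates over the whole benchmark."""
    return sum(pass_at_k(n, c, k) for c in correct_counts) / len(correct_counts)
```

With `n_samples` set to 1, pass@1 reduces to the fraction of problems whose single sample passes. If the `(439)` in the log above is the problem count, the two greedy scores would correspond to roughly 180 and 163 solved problems, so diffing the per-problem results of the two implementations may be a quick way to locate the gap.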
Added the ODEX and MCoNaLa datasets to the tasks. The processing and evaluation functions follow the original ODEX and MCoNaLa code repositories.