bigcode-project / bigcode-evaluation-harness

A framework for the evaluation of autoregressive code generation language models.
Apache License 2.0

APPS dataset prompting seems wrong #92

Closed: hongcheki closed this issue 1 year ago

hongcheki commented 1 year ago

In the original APPS paper and its accompanying code, the Standard Input format is used when fn_name is not given.

In this harness, however, the Standard Input format is used when fn_name is given.
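A minimal sketch of the intended rule, for illustration only. It assumes the alternative hint is the call-based format (the name the APPS paper uses for problems with a function signature). The helpers `format_instruction` and `build_apps_prompt` and the exact prompt template are hypothetical simplifications, not the harness's actual code.

```python
from typing import Optional


def format_instruction(fn_name: Optional[str]) -> str:
    """Return the format hint for an APPS prompt (hypothetical helper).

    Rule from the issue: Standard Input format when no fn_name is given,
    otherwise the call-based format.
    """
    if not fn_name:
        return "\nUse Standard Input format"
    return "\nUse Call-Based format"


def build_apps_prompt(
    question: str, starter_code: str = "", fn_name: Optional[str] = None
) -> str:
    # Hypothetical, simplified template; the real layout may differ.
    prompt = "\nQUESTION:\n" + question
    if starter_code:
        # Problems with a function signature ship starter code to complete.
        prompt += "\n" + starter_code
    prompt += format_instruction(fn_name)
    prompt += "\nANSWER:\n"
    return prompt


if __name__ == "__main__":
    print(build_apps_prompt("Read n and print n squared."))
    print(build_apps_prompt("Return x squared.", "def square(x):", "square"))
```

The model conditions on this hint, so the fine-tuning prompts and the evaluation prompts have to agree on which branch emits which string.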

loubnabnl commented 1 year ago

Thanks for the catch! It seems they were also swapped in the fine-tuning code, which is why we didn't observe a performance discrepancy. This PR fixes it.