APPS dataset prompting seems wrong

bigcode-project / bigcode-evaluation-harness

A framework for the evaluation of autoregressive code generation language models.

Closed: hongcheki closed this issue 1 year ago

hongcheki commented 1 year ago

In the original APPS paper and its reference code, the Standard Input format is used when fn_name is not given.

Here, however, the Standard Input format is used when fn_name is given, which is the reverse.
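For reference, the intended selection can be sketched roughly as below. This is a minimal illustration, not the harness's actual code: `build_apps_prompt` is a hypothetical helper, and the exact prompt strings (including the "Call-Based" wording for the other branch) are assumptions modeled on the APPS reference code.

```python
from typing import Optional


def build_apps_prompt(question: str, fn_name: Optional[str] = None,
                      starter_code: str = "") -> str:
    """Hypothetical helper: build an APPS-style prompt for one problem.

    The prompt strings are assumptions modeled on the APPS reference code.
    """
    prompt = "\nQUESTION:\n" + question
    if starter_code:
        prompt += starter_code
    if not fn_name:
        # No function name given: the solution reads stdin and writes stdout.
        prompt += "\nUse Standard Input format"
    else:
        # Function name given: the solution is called as a function.
        prompt += "\nUse Call-Based format"
    prompt += "\nANSWER:\n"
    return prompt


# Usage: a problem without fn_name gets the Standard Input hint.
stdin_prompt = build_apps_prompt("Read two integers and print their sum.")
call_prompt = build_apps_prompt("Return the sum of a and b.", fn_name="add")
```

The bug described above corresponds to the two branches being swapped, so problems with fn_name were told to use Standard Input.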

loubnabnl commented 1 year ago

Thanks for the catch! It seems they were also swapped in the fine-tuning code, which is why we didn't observe a performance discrepancy. This PR fixes it.