bigcode-project / bigcode-evaluation-harness

A framework for the evaluation of autoregressive code generation language models.
Apache License 2.0

update humaneval postprocessing #63

Closed loubnabnl closed 1 year ago

loubnabnl commented 1 year ago

As highlighted in issue https://github.com/bigcode-project/bigcode-evaluation-harness/issues/46, a code completion can contain more than one stop word, and keeping only the last block might leave intermediate functions in the output, for example. Normally the stopping criteria halt generation as soon as a stop word appears, but when batch_size > 1 they are checked over the whole batch, so generation stops only once every sequence in the batch has produced a stop word.
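To illustrate the idea, here is a minimal sketch of cutting a completion at the first stop word rather than the last one. The helper name `truncate_at_first_stop` and the `STOP_WORDS` list are assumptions made for this example, not necessarily what the harness uses.

```python
def truncate_at_first_stop(completion, stop_words):
    """Cut `completion` at the earliest occurrence of any stop word.

    Hypothetical helper for illustration; returns the text unchanged
    when no stop word is present.
    """
    cut = len(completion)
    for stop in stop_words:
        idx = completion.find(stop)
        if idx != -1 and idx < cut:
            cut = idx
    return completion[:cut]


# Assumed stop words for a Python function-completion task.
STOP_WORDS = ["\nclass", "\ndef", "\n#", "\nif", "\nprint"]

# With batch_size > 1 a sequence can keep generating past its first stop
# word, so the raw completion may contain several of them.
completion = "    return a + b\n\ndef helper():\n    pass\n\nprint(helper())\n"

print(repr(truncate_at_first_stop(completion, STOP_WORDS)))
# → '    return a + b\n'
```

Cutting at the earliest match drops both the extra `helper` function and the trailing `print` call, whereas cutting only at the last stop word would keep `helper` in the output.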

This doesn't seem to have a noticeable impact on performance though, especially on pass@1 and pass@10.

EDIT: another fix: in a discussion, @Muennighoff noticed that the eos_token is being skipped during decoding, which can leave unwanted extra text in the generations when batch_size > 1. Thanks for the fix!
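One possible way to handle this, sketched below in plain Python, is to cut each generated sequence at its first EOS token id before decoding it to text. The function `cut_at_eos`, the id `EOS_ID = 0`, and the toy token ids are assumptions for illustration; this is not necessarily the fix that was applied here.

```python
def cut_at_eos(token_ids, eos_token_id):
    """Keep only the tokens before the first EOS id.

    Hypothetical helper for illustration; the sequence is returned
    unchanged if it contains no EOS token.
    """
    if eos_token_id in token_ids:
        return token_ids[: token_ids.index(eos_token_id)]
    return token_ids


EOS_ID = 0  # assumed EOS token id for this toy example

# Toy batch: the second sequence keeps producing tokens after its EOS
# because generation only stops once the whole batch is finished.
batch = [
    [11, 12, 13, 0, 0, 0],
    [21, 22, 0, 31, 32, 33],
    [41, 42, 43, 44, 45, 46],
]

trimmed = [cut_at_eos(seq, EOS_ID) for seq in batch]
print(trimmed)
# → [[11, 12, 13], [21, 22], [41, 42, 43, 44, 45, 46]]
```

If the EOS token is simply skipped while decoding, the tokens `31, 32, 33` in the second sequence would end up in the decoded text; trimming at the first EOS removes them.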

loubnabnl commented 1 year ago

Thanks Niklas! I think for now this will be just the HumanEval python openai version but I'll keep that in mind.