update humaneval postprocessing

As highlighted in issue https://github.com/bigcode-project/bigcode-evaluation-harness/issues/46, a code completion can contain more than one stop word, so splitting on stop words and dropping only the last block can leave intermediate functions (for example) in the output. Normally the stopping criteria stop generation as soon as a stop word appears, but when batch_size > 1 the check is applied to the whole batch, so generation only stops once every sequence in the batch has produced a stop word.
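A minimal sketch of the difference, assuming the old postprocessing split the completion on the stop words and discarded only the block after the last one. The stop-word list and both helper names below are hypothetical, chosen for illustration rather than taken from the harness:

```python
import re

# Hypothetical stop words of the kind used to end HumanEval completions;
# the harness's actual list may differ.
STOP_WORDS = ["\nclass", "\ndef", "\n#", "\nif", "\nprint"]


def truncate_at_last_stop(generation: str, stop_words: list[str]) -> str:
    """Assumed old behavior: drop only the text from the LAST stop word on."""
    pattern = "|".join(re.escape(w) for w in stop_words)
    # The capturing group keeps the separators in the list, so the final two
    # entries are the last stop word and the block that follows it.
    parts = re.split(f"({pattern})", generation)
    return "".join(parts[:-2]) if len(parts) > 1 else generation


def truncate_at_first_stop(generation: str, stop_words: list[str]) -> str:
    """Keep only the text before the FIRST stop word that appears."""
    cut = len(generation)
    for word in stop_words:
        idx = generation.find(word)
        if idx != -1:
            cut = min(cut, idx)
    return generation[:cut]


# One sequence in a batch kept generating after its function body ended.
completion = "    return a + b\n\ndef helper():\n    pass\n\nprint(add(1, 2))\n"

old = truncate_at_last_stop(completion, STOP_WORDS)   # still contains helper()
new = truncate_at_first_stop(completion, STOP_WORDS)  # only the function body
```

Cutting at the earliest stop word makes the result independent of how long the rest of the batch kept generating.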

This doesn't seem to have a noticeable impact on performance, though, in particular on pass@1 and pass@10.

EDIT: another fix. Following a discussion with @Muennighoff, he noticed that the eos_token was being skipped during decoding. When batch_size > 1 this can leave unwanted extra text in the generations, since the end of each completion can no longer be detected. Thanks for the fix!
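A rough illustration of why keeping the eos_token matters, assuming a sequence that finished early can still receive tokens until the rest of the batch stops. The EOS string, `fake_decode` and `cut_at_eos` are hypothetical stand-ins, not the tokenizer's or the harness's API:

```python
EOS = "<|endoftext|>"  # hypothetical EOS string; the real one depends on the tokenizer


def fake_decode(tokens: list[str], skip_special_tokens: bool) -> str:
    """Stand-in for a tokenizer's decode: joins string tokens, optionally dropping EOS."""
    return "".join(t for t in tokens if not (skip_special_tokens and t == EOS))


def cut_at_eos(text: str, eos: str = EOS) -> str:
    """Keep only what the model produced before its first EOS token."""
    idx = text.find(eos)
    return text if idx == -1 else text[:idx]


# This sequence emitted EOS early, but with batch_size > 1 generation went on
# until the other sequences stopped, so extra tokens follow the EOS.
tokens = ["    return", " a + b", "\n", EOS, "print", "(add(1, 2))"]

skipped = fake_decode(tokens, skip_special_tokens=True)            # EOS gone, junk kept
kept = cut_at_eos(fake_decode(tokens, skip_special_tokens=False))  # junk removed
```

Once the EOS marker is dropped during decoding, nothing in the string tells postprocessing where the completion really ended.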

bigcode-project/bigcode-evaluation-harness #63: update humaneval postprocessing