Closed RaymondLi0 closed 1 year ago
Actually, with the way we do the generation, there will always be an eof_string
in the generation before it stops due to this function, so remove_last_block
always keeps the solution and only removes some excess.
What might happen is that some intermediate print/comment/function is left in between, but I think that shouldn't impact the evaluation (and it shouldn't happen either, as we stop at the first occurrence). But I agree that keep_first_block, like we do in MBPP, seems cleaner. Feel free to open a PR.
closing the issue as this was fixed in https://github.com/bigcode-project/bigcode-evaluation-harness/pull/63
For the HumanEval task, we remove the last block, based on the stop tokens: https://github.com/bigcode-project/bigcode-evaluation-harness/blob/main/lm_eval/tasks/humaneval.py#L70
If no stopword is found in the generation (for example if by chance the generation ends exactly at the function's last return statement, or before), then remove_last_block would remove the entire generation and return an empty string. It seems to me that we should rather remove anything that is after the first block, if there ever is a match with one of the stop tokens.
If this issue makes sense, happy to create a PR for that.
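To make the concern concrete, here is a minimal sketch of the two behaviours. It is not the harness's actual code: the function names, the STOP_WORDS list, and the sample generations are assumptions for illustration, and the "remove last block" variant is only my reading of how a split on stop tokens that drops the last block would behave.

```python
import re

# Hypothetical stop sequences, for illustration only; the real task defines its own list.
STOP_WORDS = ["\nclass", "\ndef", "\n#", "\nif", "\nprint"]


def remove_last_block_sketch(generation, stop_words):
    """Drop the last stop token and everything after it (assumed behaviour)."""
    pattern = "(%s)" % "|".join(re.escape(w) for w in stop_words)
    # The capturing group keeps the matched stop tokens in the split result.
    parts = re.split(pattern, generation)
    # With no match, parts has a single element, so parts[:-2] is empty.
    return "".join(parts[:-2])


def keep_first_block_sketch(generation, stop_words):
    """Keep only the text before the first stop token, if there is one."""
    pattern = "|".join(re.escape(w) for w in stop_words)
    # With no match, the whole generation is returned unchanged.
    return re.split(pattern, generation, maxsplit=1)[0]


with_stop = "    return a + b\n\ndef helper():\n    pass"
two_stops = with_stop + "\nprint(helper())"
without_stop = "    return a + b"

for text in (with_stop, two_stops, without_stop):
    print(repr(remove_last_block_sketch(text, STOP_WORDS)))
    print(repr(keep_first_block_sketch(text, STOP_WORDS)))
```

In this sketch the two variants agree when exactly one stop token is present. With two stop tokens the "remove last block" variant keeps the intermediate function, and with none it returns an empty string, whereas the "keep first block" variant returns just the solution body in all three cases.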