Open eldarkurtic opened 2 weeks ago
Hi! This is because we default to target_delimiter=" ". That was a natural choice for base models, but we should think about the best way to handle this when the chat template takes care of the formatting.
cc: @NathanHB @clefourrier @haileyschoelkopf
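To make the mismatch concrete, here is a minimal sketch of how a default delimiter of " " can end up in front of the scored answer while the few-shot answers stay clean. The helpers `render_chat` and `build_request` are hypothetical and only mimic a Llama-3-style template; they are not the harness's actual code, and the assumption that few-shot answers are rendered as assistant turns is taken from the request shown in this issue.

```python
# Toy illustration only: both helpers are hypothetical, not harness code.

def render_chat(messages):
    """Render (role, content) pairs in a simplified Llama-3-style format."""
    out = ""
    for role, content in messages:
        out += f"<|start_header_id|>{role}<|end_header_id|>\n\n{content}<|eot_id|>"
    return out


def build_request(fewshot, question, answer, target_delimiter=" "):
    """Return (context, continuation) for a loglikelihood-style request."""
    messages = []
    for q, a in fewshot:
        messages.append(("user", q))
        # Few-shot answers go through the template, so no delimiter is added.
        messages.append(("assistant", a))
    messages.append(("user", question))
    # Open the assistant turn that the scored answer will continue.
    context = render_chat(messages) + "<|start_header_id|>assistant<|end_header_id|>\n\n"
    # The delimiter is prepended only to the scored answer.
    continuation = target_delimiter + answer
    return context, continuation


fewshot = [("Question 1 ...", "A")]
context, continuation = build_request(fewshot, "Question 2 ...", "I")
_, continuation_fixed = build_request(fewshot, "Question 2 ...", "I", target_delimiter="")

print(repr(continuation))        # scored answer with the default delimiter
print(repr(continuation_fixed))  # scored answer with an empty delimiter
```

With the default, the few-shot answer appears as `<|end_header_id|>\n\nA<|eot_id|>` while the scored continuation carries a leading space; passing an empty delimiter makes the two consistent.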
Since I am mostly running leaderboard tasks at the moment, I measured the impact of this subtle change, i.e. the " " in front of the target answer. Here are the results:
Without the space, the scores now perfectly match the HF Leaderboard scores (https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard). Notice that with the space, the 70B model is almost as bad as the 8B one, which definitely seems unexpected.
The only change I made was to add target_delimiter: "" to the yaml configs for the leaderboard tasks. If this is an acceptable fix, let me know and I can open a PR with the changes.
I can confirm that in our fork (which we use to run the leaderboard evals; you can find the command in our doc), the delimiter is always None for chat template tasks. cc @NathanHB, who added it. It was a while back, so I'm unsure where exactly.
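The fork's code is not shown in this thread, so the following is only a sketch of what "no delimiter for chat template tasks" could look like. The function name `resolve_target_delimiter` and its parameters are assumptions made for illustration, and an unset delimiter is treated here as falling back to the " " default mentioned above.

```python
from typing import Optional


def resolve_target_delimiter(configured: Optional[str], apply_chat_template: bool) -> str:
    """Hypothetical helper: pick the string placed before the target answer.

    When a chat template formats the turns, the template already separates
    the prompt from the answer, so no extra delimiter is added.
    """
    if apply_chat_template:
        return ""
    # Base-model path: keep the task's delimiter, defaulting to a single space.
    return " " if configured is None else configured


# Chat-template run: the configured delimiter is ignored.
print(repr(resolve_target_delimiter(" ", apply_chat_template=True)))
# Base-model run: the configured (or default) delimiter is kept.
print(repr(resolve_target_delimiter(None, apply_chat_template=False)))
```

Centralizing the choice like this would avoid editing each task's yaml, at the cost of overriding any task that sets a delimiter on purpose.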
Hi, while running leaderboard_mmlu_pro evals I've noticed an unexpected space character. Here is an example request:
This is a 5-shot example, so looking at the first shot in arguments, the correct answer is formatted as:
More specifically, notice that the correct answer is presented as: <|end_header_id|>\n\nA<|eot_id|> (no space before A).
Unfortunately, contrary to the few-shot examples, the answer to the actual question has a space character: ...<|end_header_id|>\n\n", ' I').
Before trying to go down the rabbit hole to find where this diff is coming from, I wanted to reach out here in case you are already familiar with this. My guess is that this is probably coming from the infamous add_prefix_space "feature" of HF-tokenizers and the fact that answers from few-shot samples are tokenized as part of a larger sequence, whereas the answer of the actual question is tokenized on its own as a single character.