EleutherAI / lm-evaluation-harness

A framework for few-shot evaluation of language models.
https://www.eleuther.ai
MIT License

Bug in Leaderboard IFEval Code #2260

Open noowad93 opened 2 months ago

noowad93 commented 2 months ago

https://github.com/EleutherAI/lm-evaluation-harness/blob/8138fd52437dcd8c76ac87bdc9d684840e794c42/lm_eval/tasks/leaderboard/ifeval/instructions.py#L1384

The updated IFEval dataset (https://www.oxen.ai/wis-k/instruction-following-eval/file/main/instruction-following-eval_train.parquet) now includes non-letter characters such as "!" and "#" in the letter field. These characters make the condition at the referenced line evaluate true, so they get replaced with random characters, and the check no longer matches what the dataset expects.
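A minimal sketch of the failure mode being described, assuming the checker falls back to a random letter whenever the supplied keyword is not a single ASCII letter (the function name `sanitize_letter` is hypothetical, not the actual name in `instructions.py`):

```python
import random
import string

def sanitize_letter(letter):
    """Sketch of the validity check described above: if the supplied
    keyword is not a single ASCII letter, fall back to a random one.
    Characters such as "!" or "#" fail the check and are silently
    replaced, so the instruction no longer matches the dataset's
    expected answer."""
    if letter is None or len(letter) != 1 or letter.lower() not in string.ascii_lowercase:
        return random.choice(string.ascii_lowercase)
    return letter

print(sanitize_letter("a"))  # passes through unchanged
print(sanitize_letter("!"))  # silently replaced by a random letter
```

Under this reading, the fix would be either to accept the characters the new dataset actually uses or to pin the task to a dataset that only contains ASCII letters.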

haileyschoelkopf commented 2 months ago

ccing @NathanHB @clefourrier for discretion over changing the official leaderboard IFEval task definition!

#2218 from @lewtun updated the dataset to google/IFEval for our non-leaderboard ifeval task--perhaps that change should be carried over to the leaderboard variant regardless of whether it fixes this issue as well?