alonj / Same-Task-More-Tokens

The code for the paper: "Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models"
https://arxiv.org/abs/2402.14848
Apache License 2.0

The experiment setting for the "next word prediction" experiment is a bit confusing. #1

Open Zcchill opened 3 months ago

Zcchill commented 3 months ago

I would like to understand how next-word accuracy was calculated in the paper as the number of distractors and the input length grew. Specifically, did the token positions used to compute next-word accuracy include tokens from the distractors? If they did, the conclusion seems straightforward. If they did not, I would like to know where the gold context was placed within the full context in this experiment. Your response would greatly help me better understand the experiments in your paper!

alonj commented 3 months ago

Thanks for your question and interest in our work! To clarify, when testing next-word prediction accuracy, we took samples straight from our dataset (which you can find in this repository or on Huggingface) and randomly sampled token positions at the lengths shown in the figure (roughly matching the points at which we sampled the sample lengths). We then took the continuation token predicted by each model and checked whether it was an exact match for the correct label token. The samples we used contained both information relevant and irrelevant to the original sample questions, but of course there is no such distinction when predicting next tokens sampled from the middle of the input without the questions appended.
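
For illustration, here is a minimal sketch (not the repository's actual evaluation code) of how next-word prediction accuracy at sampled token positions could be measured with a Huggingface causal LM. The model name, the `next_word_accuracy` helper, and the position values are placeholders, not taken from the paper:

```python
# Sketch: exact-match next-word prediction accuracy at chosen token positions.
# Assumes the Huggingface `transformers` library; "gpt2" is a placeholder model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def next_word_accuracy(texts, positions):
    """For each text, truncate at the sampled token position, predict the next
    token greedily, and count exact matches against the token that actually follows."""
    correct, total = 0, 0
    for text, pos in zip(texts, positions):
        ids = tokenizer(text, return_tensors="pt").input_ids[0]
        if pos < 1 or pos >= len(ids):  # skip positions outside the sample
            continue
        prefix = ids[:pos].unsqueeze(0)
        with torch.no_grad():
            logits = model(prefix).logits
        predicted = logits[0, -1].argmax().item()
        correct += int(predicted == ids[pos].item())
        total += 1
    return correct / total if total else 0.0

# Usage (hypothetical): `texts` are dataset samples without the questions
# appended; `positions` are token indices chosen to roughly match the input
# lengths plotted in the figure.
# acc = next_word_accuracy(texts, positions)
```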

Let me know if this answers your question, or if I misunderstood it.