EleutherAI / lm-evaluation-harness

A framework for few-shot evaluation of language models.
https://www.eleuther.ai
MIT License

understanding context length behaviors #1642

Open simran-arora opened 6 months ago

simran-arora commented 6 months ago

Hi, I have a quick question. What is the behavior of the harness if the input examples exceed the model's sequence length in long-document tasks (and how do few-shot examples influence this)? Thank you!

haileyschoelkopf commented 6 months ago

Hi!

The current (intended) behavior is to simply left-truncate inputs so that they won't exceed the model's max length.

(P.S. this all describes HFLM, but the other local model implementations are meant to match HF as closely as possible in behaviors like this.)
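To illustrate, here's a minimal sketch of what left-truncation means in practice. This is not the harness's actual code; the helper name is made up:

```python
def left_truncate(token_ids: list[int], max_length: int) -> list[int]:
    """Keep only the last `max_length` tokens, dropping from the left.

    When a fully formatted request (task description + few-shot
    examples + document) exceeds the model's context window, the
    oldest tokens are the ones that get cut.
    """
    if len(token_ids) <= max_length:
        return token_ids
    return token_ids[-max_length:]
```

So for long-document tasks, the task description and few-shot examples at the front of the prompt are the first things to be truncated away.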

We don't currently do anything smarter than this. Because the models' tokenizers aren't exposed to the tasks or to the construction of string inputs, it's a pain to figure out beforehand what would be truncated and, e.g., provide only the maximum number of shots that fit while keeping the prefixed task description and few-shot format intact. There's a sketch of what that could look like below.
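Purely as an illustration of that idea, here's roughly what shot-dropping could look like if a tokenizer were available at prompt-construction time. All names here are hypothetical:

```python
from typing import Callable

def fit_fewshot_prompt(
    description: str,
    shots: list[str],
    query: str,
    count_tokens: Callable[[str], int],  # e.g. lambda s: len(tokenizer(s).input_ids)
    max_length: int,
) -> str:
    """Drop trailing few-shot examples until the full prompt fits.

    Keeps the task description and the query intact and trims only the
    number of shots. Hypothetical: the harness does not do this today.
    """
    for n_shots in range(len(shots), -1, -1):
        prompt = description + "".join(shots[:n_shots]) + query
        if count_tokens(prompt) <= max_length:
            return prompt
    # Even zero shots don't fit; fall back and let left-truncation handle it.
    return description + query
```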

We'd like to improve this behavior in the future, or at minimum make it clear via logging when requests are being truncated!
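For the logging piece, something as simple as a warning at the truncation site would do. Again a hypothetical sketch, not current behavior:

```python
import logging

logger = logging.getLogger(__name__)

def truncate_with_warning(token_ids: list[int], max_length: int) -> list[int]:
    # Hypothetical hook: warn whenever left-truncation actually kicks in.
    if len(token_ids) > max_length:
        logger.warning(
            "Request of %d tokens exceeds model max length %d; left-truncating %d tokens.",
            len(token_ids), max_length, len(token_ids) - max_length,
        )
        return token_ids[-max_length:]
    return token_ids
```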

Hope this helps!