allenai / OLMo

Modeling, training, eval, and inference code for OLMo
https://allenai.org/olmo
Apache License 2.0
4.37k stars, 431 forks

Fix off-by-one error in eval #643

Closed. davidbrandfonbrener closed this 2 months ago

davidbrandfonbrener commented 2 months ago

This fixes an off-by-one error in eval. The bug is subtle, but it significantly affects the results for tasks with short continuations (and makes all results slightly incorrect).

For example, take boolq. Before this fix, if I use the tokenizer to decode the query I get:

Phantom pain sensations are described as perceptions that an individual experiences relating to a limb or an organ that is not physically part of the body. Limb loss is a result of either removal by amputation or congenital limb deficiency. However, phantom limb sensations can also occur following nerve avulsion or spinal cord injury.\nQuestion: is pain experienced in a missing body part or paralyzed area?\nAnswer:

Decoding the query from the fixed code correctly gives:

Phantom pain sensations are described as perceptions that an individual experiences relating to a limb or an organ that is not physically part of the body. Limb loss is a result of either removal by amputation or congenital limb deficiency. However, phantom limb sensations can also occur following nerve avulsion or spinal cord injury.\nQuestion: is pain experienced in a missing body part or paralyzed area?\nAnswer: no

This is not just an issue with boolq; it affects all tasks. Dropping the last token is clearly wrong.
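As a rough illustration of why short continuations suffer most, here is a toy sketch. It is not the code changed in this PR: `build_query` and its `drop_last` flag are hypothetical names, and whitespace-separated words stand in for real tokenizer IDs. With a one-token continuation, dropping the final token removes the answer entirely.

```python
# Toy illustration only: words stand in for real tokenizer IDs, and
# build_query / drop_last are hypothetical names, not OLMo's API.
def build_query(context_tokens, continuation_tokens, drop_last=False):
    """Concatenate context and continuation; optionally drop the final token."""
    query = list(context_tokens) + list(continuation_tokens)
    return query[:-1] if drop_last else query


context = ["Question:", "is", "it", "raining?", "Answer:"]
continuation = ["no"]  # a one-token continuation, as in a yes/no task

truncated = build_query(context, continuation, drop_last=True)
full = build_query(context, continuation)

print(" ".join(truncated))  # Question: is it raining? Answer:
print(" ".join(full))       # Question: is it raining? Answer: no
```

For a longer continuation the same truncation would remove only its last token, which matches the observation that every task is affected but short continuations are hit hardest.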

I also fixed the indexing in a corresponding way within the ICLMetric.
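To give a sense of the kind of indexing involved, here is a minimal sketch. It is not the actual ICLMetric code, and `continuation_positions` is a hypothetical helper. It assumes the usual causal-LM convention that the model output at position i scores token i + 1, so the positions that score the continuation are shifted one to the left of the continuation tokens themselves.

```python
# Hypothetical helper, not part of OLMo. Assumes the model output at
# position i is the prediction for token i + 1 (standard causal LM setup).
def continuation_positions(ctx_len, cont_len):
    """Return the output positions whose predictions score the continuation.

    The first continuation token sits at index ctx_len in the full query,
    so it is scored by the output at position ctx_len - 1.
    """
    return list(range(ctx_len - 1, ctx_len + cont_len - 1))


# 5 context tokens followed by a 1-token continuation
print(continuation_positions(5, 1))  # [4]
# 5 context tokens followed by a 3-token continuation
print(continuation_positions(5, 3))  # [4, 5, 6]
```

Under this assumption, a change to whether the query keeps its final token has to be matched by a change to these offsets, or the metric would score the wrong positions.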