Since the mask is assumed to be causal, previously we were taking the reward value from the last token in the sequence rather than from the EOS token. This did not affect Llama-3 models much, since the EOS token ends up applied twice in that case (the chat template already ends with an EOS token), but it could be problematic for other models.
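A minimal sketch of the fix described above, assuming a per-token reward head and right-padded batches; the function name, tensor shapes, and fallback behavior are illustrative assumptions, not this repo's actual API:

```python
import torch

def reward_at_eos(rewards, input_ids, attention_mask, eos_token_id):
    """Select the per-sequence reward at the EOS token position.

    The buggy behavior took the reward at the last non-padded token
    (derived from the attention mask); here we locate the EOS token
    explicitly, so the result is correct even for models whose chat
    template does not already end the sequence with EOS.

    rewards:        (batch, seq_len) per-token reward head outputs
    input_ids:      (batch, seq_len)
    attention_mask: (batch, seq_len), 1 for real tokens, 0 for padding
    """
    is_eos = (input_ids == eos_token_id) & attention_mask.bool()
    has_eos = is_eos.any(dim=1)
    # argmax over a 0/1 tensor gives the index of the first EOS token.
    first_eos = is_eos.long().argmax(dim=1)
    # Fallback for sequences with no EOS: last non-padded token.
    last_token = attention_mask.sum(dim=1) - 1
    idx = torch.where(has_eos, first_eos, last_token)
    return rewards.gather(1, idx.unsqueeze(1)).squeeze(1)
```

With right padding, `attention_mask.sum(dim=1) - 1` alone points at whatever token happens to be last, which only coincides with EOS when the template has already appended one; indexing on `input_ids == eos_token_id` removes that dependence on the template.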