chujiezheng / LLM-MCQ-Bias

Official repository for ICLR 2024 Spotlight paper "Large Language Models Are Not Robust Multiple Choice Selectors"
https://arxiv.org/abs/2309.03882

Length normalization of the NLL #3

Closed frankdarkluo closed 7 months ago

frankdarkluo commented 7 months ago

I recently ran the code with 'noid' on the ARC dataset and achieved very similar results (using a different GPU). However, when I looked into the code, I found no length normalization as mentioned in the paper. Did I miss anything?

https://github.com/chujiezheng/LLM-MCQ-Bias/blob/main/code/eval_clm_utils.py#L249--L254

chujiezheng commented 7 months ago

Since the batch size is 1, the loss computed by the built-in transformers implementation is already averaged over all target tokens, which is equivalent to length normalization.
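
A minimal sketch of this point (not the repository's code; the model name and prompt are placeholders): when a Hugging Face causal LM is given `labels` with the prompt positions masked to `-100` and the batch size is 1, the returned `loss` is the cross-entropy averaged over the target tokens, i.e. the per-token (length-normalized) NLL.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; the paper evaluates other LLMs
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "Question: ... Answer:"       # illustrative prompt
target = " The Sun is a star."          # illustrative answer option

prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
target_ids = tokenizer(target, return_tensors="pt").input_ids
input_ids = torch.cat([prompt_ids, target_ids], dim=1)

# Mask the prompt so only target tokens contribute to the loss
labels = input_ids.clone()
labels[:, : prompt_ids.shape[1]] = -100

with torch.no_grad():
    out = model(input_ids, labels=labels)

# With batch size 1, `out.loss` is CrossEntropyLoss with mean reduction over the
# unmasked target tokens, so it is already the length-normalized NLL.
per_token_nll = out.loss
total_nll = per_token_nll * target_ids.shape[1]  # un-normalized sum, if needed
print(per_token_nll.item(), total_nll.item())
```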