Closed: frankdarkluo closed this issue 7 months ago
I recently ran the code with the 'noid' setting on the ARC dataset and achieved very similar results (using a different GPU). However, when I looked into the code, I found no length normalization as mentioned in the paper. Did I miss anything?
https://github.com/chujiezheng/LLM-MCQ-Bias/blob/main/code/eval_clm_utils.py#L249--L254
Since the batch size is 1, the loss computed by the built-in transformers implementation is already averaged over all target tokens, which is exactly the length normalization described in the paper.
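For illustration, here is a minimal sketch of that behavior (the model name, prompt, and candidate answer are hypothetical, not taken from the repo): with batch size 1, the `loss` returned by a Hugging Face causal LM is the cross-entropy averaged over the unmasked label tokens, i.e., the negative log-likelihood of the option divided by its token length.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical model and inputs, for illustration only.
model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

prompt = "Question: Which gas do plants absorb? Answer:"
option = " carbon dioxide"  # one candidate option

prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
option_ids = tokenizer(option, add_special_tokens=False, return_tensors="pt").input_ids
input_ids = torch.cat([prompt_ids, option_ids], dim=-1)

# Mask the prompt positions so only the option tokens contribute to the loss.
labels = input_ids.clone()
labels[:, : prompt_ids.shape[-1]] = -100

with torch.no_grad():
    out = model(input_ids, labels=labels)

# With batch size 1, out.loss is the mean cross-entropy over the unmasked
# label tokens, i.e., the option's NLL divided by its number of tokens --
# a length-normalized score, so no extra normalization is needed.
print(out.loss.item())
```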