Closed: chengeharrison closed this issue 9 months ago.
Also, is there a typo in Line 77? Should it be `tokenizer.padding_side = "right"` instead?
Thank you for your feedback; you raise a valid point. We will address the issue and rectify the typo in the upcoming upload.
The typo and the token-count calculation issue in the `eval_loss.py` script have been addressed.
In Line 58, we calculate the number of tokens using `attention_mask = attention_mask[:, :-1]` and `torch.sum(attention_mask).item()`. But do we need to shift the attention mask? Maybe `torch.sum(attention_mask).item() - batch_size` (without shifting) is correct?

For example, if the batch size is 2, the `input_ids` can be `[[1, 2, 3], [1, 2, pad]]` and the attention mask is `[[True, True, True], [True, True, False]]`. Using `attention_mask = attention_mask[:, :-1]` and `torch.sum(attention_mask).item()` will output 4 as the number of tokens. But the token count should actually be 3, because we only calculate logits on `[2, 3]` and `[2, pad]` (the first label is dropped by the shift `label = label[:, 1:]`), and `pad` isn't counted as a valid token when calculating the loss. If we set `IGNORE_INDEX` in `labels` according to `attention_mask`, we don't need a shifted attention mask when calculating the loss.
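A code example (the original snippet was not preserved here, so this is a minimal sketch of the toy batch above; the pad id `0` is an assumption for illustration):

```python
import torch

# Toy batch from the example above; 0 stands in for the pad token id.
input_ids = torch.tensor([[1, 2, 3],
                          [1, 2, 0]])
attention_mask = torch.tensor([[True, True, True],
                               [True, True, False]])
batch_size = input_ids.size(0)

# Counting as in eval_loss.py Line 58: shift the mask, then sum.
shifted_mask = attention_mask[:, :-1]
print(torch.sum(shifted_mask).item())                 # 4 -- the pad label is still counted

# Proposed counting: sum the unshifted mask, minus one position per sample,
# since the shift label = label[:, 1:] drops the first token's label.
print(torch.sum(attention_mask).item() - batch_size)  # 3 -- valid labels [2, 3] and [2]
```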
But from the `Input labels` in the original example's output, it is obvious that the token count should be 17 (first sample, 338 to 29889; second sample, 30810 to 30267).
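A minimal sketch of the `IGNORE_INDEX` approach suggested above (assuming `IGNORE_INDEX = -100`, the default `ignore_index` of `torch.nn.CrossEntropyLoss`), which makes the shifted attention mask unnecessary:

```python
import torch

IGNORE_INDEX = -100  # assumption: the default ignore_index of torch.nn.CrossEntropyLoss

input_ids = torch.tensor([[1, 2, 3],
                          [1, 2, 0]])          # 0 stands in for the pad token id
attention_mask = torch.tensor([[True, True, True],
                               [True, True, False]])

# Set IGNORE_INDEX in labels according to the attention mask.
labels = input_ids.masked_fill(~attention_mask, IGNORE_INDEX)

# Standard causal-LM shift: the first token has no label.
shift_labels = labels[:, 1:]

# The valid-token count falls out directly, with no shifted attention mask.
print((shift_labels != IGNORE_INDEX).sum().item())  # 3
```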