Closed kkawamu1 closed 2 years ago
Hi @kkawamu1, yes, you are correct. @richardbaihe noticed the same thing: https://github.com/bigscience-workshop/t-zero/issues/27 Would you like to open a PR to fix that? I won't have the bandwidth to look at it before next week.
👀 @thomasw21
System info:
I used the free tier of Google Colab to test the evaluation code.
Reproduction:
Expected behavior:
The accuracy scores for the two executions of the evaluation script should be the same, i.e. regardless of the batch size, the script should report the same accuracy.
However, I get
Result: {'accuracy': 0.39285714285714285}
for batch size = 1, but
Result: {'accuracy': 0.4107142857142857}
for a larger batch size.
I suspect this has to do with how the padding is handled in DecoderModel. https://github.com/bigscience-workshop/t-zero/blob/master/t0/model.py#L93
When batch > 1, the shorter texts are padded to the length of the longest text in the batch, i.e. some elements in batch["input_ids"] will contain pad tokens. Therefore, when input_ids and labels are concatenated with
"input_ids": torch.cat([batch["input_ids"], batch["labels"]], dim=-1)
some of the final inputs to the DecoderModel will look like "T-zero is awesome. Is this true or false? \<pad>\<pad>\<pad>True". I see that this is supposed to be handled by setting the position_ids appropriately:
position_ids = torch.cumsum(model_inputs["attention_mask"].to(torch.long), dim=-1) - 1
What I found is that this strategy of padding in the middle and setting position_ids does NOT give the same result as when there is no padding in the middle. Please see: https://colab.research.google.com/drive/1-Bw3-ODDLrEvP75xIzC8wlJmvB7mQqTg?usp=sharing
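To make the position_ids computation concrete, here is a minimal sketch in plain Python. It uses itertools.accumulate as a stand-in for torch.cumsum on a single row, and the mask values and the helper name position_ids_from_mask are made up for illustration. It only shows which position IDs the real tokens receive with and without pad tokens in the middle; it does not reproduce the logit difference from the notebook.

```python
from itertools import accumulate


def position_ids_from_mask(attention_mask):
    """One-row, pure-Python equivalent of torch.cumsum(mask, dim=-1) - 1."""
    return [running_total - 1 for running_total in accumulate(attention_mask)]


# Hypothetical row: a 3-token prompt padded to length 5, followed by 2 label tokens.
padded_mask = [1, 1, 1, 0, 0, 1, 1]
# The same prompt and label with no padding at all (the batch size 1 case).
unpadded_mask = [1, 1, 1, 1, 1]

padded_pos = position_ids_from_mask(padded_mask)
unpadded_pos = position_ids_from_mask(unpadded_mask)

# Keep only the positions of real (non-pad) tokens in the padded row.
real_pos = [p for p, m in zip(padded_pos, padded_mask) if m == 1]

print(padded_pos)               # [0, 1, 2, 2, 2, 3, 4]
print(unpadded_pos)             # [0, 1, 2, 3, 4]
print(real_pos == unpadded_pos) # True
```

So the real tokens get identical position IDs in both cases, which suggests that the difference in logits comes from something other than the position IDs themselves.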
In short, it looks like the logits for the first token of the label sentence are different when there are pad tokens in the middle.
Note: When batch==1, there will be no padding tokens in batch["input_ids"], since all the sentences in batch["input_ids"] are the same. Therefore, no special handling is needed in that case, and I assume that the value obtained with batch_size=1 is the correct one.
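The following sketch illustrates why pad tokens only end up in the middle when the batch mixes inputs of different lengths. It is plain Python with invented token IDs and hypothetical helper names (pad_to_longest, concat_with_labels, has_interior_pad), not the code from t0/model.py, and it assumes right padding and equal-length labels.

```python
PAD = 0  # hypothetical pad token id; real tokens below use non-zero ids


def pad_to_longest(rows, pad_id=PAD):
    """Right-pad every row to the length of the longest row."""
    width = max(len(row) for row in rows)
    return [row + [pad_id] * (width - len(row)) for row in rows]


def concat_with_labels(input_rows, label_rows):
    """Mimic torch.cat([input_ids, labels], dim=-1) on padded inputs."""
    padded = pad_to_longest(input_rows)
    return [inp + lab for inp, lab in zip(padded, label_rows)]


def has_interior_pad(row, n_label_tokens, pad_id=PAD):
    """True if a pad token appears before the label tokens."""
    return pad_id in row[: len(row) - n_label_tokens]


# All input rows identical, as described for batch size 1: nothing to pad.
same_inputs = concat_with_labels([[5, 6, 7], [5, 6, 7]], [[8], [9]])

# Inputs of different lengths in one batch: the shorter row gets a pad
# token between its prompt and its label.
mixed_inputs = concat_with_labels([[5, 6, 7], [5, 6]], [[8], [8]])

print(same_inputs)   # [[5, 6, 7, 8], [5, 6, 7, 9]]
print(mixed_inputs)  # [[5, 6, 7, 8], [5, 6, 0, 8]]
print([has_interior_pad(r, 1) for r in same_inputs])   # [False, False]
print([has_interior_pad(r, 1) for r in mixed_inputs])  # [False, True]
```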