monilouise opened this issue 1 month ago
Hi, thanks for reaching out. This error usually means the padding has some issue; it's probably in the data collator part. If you can, please do some quick research on what changed in the current transformers version that could have caused this error. It's probably unrelated to the EOS token used in file separation. I will also have a look in my free time.
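In the meantime, something along these lines could confirm whether the collated batch is where the counts diverge. It is only a rough sketch (the checkpoint name, DataCollatorWithPadding, and the toy inputs are assumptions, not the repo's actual code), but it prints exactly the quantity the model checks:

```python
from transformers import AutoTokenizer, DataCollatorWithPadding

# Rough check, not the repo's actual pipeline: the checkpoint name, the collator,
# and the toy inputs below are assumptions for illustration only.
tokenizer = AutoTokenizer.from_pretrained("microsoft/codereviewer")
collator = DataCollatorWithPadding(tokenizer)

# The second example uses </s> as an in-text file separator, mirroring the
# file-separation scheme discussed here.
texts = ["def foo(): pass", "def foo(): pass </s> def bar(): pass"]
batch = collator([tokenizer(t) for t in texts])

# T5ForSequenceClassification requires every row to have the same number of EOS
# tokens, so differing counts here point at the inputs/collation, not the model.
eos_counts = (batch["input_ids"] == tokenizer.eos_token_id).sum(dim=1)
print(eos_counts)
```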
Best, Yaser
From: Monique Monteiro
Sent: Friday, October 18, 2024 4:51:32 AM
Subject: [YaserAlOsh/JIT-SDP-CodePTMs] Issue in reproducing CodeReviewer results (Issue #1)
Hi,
I found your paper "Parameter Efficient Fine-Tuning of Pre-trained Code Models for Just-in-Time Defect Prediction," and I'm trying to reproduce your results with CodeReviewer. However, I ran into the following issue: it seems you use </s> (the default EOS token in T5, according to some documentation) as the "file separator" in the change information, and the code throws the following error with the current transformers version:
ValueError: All examples must have the same number of <eos> tokens.
Would you happen to have any workaround?
Thanks.
Hi @YaserAlOsh,
Adding more detail to the issue, the error says:
/usr/local/lib/python3.10/dist-packages/transformers/models/t5/modeling_t5.py in forward(self, input_ids, attention_mask, decoder_input_ids, decoder_attention_mask, head_mask, decoder_head_mask, cross_attn_head_mask, encoder_outputs, inputs_embeds, decoder_inputs_embeds, labels, use_cache, output_attentions, output_hidden_states, return_dict)
2071
2072 if len(torch.unique_consecutive(eos_mask.sum(1))) > 1:
-> 2073 raise ValueError("All examples must have the same number of <eos> tokens.")
The Transformers version is 4.45.2.
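For reference, the failing check can be reproduced in isolation with a toy batch (a sketch only: the token IDs below are made up, using 1 as T5's default EOS ID and 0 as padding):

```python
import torch

# Toy reproduction of the check that raises in modeling_t5.py.
# Assumption: eos_token_id = 1 and pad_token_id = 0 (T5 defaults); IDs are made up.
eos_token_id = 1
input_ids = torch.tensor([
    [5, 6, 1, 7, 8, 1],  # two EOS tokens (one used as an in-text file separator)
    [5, 6, 7, 8, 1, 0],  # one EOS token, then padding
])

eos_mask = input_ids.eq(eos_token_id)
print(eos_mask.sum(1))  # tensor([2, 1]) -> per-example EOS counts differ

if len(torch.unique_consecutive(eos_mask.sum(1))) > 1:
    raise ValueError("All examples must have the same number of <eos> tokens.")
```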
I manually inspected some lists of input IDs, and the EOS token does appear more than once per example, perhaps due to the file separation. Strangely, even after downgrading to the library versions below, the same error occurs:
transformers=4.35.2 tokenizers=0.15.0 datasets=2.14.5
/usr/local/lib/python3.10/dist-packages/transformers/models/t5/modeling_t5.py in forward(self, input_ids, attention_mask, decoder_input_ids, decoder_attention_mask, head_mask, decoder_head_mask, cross_attn_head_mask, encoder_outputs, inputs_embeds, decoder_inputs_embeds, labels, use_cache, output_attentions, output_hidden_states, return_dict)
2077
2078 if len(torch.unique_consecutive(eos_mask.sum(1))) > 1:
-> 2079 raise ValueError("All examples must have the same number of <eos> tokens.")
ValueError: All examples must have the same number of <eos> tokens.
Finally, I'm running these tests on Google Colab (A100).