SouthWindShiB opened this issue 2 years ago
Because of how Longformer's attention windows are structured, the input sequence length must be a multiple of the attention window size. To use it, you can pad your input sequence to 512 or 1024 and give the model the corresponding attention mask.
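For example, here is a minimal padding sketch; the helper name `pad_to_window_multiple` and the default `window_size=512` are my own illustrative assumptions (512 matches the multiple demanded by the assertion below), not something from this repo:

```python
import torch
import torch.nn.functional as F

def pad_to_window_multiple(input_ids, attention_mask, window_size=512, pad_token_id=0):
    # Hypothetical helper (not from the repo): right-pad (batch, seq_len)
    # tensors so seq_len becomes the next multiple of window_size.
    seq_len = input_ids.size(1)
    padded_len = ((seq_len + window_size - 1) // window_size) * window_size
    pad_len = padded_len - seq_len
    if pad_len > 0:
        # Padded positions get attention_mask == 0, so the model ignores them.
        input_ids = F.pad(input_ids, (0, pad_len), value=pad_token_id)
        attention_mask = F.pad(attention_mask, (0, pad_len), value=0)
    return input_ids, attention_mask

# Usage: pad a 158-token batch (as in the error below) up to 512 before the forward pass.
input_ids = torch.randint(100, 1000, (1, 158))
attention_mask = torch.ones_like(input_ids)
input_ids, attention_mask = pad_to_window_multiple(input_ids, attention_mask)
print(input_ids.shape)  # torch.Size([1, 512])
```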
File "D:\Anaconda\envs\torch_1.7\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl result = self.forward(*input, kwargs) File "D:\Anaconda\envs\torch_1.7\lib\site-packages\transformers\models\bert\modeling_bert.py", line 1068, in forward return_dict=return_dict, File "D:\Anaconda\envs\torch_1.7\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl result = self.forward(*input, *kwargs) File "D:\Anaconda\envs\torch_1.7\lib\site-packages\transformers\models\bert\modeling_bert.py", line 591, in forward output_attentions, File "D:\Anaconda\envs\torch_1.7\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl result = self.forward(input, kwargs) File "D:\Anaconda\envs\torch_1.7\lib\site-packages\transformers\models\bert\modeling_bert.py", line 476, in forward past_key_value=self_attn_past_key_value, File "D:\Anaconda\envs\torch_1.7\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl result = self.forward(*input, *kwargs) File "D:\Anaconda\envs\torch_1.7\lib\site-packages\transformers\models\bert\modeling_bert.py", line 408, in forward output_attentions, File "D:\Anaconda\envs\torch_1.7\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl result = self.forward(input, *kwargs) File "I:\PycharmProject\zh_efficient-autogressive-EL\model\Longformer_zh.py", line 21, in forward output_attentions=output_attentions) File "D:\Anaconda\envs\torch_1.7\lib\site-packages\transformers\models\longformer\modeling_longformer.py", line 591, in forward query_vectors, key_vectors, self.one_sided_attn_window_size File "D:\Anaconda\envs\torch_1.7\lib\site-packages\transformers\models\longformer\modeling_longformer.py", line 803, in _sliding_chunks_query_key_matmul ), f"Sequence length should be multiple of {window_overlap 2}. Given {seq_len}" AssertionError: Sequence length should be multiple of 512. Given 158
Did you miss a step that pads the sequence to a suitable length?