🐛 Bug

The multi-head attention module throws: "RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead."

To Reproduce

1. Train a model on a TPU machine.
2. Run fairseq-generate (reproduced on TPU, GPU, and CPU).
3. See the error below.

On GPU...
2020-09-09 20:42:57 | WARNING | fairseq.data.data_utils | 62 samples have invalid sizes and will be skipped, max_positions=(1024, 1024), first few sample ids=[107311, 69863, 57884, 54207, 83884, 71724, 46, 47217, 64581, 106659]
Traceback (most recent call last):
File "/opt/conda/bin/fairseq-generate", line 8, in <module>
sys.exit(cli_main())
File "/opt/conda/lib/python3.7/site-packages/fairseq_cli/generate.py", line 274, in cli_main
main(args)
File "/opt/conda/lib/python3.7/site-packages/fairseq_cli/generate.py", line 38, in main
return _main(args, sys.stdout)
File "/opt/conda/lib/python3.7/site-packages/fairseq_cli/generate.py", line 150, in _main
hypos = task.inference_step(generator, models, sample, prefix_tokens)
File "/opt/conda/lib/python3.7/site-packages/fairseq/tasks/fairseq_task.py", line 361, in inference_step
return generator.generate(models, sample, prefix_tokens=prefix_tokens)
File "/opt/conda/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 15, in decorate_context
return func(*args, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/fairseq/sequence_generator.py", line 159, in generate
return self._generate(sample, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/fairseq/sequence_generator.py", line 198, in _generate
encoder_outs = self.model.forward_encoder(net_input)
File "/opt/conda/lib/python3.7/site-packages/fairseq/sequence_generator.py", line 697, in forward_encoder
for model in self.models
File "/opt/conda/lib/python3.7/site-packages/fairseq/sequence_generator.py", line 697, in <listcomp>
for model in self.models
File "/opt/conda/lib/python3.7/site-packages/fairseq/models/fairseq_encoder.py", line 53, in forward_torchscript
return self.forward_non_torchscript(net_input)
File "/opt/conda/lib/python3.7/site-packages/fairseq/models/fairseq_encoder.py", line 62, in forward_non_torchscript
return self.forward(**encoder_input)
File "/opt/conda/lib/python3.7/site-packages/fairseq/models/transformer.py", line 411, in forward
x = layer(x, encoder_padding_mask)
File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/fairseq/modules/transformer_layer.py", line 122, in forward
attn_mask=attn_mask,
File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/fairseq/modules/multihead_attention.py", line 342, in forward
attn_weights = attn_weights.view(bsz * self.num_heads, tgt_len, src_len)
RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
Reproduced on TPU...
2020-09-09 20:38:46 | INFO | fairseq_cli.generate | loading model(s) from redacted/path/checkpoints/checkpoint_best.pt
2020-09-09 20:38:52 | WARNING | fairseq.data.data_utils | 62 samples have invalid sizes and will be skipped, max_positions=(1024, 1024), first few sample ids=[107311, 69863, 57884, 54207, 83884, 71724, 46, 47217, 64581, 106659]
Traceback (most recent call last):
File "/anaconda3/envs/torch-xla-1.6/lib/python3.6/site-packages/fairseq_cli/generate.py", line 278, in <module>
cli_main()
File "/anaconda3/envs/torch-xla-1.6/lib/python3.6/site-packages/fairseq_cli/generate.py", line 274, in cli_main
main(args)
File "/anaconda3/envs/torch-xla-1.6/lib/python3.6/site-packages/fairseq_cli/generate.py", line 38, in main
return _main(args, sys.stdout)
File "/anaconda3/envs/torch-xla-1.6/lib/python3.6/site-packages/fairseq_cli/generate.py", line 150, in _main
hypos = task.inference_step(generator, models, sample, prefix_tokens)
File "/anaconda3/envs/torch-xla-1.6/lib/python3.6/site-packages/fairseq/tasks/fairseq_task.py", line 361, in inference_step
return generator.generate(models, sample, prefix_tokens=prefix_tokens)
File "/anaconda3/envs/torch-xla-1.6/lib/python3.6/site-packages/torch/autograd/grad_mode.py", line 15, in decorate_context
return func(*args, **kwargs)
File "/anaconda3/envs/torch-xla-1.6/lib/python3.6/site-packages/fairseq/sequence_generator.py", line 159, in generate
return self._generate(sample, **kwargs)
File "/anaconda3/envs/torch-xla-1.6/lib/python3.6/site-packages/fairseq/sequence_generator.py", line 198, in _generate
encoder_outs = self.model.forward_encoder(net_input)
File "/anaconda3/envs/torch-xla-1.6/lib/python3.6/site-packages/fairseq/sequence_generator.py", line 697, in forward_encoder
for model in self.models
File "/anaconda3/envs/torch-xla-1.6/lib/python3.6/site-packages/fairseq/sequence_generator.py", line 697, in <listcomp>
for model in self.models
File "/anaconda3/envs/torch-xla-1.6/lib/python3.6/site-packages/fairseq/models/fairseq_encoder.py", line 53, in forward_torchscript
return self.forward_non_torchscript(net_input)
File "/anaconda3/envs/torch-xla-1.6/lib/python3.6/site-packages/fairseq/models/fairseq_encoder.py", line 62, in forward_non_torchscript
return self.forward(**encoder_input)
File "/anaconda3/envs/torch-xla-1.6/lib/python3.6/site-packages/fairseq/models/transformer.py", line 411, in forward
x = layer(x, encoder_padding_mask)
File "/anaconda3/envs/torch-xla-1.6/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/anaconda3/envs/torch-xla-1.6/lib/python3.6/site-packages/fairseq/modules/transformer_layer.py", line 122, in forward
attn_mask=attn_mask,
File "/anaconda3/envs/torch-xla-1.6/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/anaconda3/envs/torch-xla-1.6/lib/python3.6/site-packages/fairseq/modules/multihead_attention.py", line 342, in forward
attn_weights = attn_weights.view(bsz * self.num_heads, tgt_len, src_len)
RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
NOTE: the TPU vs. GPU machine is probably a red herring; the real difference is more likely the differing fairseq or PyTorch versions.
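A quick way to confirm that suspicion is to print the versions visible in each environment, e.g.:

```python
# Print the torch and fairseq versions visible in the active environment,
# to compare the training machine against the generation machine.
import torch
import fairseq

print("torch  :", torch.__version__)
print("fairseq:", fairseq.__version__)
```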
Code sample
See fairseq CLI commands above.
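(The commands themselves are not reproduced in this excerpt; purely for illustration, a generation run of this shape would look roughly like the line below, with `data-bin` and the checkpoint path as placeholders.)

```bash
fairseq-generate data-bin --path checkpoints/checkpoint_best.pt --beam 5 --remove-bpe
```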
Expected behavior
No error; fairseq-generate produces its results.
Environment
Environment for running fairseq-train:
fairseq Version (e.g., 1.0 or master): c1e734b2dd7024044c8dee551620146e4f872ad4 and latest master
PyTorch Version (e.g., 1.0): 1.6.0
OS (e.g., Linux): Debian GNU/Linux 9
How you installed fairseq (pip, source): pip
Build command you used (if compiling from source): NA
Python version: Python 3.6.10 :: Anaconda custom (64-bit) (because I'm using the default env for TPU training - conda activate torch-xla-1.6)
CUDA/cuDNN version: NA
GPU models and configuration: TPU v3-8 running TPU software v2.3
Any other relevant information: GCP image debian-9-torch-xla-v20200828
Environment for running fairseq-generate:
fairseq Version (e.g., 1.0 or master): c1e734b2dd7024044c8dee551620146e4f872ad4 and latest master
PyTorch Version (e.g., 1.0): 1.4.0 (also tried 1.5.1 and 1.6.0)
OS (e.g., Linux): Debian GNU/Linux 9
How you installed fairseq: pip
Build command you used (if compiling from source): NA
Python version: Python 3.7.8
CUDA/cuDNN version: 10.1
GPU models and configuration: 1 x NVIDIA Tesla V100
Any other relevant information: GCP image c2-deeplearning-pytorch-1-4-cu101-v20200804-debian-9
Additional context
The error is pretty clear, so I replaced line 342 of the MultiheadAttention module with
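```python
# was: attn_weights = attn_weights.view(bsz * self.num_heads, tgt_len, src_len)
attn_weights = attn_weights.reshape(bsz * self.num_heads, tgt_len, src_len)
```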
and the issue was resolved. This is a potential solution, but I understand that reshape() can be more expensive than view() (it copies when the input is non-contiguous), so I'll leave it to the experts to decide.
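For context on why the swap works: view() can only reinterpret an existing memory layout, while reshape() falls back to copying when the layout doesn't allow it. A minimal standalone sketch (shapes made up, nothing fairseq-specific):

```python
import torch

# A transpose leaves the tensor non-contiguous: the logical dimensions no
# longer map onto one flat run of memory, so view() cannot merge them.
x = torch.randn(4, 8, 16).transpose(0, 1)  # shape (8, 4, 16)
assert not x.is_contiguous()

y = x.reshape(8 * 4, 16)  # works: silently copies into a contiguous buffer
z = x.view(8 * 4, 16)     # raises the exact RuntimeError quoted above
```

Note that when the tensor is already contiguous, reshape() returns a view without copying, so the extra cost only bites in the non-contiguous case.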