Traceback (most recent call last):
  File "translation/run_translation.py", line 686, in <module>
    main()
  File "translation/run_translation.py", line 603, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/home/CORP/r.lenain/miniconda3/envs/mt_opus-mt/lib/python3.7/site-packages/transformers/trainer.py", line 1504, in train
    ignore_keys_for_eval=ignore_keys_for_eval,
  File "/home/CORP/r.lenain/miniconda3/envs/mt_opus-mt/lib/python3.7/site-packages/transformers/trainer.py", line 1742, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/home/CORP/r.lenain/miniconda3/envs/mt_opus-mt/lib/python3.7/site-packages/transformers/trainer.py", line 2486, in training_step
    loss = self.compute_loss(model, inputs)
  File "/home/CORP/r.lenain/miniconda3/envs/mt_opus-mt/lib/python3.7/site-packages/transformers/trainer.py", line 2518, in compute_loss
    outputs = model(**inputs)
  File "/home/CORP/r.lenain/miniconda3/envs/mt_opus-mt/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/CORP/r.lenain/miniconda3/envs/mt_opus-mt/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 168, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/home/CORP/r.lenain/miniconda3/envs/mt_opus-mt/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 178, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/home/CORP/r.lenain/miniconda3/envs/mt_opus-mt/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 86, in parallel_apply
    output.reraise()
  File "/home/CORP/r.lenain/miniconda3/envs/mt_opus-mt/lib/python3.7/site-packages/torch/_utils.py", line 461, in reraise
    raise exception
TypeError: Caught TypeError in replica 0 on device 0.
Original Traceback (most recent call last):
  File "/home/CORP/r.lenain/miniconda3/envs/mt_opus-mt/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
    output = module(*input, **kwargs)
  File "/home/CORP/r.lenain/miniconda3/envs/mt_opus-mt/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/CORP/r.lenain/miniconda3/envs/mt_opus-mt/lib/python3.7/site-packages/transformers/models/marian/modeling_marian.py", line 1455, in forward
    return_dict=return_dict,
  File "/home/CORP/r.lenain/miniconda3/envs/mt_opus-mt/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/CORP/r.lenain/miniconda3/envs/mt_opus-mt/lib/python3.7/site-packages/transformers/models/marian/modeling_marian.py", line 1229, in forward
    return_dict=return_dict,
  File "/home/CORP/r.lenain/miniconda3/envs/mt_opus-mt/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/CORP/r.lenain/miniconda3/envs/mt_opus-mt/lib/python3.7/site-packages/transformers/models/marian/modeling_marian.py", line 751, in forward
    embed_pos = self.embed_positions(input_shape)
  File "/home/CORP/r.lenain/miniconda3/envs/mt_opus-mt/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/CORP/r.lenain/miniconda3/envs/mt_opus-mt/lib/python3.7/site-packages/deepspeed/compression/basic_layer.py", line 130, in forward
    self.sparse)
  File "/home/CORP/r.lenain/miniconda3/envs/mt_opus-mt/lib/python3.7/site-packages/torch/nn/functional.py", line 2199, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
TypeError: embedding(): argument 'indices' (position 2) must be Tensor, not torch.Size
Hello,
I am trying to use OPUS-MT together with DeepSpeed compression (examples can be found at https://github.com/microsoft/DeepSpeedExamples under model_compression). I am running into an issue where the exact same code works if I use t5-small, but if I switch to Helsinki-NLP/opus-mt-zh-en it does not work anymore. The error is the traceback shown above.
Has anyone ever encountered this issue?
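For what it's worth, the last frames of the traceback can be reproduced in isolation. The Marian frame calls self.embed_positions(input_shape), so that module is handed a torch.Size, while the DeepSpeed frame (basic_layer.py) forwards whatever it receives to F.embedding, which expects a tensor of indices. Below is a minimal sketch of that mismatch using only torch. ShapeBasedPositions and PlainLookup are hypothetical stand-ins written for illustration, not the actual transformers or DeepSpeed classes, and my guess that t5-small is unaffected because it has no such shape-based positional embedding module is an assumption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ShapeBasedPositions(nn.Embedding):
    """Hypothetical stand-in: a positional embedding that is called with a shape."""

    def forward(self, input_shape):
        # Build position indices 0..seq_len-1 from the shape, then look them up.
        seq_len = input_shape[1]
        positions = torch.arange(seq_len, dtype=torch.long, device=self.weight.device)
        return super().forward(positions)


class PlainLookup(nn.Embedding):
    """Hypothetical stand-in: a replacement layer that treats its input as indices."""

    def forward(self, input):
        # Passes the input straight to F.embedding, as the last frames suggest.
        return F.embedding(input, self.weight, self.padding_idx, self.max_norm,
                           self.norm_type, self.scale_grad_by_freq, self.sparse)


input_ids = torch.zeros(2, 5, dtype=torch.long)
input_shape = input_ids.size()  # a torch.Size, which is what the Marian frame passes

# The shape-based module accepts a torch.Size and returns one vector per position.
ok = ShapeBasedPositions(512, 8)(input_shape)
print(ok.shape)

# The plain lookup receives the same torch.Size and raises the TypeError.
try:
    PlainLookup(512, 8)(input_shape)
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)
```

If this is really what happens, one thing I could try is excluding the positional embedding modules from the embedding compression settings, but I have not verified that.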