Helsinki-NLP / OPUS-MT-train

Training open neural machine translation models
MIT License

Reproduced crash on Opus-mt-en-de model using string "J" and "J-10" #86

Qubitium closed 1 year ago

Qubitium commented 1 year ago

Try either of the two inputs below: the translation on the web UI returns "J..........." or "J-10............" after 16 seconds, but the request has in fact caused a server crash.

https://huggingface.co/Helsinki-NLP/opus-mt-en-de?text=J-10
https://huggingface.co/Helsinki-NLP/opus-mt-en-de?text=J

Env: Conda, PyTorch 1.13, transformers on GPU

The crash also happens on a CPU device.

An error occurred, model: en->de, translating: ['J']
Stacktrace:
Traceback (most recent call last):
  File "/raid0/translate/app.py", line 202, in trans
    translated.extend(translator.translate(sents))
  File "/raid0/translate/translator.py", line 60, in translate
    return self.translator.translate(input_text)
  File "/raid0/translate/model.py", line 114, in translate
    return self._translate(input_text)
  File "/raid0/translate/model.py", line 96, in _translate
    translated = self.model.generate(**tokens, max_new_tokens=50000)
  File "/root/anaconda3/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/root/anaconda3/lib/python3.9/site-packages/transformers/generation_utils.py", line 1577, in generate
    return self.beam_search(
  File "/root/anaconda3/lib/python3.9/site-packages/transformers/generation_utils.py", line 2747, in beam_search
    outputs = self(
  File "/root/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/anaconda3/lib/python3.9/site-packages/transformers/models/marian/modeling_marian.py", line 1440, in forward
    outputs = self.model(
  File "/root/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/anaconda3/lib/python3.9/site-packages/transformers/models/marian/modeling_marian.py", line 1240, in forward
    decoder_outputs = self.decoder(
  File "/root/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/anaconda3/lib/python3.9/site-packages/transformers/models/marian/modeling_marian.py", line 1042, in forward
    layer_outputs = decoder_layer(
  File "/root/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/anaconda3/lib/python3.9/site-packages/transformers/models/marian/modeling_marian.py", line 424, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/root/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/anaconda3/lib/python3.9/site-packages/transformers/models/marian/modeling_marian.py", line 195, in forward
    query_states = self.q_proj(hidden_states) * self.scaling
  File "/root/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/anaconda3/lib/python3.9/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasLtMatmul( ltHandle, computeDesc.descriptor(), &alpha_val, mat1_ptr, Adesc.descriptor(), mat2_ptr, Bdesc.descriptor(), &beta_val, result_ptr, Cdesc.descriptor(), result_ptr, Cdesc.descriptor(), &heuristicResult.algo, workspace.data_ptr(), workspaceSize, at::cuda::getCurrentCUDAStream())`
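
The thread does not identify a root cause. One plausible factor, offered here only as an assumption, is the `max_new_tokens=50000` argument visible in the traceback: Marian checkpoints have a fixed-size position table (`config.max_position_embeddings`, assumed to be 512 in this sketch), so a degenerate repetition such as "J..........." could push the decoder past it. The helper below is hypothetical and not part of `transformers` or of the reporter's code; it only shows how the requested budget could be capped before calling `generate`.

```python
def clamp_max_new_tokens(requested: int, max_positions: int,
                         decoder_prompt_len: int = 1) -> int:
    """Cap a generation budget so decoder length stays within max_positions.

    decoder_prompt_len defaults to 1 on the assumption that an
    encoder-decoder model starts decoding from a single start token.
    """
    budget = max_positions - decoder_prompt_len
    if budget < 1:
        raise ValueError("no room left to generate within max_positions")
    # Never return less than 1 so that generate() still emits a token.
    return max(1, min(requested, budget))


# Hypothetical use inside _translate from the traceback (names assumed):
#   limit = clamp_max_new_tokens(50000, self.model.config.max_position_embeddings)
#   translated = self.model.generate(**tokens, max_new_tokens=limit)
print(clamp_max_new_tokens(50000, 512))  # prints 511
```

If the cap makes the crash disappear, that would support the assumption above; if not, rerunning with the environment variable `CUDA_LAUNCH_BLOCKING=1` set should make the traceback point at the kernel that actually failed, since CUDA errors are otherwise reported asynchronously.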

Qubitium commented 1 year ago

On the opus-mt-es-fr model we saw another GPU crash with the very same stack trace. The UI link (CPU) shows a failed translation with gibberish output at the end. On GPU, the input below should produce a stack trace like the previous one.

https://huggingface.co/Helsinki-NLP/opus-mt-es-fr

- ¿Porqué crees que renté una habitación con una poza privada… Akane? – Le mordió el lóbulo. – Ella se encogió de hombros. – Para quitarte todas esas dudas de la cabeza.