Qubitium closed this issue 1 year ago
Here is the full stack trace for the input string "J-10":
/opt/conda/conda-bld/pytorch_1666642975993/work/aten/src/ATen/native/cuda/Indexing.cu:1088: indexSelectSmallIndex: block: [1,0,0], thread: [64,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1666642975993/work/aten/src/ATen/native/cuda/Indexing.cu:1088: indexSelectSmallIndex: block: [1,0,0], thread: [65,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1666642975993/work/aten/src/ATen/native/cuda/Indexing.cu:1088: indexSelectSmallIndex: block: [1,0,0], thread: [66,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1666642975993/work/aten/src/ATen/native/cuda/Indexing.cu:1088: indexSelectSmallIndex: block: [1,0,0], thread: [67,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1666642975993/work/aten/src/ATen/native/cuda/Indexing.cu:1088: indexSelectSmallIndex: block: [1,0,0], thread: [68,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1666642975993/work/aten/src/ATen/native/cuda/Indexing.cu:1088: indexSelectSmallIndex: block: [1,0,0], thread: [69,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1666642975993/work/aten/src/ATen/native/cuda/Indexing.cu:1088: indexSelectSmallIndex: block: [1,0,0], thread: [70,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1666642975993/work/aten/src/ATen/native/cuda/Indexing.cu:1088: indexSelectSmallIndex: block: [1,0,0], thread: [71,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1666642975993/work/aten/src/ATen/native/cuda/Indexing.cu:1088: indexSelectSmallIndex: block: [1,0,0], thread: [72,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1666642975993/work/aten/src/ATen/native/cuda/Indexing.cu:1088: indexSelectSmallIndex: block: [1,0,0], thread: [73,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1666642975993/work/aten/src/ATen/native/cuda/Indexing.cu:1088: indexSelectSmallIndex: block: [1,0,0], thread: [74,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1666642975993/work/aten/src/ATen/native/cuda/Indexing.cu:1088: indexSelectSmallIndex: block: [1,0,0], thread: [75,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1666642975993/work/aten/src/ATen/native/cuda/Indexing.cu:1088: indexSelectSmallIndex: block: [1,0,0], thread: [76,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1666642975993/work/aten/src/ATen/native/cuda/Indexing.cu:1088: indexSelectSmallIndex: block: [1,0,0], thread: [77,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1666642975993/work/aten/src/ATen/native/cuda/Indexing.cu:1088: indexSelectSmallIndex: block: [1,0,0], thread: [78,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1666642975993/work/aten/src/ATen/native/cuda/Indexing.cu:1088: indexSelectSmallIndex: block: [1,0,0], thread: [79,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1666642975993/work/aten/src/ATen/native/cuda/Indexing.cu:1088: indexSelectSmallIndex: block: [1,0,0], thread: [80,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1666642975993/work/aten/src/ATen/native/cuda/Indexing.cu:1088: indexSelectSmallIndex: block: [1,0,0], thread: [81,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1666642975993/work/aten/src/ATen/native/cuda/Indexing.cu:1088: indexSelectSmallIndex: block: [1,0,0], thread: [82,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1666642975993/work/aten/src/ATen/native/cuda/Indexing.cu:1088: indexSelectSmallIndex: block: [1,0,0], thread: [83,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1666642975993/work/aten/src/ATen/native/cuda/Indexing.cu:1088: indexSelectSmallIndex: block: [1,0,0], thread: [84,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
An error occurred, model: en->de, translating: ['J-10']
Stacktrace:
Traceback (most recent call last):
File "/raid0/translate/app.py", line 202, in trans
translated.extend(translator.translate(sents))
File "/raid0/translate/translator.py", line 60, in translate
return self.translator.translate(input_text)
File "/raid0/translate/model.py", line 114, in translate
return self._translate(input_text)
File "/raid0/translate/model.py", line 96, in _translate
translated = self.model.generate(**tokens, max_new_tokens=50000)
File "/root/anaconda3/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/root/anaconda3/lib/python3.9/site-packages/transformers/generation_utils.py", line 1577, in generate
return self.beam_search(
File "/root/anaconda3/lib/python3.9/site-packages/transformers/generation_utils.py", line 2747, in beam_search
outputs = self(
File "/root/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/root/anaconda3/lib/python3.9/site-packages/transformers/models/marian/modeling_marian.py", line 1440, in forward
outputs = self.model(
File "/root/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/root/anaconda3/lib/python3.9/site-packages/transformers/models/marian/modeling_marian.py", line 1240, in forward
decoder_outputs = self.decoder(
File "/root/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/root/anaconda3/lib/python3.9/site-packages/transformers/models/marian/modeling_marian.py", line 1042, in forward
layer_outputs = decoder_layer(
File "/root/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/root/anaconda3/lib/python3.9/site-packages/transformers/models/marian/modeling_marian.py", line 424, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
File "/root/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/root/anaconda3/lib/python3.9/site-packages/transformers/models/marian/modeling_marian.py", line 195, in forward
query_states = self.q_proj(hidden_states) * self.scaling
File "/root/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/root/anaconda3/lib/python3.9/site-packages/torch/nn/modules/linear.py", line 114, in forward
return F.linear(input, self.weight, self.bias)
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasLtMatmul( ltHandle, computeDesc.descriptor(), &alpha_val, mat1_ptr, Adesc.descriptor(), mat2_ptr, Bdesc.descriptor(), &beta_val, result_ptr, Cdesc.descriptor(), result_ptr, Cdesc.descriptor(), &heuristicResult.algo, workspace.data_ptr(), workspaceSize, at::cuda::getCurrentCUDAStream())`
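A note on the trace above: the `srcIndex < srcSelectDimSize` assertion means an index-select (for example an embedding lookup) received an index at or beyond the size of the table. One possible cause, which is an assumption and not confirmed in this thread, is that `max_new_tokens=50000` lets the decoder run past the model's position-embedding table. A minimal sketch of capping the budget follows; `clamp_max_new_tokens` and its parameters are hypothetical names, and 512 is only an example value.

```python
def clamp_max_new_tokens(requested: int, max_positions: int,
                         decoder_prefix_len: int = 1) -> int:
    """Cap a new-token budget so the decoder sequence fits in max_positions.

    requested:          the budget the caller asked for (e.g. 50000)
    max_positions:      size of the position table (assumed to come from
                        something like model.config.max_position_embeddings)
    decoder_prefix_len: tokens already in the decoder input before generation
                        starts (assumed to be 1, the decoder start token)
    """
    if requested < 0:
        raise ValueError("requested must be non-negative")
    if max_positions <= 0:
        raise ValueError("max_positions must be positive")
    # Room left in the position table after the decoder prefix.
    room = max(max_positions - decoder_prefix_len, 0)
    return min(requested, room)


# Hypothetical usage at the call site shown in the traceback:
#   budget = clamp_max_new_tokens(50000, self.model.config.max_position_embeddings)
#   translated = self.model.generate(**tokens, max_new_tokens=budget)
print(clamp_max_new_tokens(50000, 512))
```

If the cap removes the assertion, the out-of-range index was most likely a position index; if the crash persists, the bad index comes from somewhere else (for example a token id outside the vocabulary).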
Another model crash: Helsinki-NLP/opus-mt-en-ar, input: ['Freaky Friday']
/opt/conda/conda-bld/pytorch_1666642975993/work/aten/src/ATen/native/cuda/Indexing.cu:1088: indexSelectSmallIndex: block: [0,0,0], thread: [60,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1666642975993/work/aten/src/ATen/native/cuda/Indexing.cu:1088: indexSelectSmallIndex: block: [0,0,0], thread: [61,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1666642975993/work/aten/src/ATen/native/cuda/Indexing.cu:1088: indexSelectSmallIndex: block: [0,0,0], thread: [62,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1666642975993/work/aten/src/ATen/native/cuda/Indexing.cu:1088: indexSelectSmallIndex: block: [0,0,0], thread: [63,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
An error occurred, model: Helsinki-NLP/opus-mt-en->ar, translating: ['Freaky Friday']
Stacktrace:
Traceback (most recent call last):
File "/raid0/translate/app.py", line 202, in trans
translated.extend(translator.translate(sents))
File "/raid0/translate/translator.py", line 95, in translate
t1_translated = self.t1.translate(input_text)
File "/raid0/translate/translator.py", line 60, in translate
return self.translator.translate(input_text)
File "/raid0/translate/model.py", line 117, in translate
return self._translate(input_text)
File "/raid0/translate/model.py", line 100, in _translate
translated = self.model.generate(**tokens, max_new_tokens=2048)
File "/root/anaconda3/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/root/anaconda3/lib/python3.9/site-packages/transformers/generation_utils.py", line 1577, in generate
return self.beam_search(
File "/root/anaconda3/lib/python3.9/site-packages/transformers/generation_utils.py", line 2766, in beam_search
next_token_scores_processed = logits_processor(input_ids, next_token_scores)
File "/root/anaconda3/lib/python3.9/site-packages/transformers/generation_logits_process.py", line 92, in __call__
scores = processor(input_ids, scores)
File "/root/anaconda3/lib/python3.9/site-packages/transformers/generation_logits_process.py", line 435, in __call__
dynamic_banned_tokens = self._calc_banned_bad_words_ids(input_ids.tolist())
RuntimeError: CUDA error: device-side assert triggered
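The trace above ends at `input_ids.tolist()`, which is probably just the point where the host synchronized with the GPU, not where the failing kernel ran, since CUDA kernels are launched asynchronously. Setting `CUDA_LAUNCH_BLOCKING=1` before CUDA is initialized makes launches synchronous, so the Python traceback should point closer to the op that triggered the assertion. A small sketch, with a hypothetical helper name:

```python
import os


def enable_blocking_cuda_launches() -> None:
    """Ask the CUDA runtime to run kernel launches synchronously.

    Must be called before the process initializes CUDA (in practice,
    before the first CUDA call from torch); otherwise it has no effect.
    """
    os.environ["CUDA_LAUNCH_BLOCKING"] = "1"


enable_blocking_cuda_launches()

# Import torch and load the model only after the variable is set, e.g.:
#   import torch
#   from transformers import MarianMTModel
print(os.environ["CUDA_LAUNCH_BLOCKING"])
```

This slows execution, so it is meant for debugging runs only. The same effect can be had by exporting the variable in the shell before starting the server.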
Closing the issue. Using the transformers pipeline("translate".....Opus) avoids all the crashes.
Try either of the two links below: the translation in the web UI returns "J..........." or "J-10............" after 16 seconds, but in fact the request caused a server crash.
https://huggingface.co/Helsinki-NLP/opus-mt-en-de?text=J-10 https://huggingface.co/Helsinki-NLP/opus-mt-en-de?text=J
Env: Conda, PyTorch 1.13, CUDA 11.7, transformers on GPU
The crash also happens on a CPU-only device.
With the opus-mt-es-fr model we saw another GPU crash with the very same stack trace. The UI link (CPU) shows a failed translation with gibberish output at the end. On GPU it should produce a stack trace like the previous ones.
https://huggingface.co/Helsinki-NLP/opus-mt-es-fr
- ¿Porqué crees que renté una habitación con una poza privada… Akane? – Le mordió el lóbulo. – Ella se encogió de hombros. – Para quitarte todas esas dudas de la cabeza.