SkyworkAI / Skywork

The Skywork series models are pre-trained on 3.2TB of high-quality multilingual (mainly Chinese and English) and code data. We have open-sourced the model weights, training data, evaluation data, and evaluation methods.

RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling `cublasCreate(handle)` #38

Closed: QuanhuiGuan closed this issue 9 months ago

QuanhuiGuan commented 9 months ago

I tried running the published code (the inference/prediction example, on a single A100). Could anyone give me a few pointers?

I got the following error:

```
....
../aten/src/ATen/native/cuda/Indexing.cu:1239: indexSelectSmallIndex: block: [25,0,0], thread: [6,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1239: indexSelectSmallIndex: block: [25,0,0], thread: [7,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
...
../aten/src/ATen/native/cuda/Indexing.cu:1239: indexSelectSmallIndex: block: [25,0,0], thread: [31,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    return self._call_impl(*args, **kwargs)
  File "/mnt/SSD_12TB/ethan/application/anaconda3/envs/sky/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ethan/.cache/huggingface/modules/transformers_modules/Skywork-13B-base/modeling_skywork.py", line 726, in forward
    outputs = self.model(
  File "/mnt/SSD_12TB/ethan/application/anaconda3/envs/sky/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/SSD_12TB/ethan/application/anaconda3/envs/sky/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ethan/.cache/huggingface/modules/transformers_modules/Skywork-13B-base/modeling_skywork.py", line 641, in forward
    layer_outputs = decoder_layer(
  File "/mnt/SSD_12TB/ethan/application/anaconda3/envs/sky/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/SSD_12TB/ethan/application/anaconda3/envs/sky/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ethan/.cache/huggingface/modules/transformers_modules/Skywork-13B-base/modeling_skywork.py", line 449, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/mnt/SSD_12TB/ethan/application/anaconda3/envs/sky/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/SSD_12TB/ethan/application/anaconda3/envs/sky/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ethan/.cache/huggingface/modules/transformers_modules/Skywork-13B-base/modeling_skywork.py", line 346, in forward
    query_states = self.q_proj(hidden_states)
  File "/mnt/SSD_12TB/ethan/application/anaconda3/envs/sky/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/SSD_12TB/ethan/application/anaconda3/envs/sky/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/SSD_12TB/ethan/application/anaconda3/envs/sky/lib/python3.9/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling `cublasCreate(handle)`
```

zhao1iang commented 9 months ago

Could it be that you are loading the base model with the demo code for the chat/math models? The chat/math models add special tokens, but the base model does not.

QuanhuiGuan commented 9 months ago

That was indeed the problem. Thanks for the explanation!
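The diagnosis above is consistent with the log: the `srcIndex < srcSelectDimSize` assertion in `indexSelectSmallIndex` fires when an index passed to `index_select` (which the embedding lookup uses on CUDA) is at least as large as the table, i.e. a token id that is not smaller than the model's vocabulary size, as happens when extra special-token ids are fed to a model whose embedding does not include them. Because CUDA kernels run asynchronously, the failure most likely only surfaces at a later call, here the cuBLAS setup inside `q_proj`. Below is a minimal, hypothetical pre-flight check (the helper names `find_out_of_range_ids` and `check_ids` are illustrative and are not part of Skywork or transformers) that catches this on the host before any GPU work. In real use, `vocab_size` could be read from `model.get_input_embeddings().num_embeddings` and `input_ids` from the tokenizer output; the numbers in the example are made up.

```python
from typing import Iterable, List, Tuple


def find_out_of_range_ids(input_ids: Iterable[int], vocab_size: int) -> List[Tuple[int, int]]:
    """Return (position, token_id) pairs that cannot index an embedding table with vocab_size rows."""
    # A valid row index for an embedding with vocab_size rows lies in [0, vocab_size).
    return [(pos, tok) for pos, tok in enumerate(input_ids) if not 0 <= tok < vocab_size]


def check_ids(input_ids: Iterable[int], vocab_size: int) -> None:
    """Raise a readable error on the host instead of hitting a device-side assert on the GPU."""
    bad = find_out_of_range_ids(input_ids, vocab_size)
    if bad:
        raise ValueError(
            f"{len(bad)} token id(s) outside [0, {vocab_size}): {bad[:5]}; "
            "the tokenizer or prompt format probably does not match the model checkpoint"
        )


if __name__ == "__main__":
    # Hypothetical numbers: an embedding with 1000 rows and a prompt that contains
    # two extra special-token ids (1000 and 1001) the table has no rows for.
    ids = [5, 17, 1000, 42, 1001]
    print(find_out_of_range_ids(ids, vocab_size=1000))  # [(2, 1000), (4, 1001)]
```

Running the failing script with the environment variable `CUDA_LAUNCH_BLOCKING=1` is another way to confirm this, since synchronous kernel launches make the Python stack trace point at the kernel that actually failed.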