THUDM / ChatGLM3

ChatGLM3 series: Open Bilingual Chat LLMs | 开源双语对话语言模型
Apache License 2.0

About multi-GPU deployment #1199

Closed · Anfeather closed this 5 months ago

Anfeather commented 5 months ago

System Info / 系統信息

Multiple 2080 Ti GPUs

Who can help? / 谁可以帮助到您?

No response

Information / 问题信息

Reproduction / 复现过程

When running GLM on multiple GPUs, the call `generated_text_GLM, history = model_GLM.chat(tokenizer, prompt, history=[])` raises `RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling cublasCreate(handle)`. I have confirmed that the CUDA version is correct, and the batch size is only 1. Has anyone run into a similar problem?

Expected behavior / 期待表现

zRzRzRzRzRzRzR commented 5 months ago

For multi-GPU inference, which code are you running? From what I can see, this error looks like a driver / CUDA-level issue rather than a bug in the code. Please post where you are running the complete official code and the full error output.

Anfeather commented 5 months ago

> For multi-GPU inference, which code are you running? From what I can see, this error looks like a driver / CUDA-level issue rather than a bug in the code. Please post where you are running the complete official code and the full error output.

I found the cause of the bug: the error occurs when I load both the BLIP-2 and ChatGLM3-6B models in the same .py file; loading either model on its own works fine. The relevant code is as follows:

```python
local_path = "./blip2-opt-2.7b"
processor = Blip2Processor.from_pretrained(local_path)
model_large = Blip2ForConditionalGeneration.from_pretrained(
    local_path, torch_dtype=torch.float16, device_map="auto"
)
model_large.eval()

tokenizer = AutoTokenizer.from_pretrained("./chatglm3-6b", trust_remote_code=True)
model_GLM = AutoModel.from_pretrained("./chatglm3-6b", trust_remote_code=True, device_map="auto")
model_GLM = model_GLM.eval()
generated_text_GLM, history = model_GLM.chat(tokenizer, "你好", history=[])
```

The complete error output is as follows:

```
Traceback (most recent call last):
  File "/home2/an/project/DataShunt+/image_caption/a-PyTorch-Tutorial-to-Image-Captioning-master-2/eval_DS_PT.py", line 73, in <module>
    generated_text_GLM, history = model_GLM.chat(tokenizer, "你好", history=[])
  File "/home2/an/anaconda3/envs/GLM1/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home2/an/.cache/huggingface/modules/transformers_modules/chatglm3-6b/modeling_chatglm.py", line 1042, in chat
    outputs = self.generate(**inputs, **gen_kwargs, eos_token_id=eos_token_id)
  File "/home2/an/anaconda3/envs/GLM1/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home2/an/anaconda3/envs/GLM1/lib/python3.10/site-packages/transformers/generation/utils.py", line 1452, in generate
    return self.sample(
  File "/home2/an/anaconda3/envs/GLM1/lib/python3.10/site-packages/transformers/generation/utils.py", line 2468, in sample
    outputs = self(
  File "/home2/an/anaconda3/envs/GLM1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home2/an/anaconda3/envs/GLM1/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/home2/an/.cache/huggingface/modules/transformers_modules/chatglm3-6b/modeling_chatglm.py", line 941, in forward
    transformer_outputs = self.transformer(
  File "/home2/an/anaconda3/envs/GLM1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home2/an/.cache/huggingface/modules/transformers_modules/chatglm3-6b/modeling_chatglm.py", line 834, in forward
    hidden_states, presents, all_hidden_states, all_self_attentions = self.encoder(
  File "/home2/an/anaconda3/envs/GLM1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home2/an/.cache/huggingface/modules/transformers_modules/chatglm3-6b/modeling_chatglm.py", line 641, in forward
    layer_ret = layer(
  File "/home2/an/anaconda3/envs/GLM1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home2/an/anaconda3/envs/GLM1/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/home2/an/.cache/huggingface/modules/transformers_modules/chatglm3-6b/modeling_chatglm.py", line 544, in forward
    attention_output, kv_cache = self.self_attention(
  File "/home2/an/anaconda3/envs/GLM1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home2/an/anaconda3/envs/GLM1/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/home2/an/.cache/huggingface/modules/transformers_modules/chatglm3-6b/modeling_chatglm.py", line 376, in forward
    mixed_x_layer = self.query_key_value(hidden_states)
  File "/home2/an/anaconda3/envs/GLM1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home2/an/anaconda3/envs/GLM1/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/home2/an/anaconda3/envs/GLM1/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 103, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling cublasCreate(handle)
```
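For context, `CUBLAS_STATUS_NOT_INITIALIZED` raised from `cublasCreate` is often a symptom of the target GPU being out of memory when cuBLAS tries to allocate its handle and workspace, rather than a broken CUDA installation. One quick check is to print per-device memory usage right after both models have been loaded. The snippet below is an illustrative diagnostic sketch, not part of the original report:

```python
import torch

# Print how much memory each visible GPU is using once both models are loaded.
# If a card is already close to its limit (a 2080 Ti has ~11 GiB), the first
# matmul placed on it can fail with CUBLAS_STATUS_NOT_INITIALIZED while the
# cuBLAS handle is being created.
for i in range(torch.cuda.device_count()):
    free, total = torch.cuda.mem_get_info(i)
    used_gib = (total - free) / 2**30
    print(f"cuda:{i}: {used_gib:.1f} GiB used / {total / 2**30:.1f} GiB total")
```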

zRzRzRzRzRzRzR commented 5 months ago

Right, normally the two models should be loaded separately; otherwise the device allocation can go wrong.
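If both models do need to live in the same process, one workaround (not from this thread) is to pin each model to its own card(s) instead of letting `device_map="auto"` interleave them across all GPUs. The sketch below is illustrative only: it assumes three visible 2080 Ti cards, and the device indices and `max_memory` limits are made up for the example; adjust them to the actual machine.

```python
import torch
from transformers import (
    AutoModel,
    AutoTokenizer,
    Blip2ForConditionalGeneration,
    Blip2Processor,
)

# BLIP-2 pinned to GPU 0 only.
processor = Blip2Processor.from_pretrained("./blip2-opt-2.7b")
model_blip2 = Blip2ForConditionalGeneration.from_pretrained(
    "./blip2-opt-2.7b",
    torch_dtype=torch.float16,
    device_map={"": 0},  # place the whole model on cuda:0
).eval()

# ChatGLM3-6B kept off GPU 0 and split across GPUs 1 and 2
# (fp16 weights are roughly 12 GiB, so a single 11 GiB 2080 Ti is not enough).
tokenizer = AutoTokenizer.from_pretrained("./chatglm3-6b", trust_remote_code=True)
model_GLM = AutoModel.from_pretrained(
    "./chatglm3-6b",
    trust_remote_code=True,
    torch_dtype=torch.float16,
    device_map="auto",
    max_memory={0: "0GiB", 1: "10GiB", 2: "10GiB"},  # illustrative limits
).eval()

generated_text_GLM, history = model_GLM.chat(tokenizer, "你好", history=[])
```

The simplest reading of the advice above, though, is to load and run each model in its own script or process so their placement never has to be coordinated.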