Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 12.24it/s]
Traceback (most recent call last):
File "/home/zl/GLM-4/basic_demo/trans_cli_demo.py", line 53, in
device_map="auto").eval().to(device)
File "/home/zl/anaconda3/envs/glm4/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2692, in to
return super().to(*args, **kwargs)
File "/home/zl/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1160, in to
return self._apply(convert)
File "/home/zl/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 810, in _apply
module._apply(fn)
File "/home/zl/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 810, in _apply
module._apply(fn)
File "/home/zl/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 810, in _apply
module._apply(fn)
File "/home/zl/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 833, in _apply
param_applied = fn(param)
File "/home/zl/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1158, in convert
return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
RuntimeError: CUDA error: CUDA-capable device(s) is/are busy or unavailable
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
OS: Ubuntu 20.04; NVIDIA driver: 470.199; GPU: T4; PyTorch: 2.1.0; transformers: 4.40
Running inside a conda environment.
Added device-selection code: device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
Modified the loading call: model = AutoModel.from_pretrained(MODEL_PATH, trust_remote_code=True, device_map="auto").eval().to(device)  # move the model to the GPU
Inference works fine on CPU.
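One thing worth checking: `device_map="auto"` already asks accelerate to place the weights on the GPU, so chaining `.to(device)` afterwards moves an already-dispatched model a second time. A minimal sketch of the two mutually exclusive loading strategies is below (untested here; `MODEL_PATH` and the `AutoModel` class are taken from the post, the hub ID in the comment is only a placeholder):

```python
import torch
from transformers import AutoModel

MODEL_PATH = "/path/to/glm-4"  # placeholder; use your local checkpoint path

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Option A: let accelerate place the weights; do NOT call .to() afterwards.
model = AutoModel.from_pretrained(
    MODEL_PATH,
    trust_remote_code=True,
    device_map="auto",  # accelerate dispatches the shards itself
).eval()

# Option B: load without device_map, then move the whole model yourself.
# model = AutoModel.from_pretrained(
#     MODEL_PATH,
#     trust_remote_code=True,  # no device_map here
# ).to(device).eval()
```

Picking one of the two placement paths avoids the redundant device transfer that the traceback shows happening inside `modeling_utils.to`.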
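The "CUDA-capable device(s) is/are busy or unavailable" error itself is usually a device-state or driver problem rather than a model problem. A hedged diagnostic sketch (standard `nvidia-smi` invocations; run on the affected machine):

```shell
# Show whether another process already holds the T4.
nvidia-smi

# Check the compute mode: "Exclusive Process" blocks a second context
# from attaching, which produces exactly this "busy or unavailable" error.
nvidia-smi -q -d COMPUTE

# If the mode is Exclusive_Process, reset it to Default (needs root).
sudo nvidia-smi -c DEFAULT

# Confirm which CUDA version your PyTorch wheel was built against.
python -c "import torch; print(torch.version.cuda, torch.cuda.is_available())"
```

One assumption worth verifying: driver 470.x supports CUDA 11.x wheels (cu118 works via minor-version compatibility), but PyTorch builds targeting CUDA 12.x generally require a 525+ driver; if `torch.version.cuda` reports 12.1, either upgrade the driver or install the cu118 build.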