Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 12.24it/s]
Traceback (most recent call last):
File "/home/zl/GLM-4/basic_demo/trans_cli_demo.py", line 53, in
device_map="auto").eval().to(device)
File "/home/zl/anaconda3/envs/glm4/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2692, in to
return super().to(*args, **kwargs)
File "/home/zl/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1160, in to
return self._apply(convert)
File "/home/zl/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 810, in _apply
module._apply(fn)
File "/home/zl/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 810, in _apply
module._apply(fn)
File "/home/zl/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 810, in _apply
module._apply(fn)
File "/home/zl/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 833, in _apply
param_applied = fn(param)
File "/home/zl/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1158, in convert
return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
RuntimeError: CUDA error: CUDA-capable device(s) is/are busy or unavailable
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
OS: Ubuntu 20.04; NVIDIA driver: 470.199; GPU: T4; PyTorch: 2.1.0; transformers: 4.40
Running inside a conda environment.
Added device-selection code: device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
Modified the loading call: model = AutoModel.from_pretrained(MODEL_PATH, trust_remote_code=True, device_map="auto").eval().to(device)  # move the model to the GPU
Inference works fine on CPU.
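One thing worth checking: `device_map="auto"` already asks accelerate to place the weights on the GPU, so chaining `.to(device)` afterwards moves an already-dispatched model a second time. A minimal sketch of the two mutually exclusive loading strategies is below (untested here; `MODEL_PATH` and the `AutoModel` class are taken from the post, the hub ID in the comment is only a placeholder):

```python
import torch
from transformers import AutoModel

MODEL_PATH = "/path/to/glm-4"  # placeholder; use your local checkpoint path

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Option A: let accelerate place the weights; do NOT call .to() afterwards.
model = AutoModel.from_pretrained(
    MODEL_PATH,
    trust_remote_code=True,
    device_map="auto",  # accelerate dispatches the shards itself
).eval()

# Option B: load without device_map, then move the whole model yourself.
# model = AutoModel.from_pretrained(
#     MODEL_PATH,
#     trust_remote_code=True,  # no device_map here
# ).to(device).eval()
```

Picking one of the two placement paths avoids the redundant device transfer that the traceback shows happening inside `modeling_utils.to`.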
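The "CUDA-capable device(s) is/are busy or unavailable" error itself is usually a device-state or driver problem rather than a model problem. A hedged diagnostic sketch (standard `nvidia-smi` invocations; run on the affected machine):

```shell
# Show whether another process already holds the T4.
nvidia-smi

# Check the compute mode: "Exclusive Process" blocks a second context
# from attaching, which produces exactly this "busy or unavailable" error.
nvidia-smi -q -d COMPUTE

# If the mode is Exclusive_Process, reset it to Default (needs root).
sudo nvidia-smi -c DEFAULT

# Confirm which CUDA version your PyTorch wheel was built against.
python -c "import torch; print(torch.version.cuda, torch.cuda.is_available())"
```

One assumption worth verifying: driver 470.x supports CUDA 11.x wheels (cu118 works via minor-version compatibility), but PyTorch builds targeting CUDA 12.x generally require a 525+ driver; if `torch.version.cuda` reports 12.1, either upgrade the driver or install the cu118 build.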