Closed: stiyet closed this issue 1 month ago.
Could you try launching Python with this environment variable set?
CUDA_LAUNCH_BLOCKING=1 python
That fixed it, thanks a lot!
Does the error come back if you remove it?
CUDA_LAUNCH_BLOCKING=1 is only for pinpointing PyTorch problems; normally it should not be set.
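For background: CUDA kernel launches are asynchronous, so an illegal memory access is often reported at an unrelated call site. CUDA_LAUNCH_BLOCKING=1 forces each launch to synchronize, so the traceback points at the kernel that actually faulted. A minimal sketch of setting it from inside Python (it must happen before CUDA is initialized, i.e. before importing torch or lmdeploy):

```python
import os

# Must be set before the first CUDA context is created, i.e. before
# `import torch` / `import lmdeploy`, or it has no effect.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# import torch  # CUDA initialization now sees the variable
print(os.environ["CUDA_LAUNCH_BLOCKING"])  # → 1
```

In practice, prefixing the command line (`CUDA_LAUNCH_BLOCKING=1 python ...`) as suggested above is equivalent and harder to get wrong.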
Right, it breaks once the variable is removed. I see the same problem on two A100s as well.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[WARNING] gemm_config.in is not found; using default GEMM algo
[WARNING] gemm_config.in is not found; using default GEMM algo
2024-07-26 14:27:32,424 - asyncio - ERROR - Exception in callback _raise_exception_on_finish(<Future finis...sertions.\n')>) at /opt/conda/lib/python3.8/site-packages/lmdeploy/vl/engine.py:19
handle: <Handle _raise_exception_on_finish(<Future finis...sertions.\n')>) at /opt/conda/lib/python3.8/site-packages/lmdeploy/vl/engine.py:19>
Traceback (most recent call last):
File "/opt/conda/lib/python3.8/asyncio/events.py", line 81, in _run
self._context.run(self._callback, self._args)
File "/opt/conda/lib/python3.8/site-packages/lmdeploy/vl/engine.py", line 26, in _raise_exception_on_finish
raise e
File "/opt/conda/lib/python3.8/site-packages/lmdeploy/vl/engine.py", line 22, in _raise_exception_on_finish
task.result()
File "/opt/conda/lib/python3.8/concurrent/futures/thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
File "/opt/conda/lib/python3.8/site-packages/lmdeploy/vl/engine.py", line 153, in forward
outputs = [x.cpu() for x in outputs]
RuntimeError: CUDA error: an illegal memory access was encountered
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
(screenshot attachment: 20240726143227.jpg)
Could you help try two more things?
1. On the A100s, does it still error with tp=1?
2. With tp > 1:
from lmdeploy import pipeline, VisionConfig
pipe = pipeline(..., vision_config=VisionConfig(thread_safe=True))
1) 2x A100, changing only tp=1: no error.
2) 2x A100, tp=2, with vision_config=VisionConfig(thread_safe=True) added: no error, but the output is empty.

/opt/conda/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: /opt/conda did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
  warn(msg)
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.0
CUDA SETUP: Detected CUDA version 121
CUDA SETUP: Loading binary /opt/conda/lib/python3.8/site-packages/bitsandbytes/libbitsandbytes_cuda121.so...
Done copy! Result: true
Done copy! Result: true
/ossfs/node_45293776/workspace/autoupdate_resource/model
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[WARNING] gemm_config.in is not found; using default GEMM algo
[WARNING] gemm_config.in is not found; using default GEMM algo
Response(text='', generate_token_len=301, input_token_len=3879, session_id=0, finish_reason='length', token_ids=[0, 0, 0, ...], logprobs=None)
(0, 'ok', {'result': ''})
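As a side note, the empty result above (text='', finish_reason='length', every token id 0) is easy to flag programmatically. A hypothetical helper, not part of lmdeploy, for detecting this failure mode:

```python
def looks_degenerate(text, token_ids):
    # Empty text plus an all-zero token id sequence suggests the vision
    # features were corrupted before decoding (heuristic, hypothetical).
    return text == "" and len(token_ids) > 0 and all(t == 0 for t in token_ids)

print(looks_degenerate("", [0] * 301))       # → True  (the failing run above)
print(looks_degenerate("a photo", [3, 17]))  # → False
```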
Run `ls -lh /usr/local/cuda/lib64/libcudart.so` to see which version the shared library links to.
Does uninstalling bitsandbytes help? I suspect a shared-library problem.
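The symlink check can also be scripted with Python's os.path.realpath. A sketch using a temporary symlink as a stand-in, since /usr/local/cuda/lib64/libcudart.so only exists on the GPU machine:

```python
import os
import tempfile

# Stand-in for: readlink -f /usr/local/cuda/lib64/libcudart.so
tmp = tempfile.mkdtemp()
target = os.path.join(tmp, "libcudart.so.12")  # hypothetical real library file
open(target, "w").close()
link = os.path.join(tmp, "libcudart.so")       # the symlink to inspect
os.symlink(target, link)

resolved = os.path.basename(os.path.realpath(link))
print(resolved)  # → libcudart.so.12
```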
$ ls -lh /usr/local/cuda/lib64/libcudart.so
lrwxrwxrwx 1 root root 15 Dec 15 2023 /usr/local/cuda/lib64/libcudart.so -> libcudart.so.12

Do you mean uninstalling bitsandbytes plus adding vision_config=VisionConfig(thread_safe=True)?
Uninstall both bitsandbytes and flash_attn. You can try with and without vision_config=VisionConfig(thread_safe=True); I haven't run into this error myself.
Someone previously reported a segfault around this feature-extraction path, and rebuilding the environment from scratch fixed it.
After uninstalling:
Warning: Flash Attention is not available, use_flash_attn is set to False.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[WARNING] gemm_config.in is not found; using default GEMM algo
[WARNING] gemm_config.in is not found; using default GEMM algo
Response(text='', generate_token_len=301, input_token_len=3879, session_id=0, finish_reason='length', token_ids=[0, 0, 0, ...], logprobs=None)
(0, 'ok', {'result': ''})
Warning: Flash Attention is not available, use_flash_attn is set to False.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[WARNING] gemm_config.in is not found; using default GEMM algo
[WARNING] gemm_config.in is not found; using default GEMM algo
Response(text='', generate_token_len=301, input_token_len=3879, session_id=0, finish_reason='length', token_ids=[0, 0, 0, ...], logprobs=None)
(0, 'ok', {'result': ''})
The output is normal.
So after uninstalling, the program no longer crashes even without CUDA_LAUNCH_BLOCKING=1, right?
You could also try changing those four lines to the single line below, so the vision model runs in one thread (without CUDA_LAUNCH_BLOCKING=1):
outputs = self.forward(inputs)
If you have time, you could also check whether the problem persists in the official Docker image (use the image's own Python, without installing anything extra): https://hub.docker.com/r/openmmlab/lmdeploy
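For clarity, the suggested edit replaces a thread-pool submission with a direct call in the calling thread. A standalone sketch of the difference (this Engine class is illustrative, not lmdeploy's actual engine code):

```python
from concurrent.futures import ThreadPoolExecutor

class Engine:
    """Illustrative stand-in for a vision engine; not lmdeploy's code."""

    def __init__(self):
        self.pool = ThreadPoolExecutor(max_workers=1)

    def forward(self, inputs):
        return [x * 2 for x in inputs]  # placeholder for model inference

    def infer_threaded(self, inputs):
        # Original style: forward runs on a worker thread, so CUDA state is
        # touched from a different thread than the one that initialized it.
        return self.pool.submit(self.forward, inputs).result()

    def infer_direct(self, inputs):
        # Suggested style: forward runs in the calling thread.
        return self.forward(inputs)

engine = Engine()
print(engine.infer_threaded([1, 2]))  # → [2, 4]
print(engine.infer_direct([1, 2]))    # → [2, 4]
```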
Yep, the program didn't crash. Got it, learned something new, I'll give that a try~
After making that change, it hangs (without CUDA_LAUNCH_BLOCKING=1):
Warning: Flash Attention is not available, use_flash_attn is set to False.
Warning: Flash Attention is not available, use_flash_attn is set to False.
Warning: Flash Attention is not available, use_flash_attn is set to False.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[WARNING] gemm_config.in is not found; using default GEMM algo
[WARNING] gemm_config.in is not found; using default GEMM algo
(screenshot attachment: 20240726194834.jpg)
When it hung, did you have vision_config=VisionConfig(thread_safe=True) set?
Things feel a bit tangled at this point. When you have time, try it inside the Docker image first; if the image works, the problem is in the environment.
OK, I'll try it in the Docker image first.
This issue is marked as stale because it has been marked as invalid or awaiting response for 7 days without any further response. It will be closed in 5 days if the stale label is not removed or if there is no further response.
This issue is closed because it has been stale for 5 days. Please open a new issue if you have similar issues or you have any new updates now.
Checklist
Describe the bug
Running model inference on 4x Tesla V100. The model loads successfully, but inference fails:
  File "/opt/conda/lib/python3.8/site-packages/lmdeploy/vl/engine.py", line 153, in forward
    outputs = [x.cpu() for x in outputs]
RuntimeError: CUDA error: an illegal memory access was encountered
cuda12.1 + lmdeploy==0.5.1
Reproduction
pipe = pipeline("OpenGVLab__InternVL-Chat-V1-5", backend_config=TurbomindEngineConfig(tp=4, cache_max_entry_count=0.2))
gen_config = GenerationConfig(temperature=0, max_new_tokens=300)
text = "简单描述一下这个图片"
pipe((text, image), gen_config=gen_config)
Environment
Error traceback