[Open] hitzhu opened this issue 3 months ago
Are you using tp? Before launching the program, set the environment variable

export CUDA_LAUNCH_BLOCKING=1

first, and then run again. What result do you get?
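(If you launch from a Python script rather than a shell, a minimal sketch of the same idea is below; the variable has to be set before the first CUDA call, and the model path is an assumption taken from the traceback in this issue, not something the commenter specified.)

```python
import os

# CUDA_LAUNCH_BLOCKING forces kernels to launch synchronously, so the Python
# traceback points at the kernel that actually faulted. It must be set before
# the first CUDA call, i.e. before importing torch / lmdeploy.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

from lmdeploy import pipeline  # noqa: E402

# Model path assumed from the traceback in this issue; adjust to your checkpoint.
pipe = pipeline("OpenGVLab/InternVL2-40B")
```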
Yes, tp=4 on A100; without it the model does not fit. After setting the variable I still get the same error.
When creating the pipeline / server, try setting cache_max_entry_count to 0.1 to reduce the k/v cache usage and see whether that helps. The vision part reuses upstream code, so it seems unlikely to be the source of the problem; I suspect this may be caused by running out of GPU memory. How much free GPU memory is left after the model starts up?
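(A minimal sketch of that suggestion; the model path and tp value are assumptions carried over from earlier in this thread, not part of the original comment.)

```python
from lmdeploy import pipeline, TurbomindEngineConfig

# cache_max_entry_count is the fraction of free GPU memory (after the weights
# are loaded) that the engine reserves for the k/v cache; lowering it leaves
# more headroom for the vision model.
backend_config = TurbomindEngineConfig(tp=4, cache_max_entry_count=0.1)
pipe = pipeline("OpenGVLab/InternVL2-40B", backend_config=backend_config)
```

For lmdeploy serve api_server, the same setting should be available as the --cache-max-entry-count command-line option.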
Solved: with 4x A100 and tp=4 it errors, but with 2 cards and tp=2 it works.
I don't think this counts as solved; the root cause is still unclear.
Could it be that a different tp value leads to a different model split strategy?
I don't think so. If it's convenient, could you check whether the error also occurs inside this image? https://hub.docker.com/r/openmmlab/lmdeploy/tags
I ran into the same problem on a single machine with one RTX 3090 and one 2080 Ti 22 GB. Here is the environment info:

sys.platform: linux
Python: 3.10.14 (main, May 6 2024, 19:42:50) [GCC 11.2.0]
CUDA available: True
MUSA available: False
numpy_random_seed: 2147483648
GPU 0: NVIDIA GeForce RTX 3090
CUDA_HOME: /usr/local/cuda-12.1
NVCC: Cuda compilation tools, release 12.1, V12.1.66
GCC: gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
PyTorch: 2.2.2+cu121
PyTorch compiling details: PyTorch built with:
TorchVision: 0.17.2+cu121
LMDeploy: 0.5.3+9f3e748
transformers: 4.42.4
gradio: 3.50.2
fastapi: 0.111.1
pydantic: 2.8.2
triton: 2.2.0

NVIDIA Topology:
        GPU0    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0    X       N/A

Legend:
  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks
Same problem here; it occasionally gets stuck at this point:
File "/root/.local/lib/python3.10/site-packages/accelerate/utils/operations.py", line 155, in send_to_device
return tensor.to(device, non_blocking=non_blocking)
Tracing shows it hangs inside accelerate's send_to_device, which never returns.
Checklist
Describe the bug
ERROR:asyncio:Exception in callback _raise_exception_on_finish(<Future finis...sertions.\n')>) at /root/.local/lib/python3.10/site-packages/lmdeploy/vl/engine.py:19
handle: <Handle _raise_exception_on_finish(<Future finis...sertions.\n')>) at /root/.local/lib/python3.10/site-packages/lmdeploy/vl/engine.py:19>
Traceback (most recent call last):
  File "/opt/conda/envs/python3.10.13/lib/python3.10/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "/root/.local/lib/python3.10/site-packages/lmdeploy/vl/engine.py", line 26, in _raise_exception_on_finish
    raise e
  File "/root/.local/lib/python3.10/site-packages/lmdeploy/vl/engine.py", line 22, in _raise_exception_on_finish
    task.result()
  File "/opt/conda/envs/python3.10.13/lib/python3.10/asyncio/futures.py", line 201, in result
    raise self._exception.with_traceback(self._exception_tb)
  File "/opt/conda/envs/python3.10.13/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/root/.local/lib/python3.10/site-packages/lmdeploy/vl/engine.py", line 151, in forward
    outputs = self.model.forward(inputs)
  File "/opt/conda/envs/python3.10.13/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/root/.local/lib/python3.10/site-packages/lmdeploy/vl/model/internvl.py", line 172, in forward
    return self._forward_func(images)
  File "/root/.local/lib/python3.10/site-packages/lmdeploy/vl/model/internvl.py", line 153, in _forward_v1_5
    outputs = self.model.extract_feature(outputs)
  File "/root/.cache/huggingface/modules/transformers_modules/InternVL2-40B/modeling_internvl_chat.py", line 176, in extract_feature
    vit_embeds = self.vision_model(
  File "/opt/conda/envs/python3.10.13/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/envs/python3.10.13/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/InternVL2-40B/modeling_intern_vit.py", line 418, in forward
    encoder_outputs = self.encoder(
  File "/opt/conda/envs/python3.10.13/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/envs/python3.10.13/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/InternVL2-40B/modeling_intern_vit.py", line 354, in forward
    layer_outputs = encoder_layer(
  File "/opt/conda/envs/python3.10.13/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/envs/python3.10.13/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.local/lib/python3.10/site-packages/accelerate/hooks.py", line 164, in new_forward
    args, kwargs = module._hf_hook.pre_forward(module, *args, **kwargs)
  File "/root/.local/lib/python3.10/site-packages/accelerate/hooks.py", line 363, in pre_forward
    return send_to_device(args, self.execution_device), send_to_device(
  File "/root/.local/lib/python3.10/site-packages/accelerate/utils/operations.py", line 174, in send_to_device
    return honor_type(
  File "/root/.local/lib/python3.10/site-packages/accelerate/utils/operations.py", line 81, in honor_type
    return type(obj)(generator)
  File "/root/.local/lib/python3.10/site-packages/accelerate/utils/operations.py", line 175, in <genexpr>
    tensor, (send_to_device(t, device, non_blocking=non_blocking, skip_keys=skip_keys) for t in tensor)
  File "/root/.local/lib/python3.10/site-packages/accelerate/utils/operations.py", line 155, in send_to_device
    return tensor.to(device, non_blocking=non_blocking)
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

Reproduction
As described above.
Environment
Error traceback