hezeli123 opened this issue 4 months ago
No stop request was sent in the meantime.
With `--log-level INFO`, what do the server-side logs look like?
The logs are below. There is the image-download information after the request was received, but no subsequent logs related to LLM inference.
2024-07-11 19:59:48,123 - lmdeploy - INFO - async_collect_pil_images latency: 98.4154 ms
2024-07-11 19:59:48,123 - lmdeploy - INFO - ImageEncoder received 1 images, left 1 images.
2024-07-11 19:59:48,123 - lmdeploy - INFO - ImageEncoder process 1 images, left 0 images.
2024-07-11 19:59:48,185 - lmdeploy - INFO - ImageEncoder forward 1 images, cost 0.061s
2024-07-11 19:59:48,185 - lmdeploy - INFO - ImageEncoder done 1 images, left 0 images.
2024-07-11 19:59:48,222 - lmdeploy - INFO - prompt='<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\nPicture 0:
Same problem here with the InternVL2 8B model: the server stops responding after running for some time.
The error message (not sure whether it is related to this issue):
2024-08-05 09:37:02,625 - lmdeploy - INFO - ImageEncoder received 1 images, left 1 images.
2024-08-05 09:37:02,625 - lmdeploy - INFO - ImageEncoder process 1 images, left 0 images.
Exception in callback _raise_exception_on_finish(<Future finis...is WMF file')>) at /usr/local/lib/python3.10/dist-packages/lmdeploy/vl/engine.py:19
handle: <Handle _raise_exception_on_finish(<Future finis...is WMF file')>) at /usr/local/lib/python3.10/dist-packages/lmdeploy/vl/engine.py:19>
Traceback (most recent call last):
File "/usr/lib/python3.10/asyncio/events.py", line 80, in _run
self._context.run(self._callback, *self._args)
File "/usr/local/lib/python3.10/dist-packages/lmdeploy/vl/engine.py", line 26, in _raise_exception_on_finish
raise e
File "/usr/local/lib/python3.10/dist-packages/lmdeploy/vl/engine.py", line 22, in _raise_exception_on_finish
task.result()
File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
File "/usr/local/lib/python3.10/dist-packages/lmdeploy/vl/engine.py", line 151, in forward
outputs = self.model.forward(inputs)
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/lmdeploy/vl/model/internvl.py", line 171, in forward
images = [x.convert('RGB') for x in images]
File "/usr/local/lib/python3.10/dist-packages/lmdeploy/vl/model/internvl.py", line 171, in <listcomp>
images = [x.convert('RGB') for x in images]
File "/usr/local/lib/python3.10/dist-packages/PIL/Image.py", line 941, in convert
self.load()
File "/usr/local/lib/python3.10/dist-packages/PIL/WmfImagePlugin.py", line 161, in load
return super().load()
File "/usr/local/lib/python3.10/dist-packages/PIL/ImageFile.py", line 366, in load
raise OSError(msg)
Has this been resolved? I ran into a similar problem using lmdeploy for LLM inference...
@wxsms Your problem is that the image fails during `convert('RGB')`, most likely because the image file is corrupted; lmdeploy currently has no internal exception handling for this step.
Before handing images to lmdeploy, you can first check whether `convert` succeeds on them, as in the sketch below.
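A minimal sketch of such a pre-check; the helper name and the skip-on-failure policy are my own, but `Image.open`/`convert('RGB')` are exactly the calls that fail in the traceback above:

```python
from PIL import Image

def is_convertible(path: str) -> bool:
    """Return True if the image can be fully decoded and converted to RGB.

    This repeats the call that fails inside lmdeploy
    (internvl.py: x.convert('RGB')), so corrupted or unsupported
    images (e.g. the WMF file above) are caught before they reach the server.
    """
    try:
        with Image.open(path) as im:
            im.convert('RGB')  # convert() forces a full decode via load()
        return True
    except OSError:  # PIL raises OSError for broken/unsupported files
        return False

# Hypothetical usage: drop broken images before building the request.
paths = ['a.jpg', 'b.wmf']
good = [p for p in paths if is_convertible(p)]
```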
INFO-level logs do not reflect this problem well. The best approach is to first set the environment variable `export TM_DEBUG_LEVEL=DEBUG`, which automatically inserts CUDA synchronization calls, and then start the server with `--log-level=DEBUG` so that debug logs are printed.
I hit the same problem running InternVL-v15-chat with lmdeploy v0.5.0. My personal suspicion: the asynchronous ImageEncoder forward pass running in a worker thread, combined with the LLM's forward pass, causes a CUDA launch to deadlock.
try the latest release
I tried v0.5.3 with TP=2 running InternVL-v15-chat. The features returned here are NaN and the final output logits are all 0, so the context produces no output. With TP=1 there is no such problem and the output is correct.
@coolhok With the pipeline interface, after the pipeline is created (with tp > 1), does directly calling the following statements cause any problem?
from lmdeploy.vl import load_image
im = load_image('image path')
pipe.vl_encoder.forward([im])
import asyncio

from lmdeploy import pipeline, TurbomindEngineConfig
from lmdeploy.vl import load_image

pipe = pipeline('/mnt/workspace/model_hub/InternVL-Chat-V1-5/',
                backend_config=TurbomindEngineConfig(
                    tp=2, cache_max_entry_count=0.5))
im = load_image('./img/1.jpg')
r_sync = pipe.vl_encoder.forward([im])
print(f"r_sync = {r_sync}")
r_async = asyncio.run(pipe.vl_encoder.async_infer([im]))
print(f"r_async = {r_async}")
[WARNING] gemm_config.in is not found; using default GEMM algo
[WARNING] gemm_config.in is not found; using default GEMM algo
r_sync = [tensor([[ 0.3127, 0.2219, -0.0070, ..., 0.0645, 0.0034, -0.4839],
[ 0.2047, 0.2479, -0.0424, ..., -0.0375, -0.0715, -0.3203],
[ 0.2976, 0.3047, -0.0497, ..., -0.2800, 0.2542, 0.4961],
...,
[ 0.0272, 0.1045, 0.4470, ..., 0.2690, 0.3364, -0.7417],
[ 0.0319, 0.0803, 0.4099, ..., 0.2659, 0.3633, -0.7271],
[ 0.0234, 0.0804, 0.4424, ..., 0.3066, 0.3467, -0.7461]],
dtype=torch.float16)]
r_async = [tensor([[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
...,
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]], dtype=torch.float16)]
@irexyc I am using lmdeploy v0.5.3. With TP=1, running on the generated data can also hang with an HTTP 499 error, and the error stack below appears at the same time.
As for `_forward_loop`: would adding more threads to handle the save operations help?
[2024-08-10 19:25:23] 2024-08-10 19:25:23,109 - lmdeploy - INFO - ImageEncoder received 1 images, left 1 images.
[2024-08-10 19:25:23] 2024-08-10 19:25:23,109 - lmdeploy - INFO - ImageEncoder process 1 images, left 0 images.
[2024-08-10 19:25:23] ERROR:asyncio:Exception in callback _raise_exception_on_finish(<Future finis... processed)')>) at /usr/local/lib/python3.10/site-packages/lmdeploy/vl/engine.py:19
[2024-08-10 19:25:23] handle: <Handle _raise_exception_on_finish(<Future finis... processed)')>) at /usr/local/lib/python3.10/site-packages/lmdeploy/vl/engine.py:19>
[2024-08-10 19:25:23] Traceback (most recent call last):
[2024-08-10 19:25:23] File "/usr/local/lib/python3.10/asyncio/events.py", line 80, in _run
[2024-08-10 19:25:23] self._context.run(self._callback, *self._args)
[2024-08-10 19:25:23] File "/usr/local/lib/python3.10/site-packages/lmdeploy/vl/engine.py", line 26, in _raise_exception_on_finish
[2024-08-10 19:25:23] raise e
[2024-08-10 19:25:23] File "/usr/local/lib/python3.10/site-packages/lmdeploy/vl/engine.py", line 22, in _raise_exception_on_finish
[2024-08-10 19:25:23] task.result()
[2024-08-10 19:25:23] File "/usr/local/lib/python3.10/concurrent/futures/thread.py", line 58, in run
[2024-08-10 19:25:23] result = self.fn(*self.args, **self.kwargs)
[2024-08-10 19:25:23] File "/usr/local/lib/python3.10/site-packages/lmdeploy/vl/engine.py", line 151, in forward
[2024-08-10 19:25:23] outputs = self.model.forward(inputs)
[2024-08-10 19:25:23] File "/usr/local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
[2024-08-10 19:25:23] return func(*args, **kwargs)
[2024-08-10 19:25:23] File "/usr/local/lib/python3.10/site-packages/lmdeploy/vl/model/internvl.py", line 171, in forward
[2024-08-10 19:25:23] images = [x.convert('RGB') for x in images]
[2024-08-10 19:25:23] File "/usr/local/lib/python3.10/site-packages/lmdeploy/vl/model/internvl.py", line 171, in <listcomp>
[2024-08-10 19:25:23] images = [x.convert('RGB') for x in images]
[2024-08-10 19:25:23] File "/usr/local/lib/python3.10/site-packages/PIL/Image.py", line 916, in convert
[2024-08-10 19:25:23] self.load()
[2024-08-10 19:25:23] File "/usr/local/lib/python3.10/site-packages/PIL/ImageFile.py", line 266, in load
[2024-08-10 19:25:23] raise OSError(msg)
[2024-08-10 19:25:23] OSError: image file is truncated (43 bytes not processed)
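For this particular truncated-file error, Pillow can be told to tolerate incomplete images. A sketch, with the caveat that this pads the missing data instead of fixing the broken upload, and that the flag must be set in the same process as the image encoder:

```python
from PIL import Image, ImageFile

# With this flag, Pillow fills in the missing bytes instead of raising
# "OSError: image file is truncated (... bytes not processed)".
# It masks the corruption rather than repairing it.
ImageFile.LOAD_TRUNCATED_IMAGES = True

im = Image.open('truncated.jpg')  # hypothetical truncated file
im = im.convert('RGB')            # now decodes, with the missing region blank
```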
Describe the bug
Running the qwenvl-chat model on a single A100 with a 32k session-len; after serving requests for a while, the server stops responding.
Reproduction
1. `lmdeploy serve api_server Qwen-VL-Chat --server-port 80 --session-len 32768`
2. Run a client against it for a while; the server then stops responding. If the client uses synchronous calls without a timeout, it never receives a response (see the timeout sketch below).
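Until the hang itself is fixed, a client-side timeout at least keeps the caller from blocking forever. A sketch against the OpenAI-compatible route that `api_server` exposes; the address, model name, and prompt here are placeholders:

```python
import requests

url = 'http://localhost:80/v1/chat/completions'
payload = {
    'model': 'Qwen-VL-Chat',
    'messages': [{'role': 'user', 'content': 'Describe this picture.'}],
}
try:
    # timeout=(connect, read): give up after 60 s of silence instead of
    # waiting indefinitely once the server stops responding.
    resp = requests.post(url, json=payload, timeout=(5, 60))
    resp.raise_for_status()
    print(resp.json()['choices'][0]['message']['content'])
except requests.Timeout:
    print('request timed out -- server may be hung')
```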
Environment
Error traceback