InternLM / lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
https://lmdeploy.readthedocs.io/en/latest/
Apache License 2.0

[Bug] Task was destroyed but it is pending! ImageEncoder._forward_loop() #1818

Closed DefTruth closed 1 week ago

DefTruth commented 1 week ago

Describe the bug

When running InternVL-1.5 Chat and calling inference inside a for loop, I hit the following problem. It sometimes causes the service to hang.
Task was destroyed but it is pending!
task: <Task pending name='Task-3' coro=<ImageEncoder._forward_loop() running at /usr/local/lib/python3.10/dist-packages/lmdeploy/vl/engine.py:89> wait_for=>

Reproduction

none

Environment

0.4.2

Error traceback

The error is reported when the process exits:
Task was destroyed but it is pending!
task: <Task pending name='Task-3' coro=<ImageEncoder._forward_loop() running at /usr/local/lib/python3.10/dist-packages/lmdeploy/vl/engine.py:89> wait_for=<Future pending cb=[Task.task_wakeup()]>>
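
For context, asyncio reports this message when a Task object is garbage-collected while it is still pending. The standalone sketch below reproduces the message without lmdeploy. The names `forward_loop` and `reproduce_pending_task_message` are hypothetical, and the wait on an `asyncio.Queue` is only an assumption about what the coroutine at engine.py:89 is doing.

```python
import asyncio
import gc


def reproduce_pending_task_message():
    """Return the messages asyncio reports when a pending task is destroyed."""
    messages = []
    loop = asyncio.new_event_loop()
    # Capture what asyncio would otherwise send to the 'asyncio' logger.
    loop.set_exception_handler(lambda _loop, ctx: messages.append(ctx["message"]))

    async def forward_loop(queue):
        # Hypothetical worker: waits forever because nothing is ever queued.
        await queue.get()

    async def start():
        queue = asyncio.Queue()
        # No reference to the task is kept, and it is never awaited or cancelled.
        asyncio.create_task(forward_loop(queue))
        await asyncio.sleep(0)  # let the worker start and block on queue.get()

    loop.run_until_complete(start())  # the loop stops while the worker still waits
    gc.collect()                      # the orphaned task is collected here
    loop.close()
    return messages


messages = reproduce_pending_task_message()
print(messages)
```

In this sketch the message comes from the task's finalizer, so it says nothing about whether the worker was hung beforehand.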

DefTruth commented 1 week ago

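I found that the pipeline has to be initialized at the script entry point (under main). It cannot be initialized inside a function, otherwise this error is triggered.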

from lmdeploy import pipeline

def run():
    # cannot init pipeline here
    for d in data:
        res = pipe(d)

if __name__ == '__main__':
    # init pipeline here
    pipe = pipeline(....)
    run()

irexyc commented 1 week ago

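This error occurs because the event loop has stopped while this coroutine is still waiting. To get rid of the "Task was destroyed but it is pending!" message, you can add these two lines above line 89 (engine.py:89 in the traceback):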

while self._que.qsize() == 0:
    await asyncio.sleep(0)

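However, this message should only appear when the program exits, and it should not cause a hang. What exactly do you observe when it hangs?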
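
The two-line workaround in this comment can be tried outside lmdeploy with a minimal sketch like the one below. It assumes the worker reads from an `asyncio.Queue`; `forward_loop`, `results`, and `stop_after` are hypothetical names for illustration, not lmdeploy's actual code.

```python
import asyncio


async def forward_loop(que, results, stop_after):
    """Hypothetical stand-in for a worker loop that consumes queued items."""
    while len(results) < stop_after:
        # Poll instead of blocking: yield to the event loop until an item arrives.
        while que.qsize() == 0:
            await asyncio.sleep(0)
        # The queue is non-empty here, so get() returns without suspending.
        item = await que.get()
        results.append(item * 2)


async def main():
    que = asyncio.Queue()
    results = []
    worker = asyncio.create_task(forward_loop(que, results, stop_after=3))
    for i in range(3):
        await asyncio.sleep(0)  # give the worker a turn on the empty queue
        que.put_nowait(i)
    await worker
    return results


results = asyncio.run(main())
print(results)
```

With this pattern the worker never parks on a pending future. The trade-off is that `asyncio.sleep(0)` in a tight loop keeps the event loop spinning while the queue is empty, which costs CPU.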

DefTruth commented 1 week ago

The hang only happened once, and I have not been able to reproduce it since.

irexyc commented 1 week ago

@DefTruth

Did the hang occur with the InternVL-1.5 Chat model? How many GPUs does the machine have, and did you set CUDA_VISIBLE_DEVICES?

DefTruth commented 1 week ago

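> @DefTruth
>
> Did the hang occur with the InternVL-1.5 Chat model? How many GPUs does the machine have, and did you set CUDA_VISIBLE_DEVICES?

It turned out not to be an lmdeploy problem, so I am closing this issue.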