InternLM / lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
https://lmdeploy.readthedocs.io/en/latest/
Apache License 2.0

fix qwen-vl-chat hung #1824

Closed irexyc closed 1 week ago

irexyc commented 1 week ago

Motivation

When the Qwen-VL-Chat visual encoder is sharded across multiple GPUs, the following snippet fails with a cross-device error (and the pipeline subsequently hangs):
from lmdeploy import pipeline
from lmdeploy.vl import load_image
pipe = pipeline('/nvme/shared/Qwen-VL-Chat/', log_level='INFO')
im = load_image('tiger.jpeg')
pipe.vl_encoder.forward([im])

Traceback (excerpt):
  File "/home/chenxin/miniconda3/envs/38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/chenxin/.cache/huggingface/modules/transformers_modules/visual.py", line 149, in forward
    self._repeat(q, N) + self.pos_embed.unsqueeze(1),
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:7 and cuda:6!

https://huggingface.co/Qwen/Qwen-VL-Chat/blob/main/visual.py#L148-L152
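The linked lines add `self.pos_embed` to the repeated queries inside the resampler; under multi-GPU sharding the two tensors can land on different devices. A minimal sketch of the usual remedy, moving the positional embedding to the query tensor's device before the addition (tensor names and shapes here are illustrative, not the exact `visual.py` code, and this may differ from the actual change in this PR):

```python
import torch

# Illustrative stand-ins for the tensors in Resampler.forward:
q = torch.randn(4, 1, 8)        # repeated queries, e.g. on cuda:6
pos_embed = torch.randn(4, 8)   # may end up on cuda:7 under device_map sharding

# Align devices before the addition that raised the RuntimeError;
# .to() is a no-op when the tensors already share a device.
out = q + pos_embed.to(q.device).unsqueeze(1)
```

On a single device `.to(q.device)` costs nothing, so the guard is safe to apply unconditionally.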