InternLM / lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
https://lmdeploy.readthedocs.io/en/latest/
Apache License 2.0

fix qwen-vl-chat hung #1824

Closed irexyc closed 1 week ago

irexyc commented 1 week ago

Motivation

When the Qwen-VL-Chat visual encoder is sharded across multiple GPUs, the following snippet fails with a cross-device error (and the pipeline subsequently hangs):
from lmdeploy import pipeline
from lmdeploy.vl import load_image
pipe = pipeline('/nvme/shared/Qwen-VL-Chat/', log_level='INFO')
im = load_image('tiger.jpeg')
pipe.vl_encoder.forward([im])

Traceback (excerpt):
  File "/home/chenxin/miniconda3/envs/38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/chenxin/.cache/huggingface/modules/transformers_modules/visual.py", line 149, in forward
    self._repeat(q, N) + self.pos_embed.unsqueeze(1),
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:7 and cuda:6!

https://huggingface.co/Qwen/Qwen-VL-Chat/blob/main/visual.py#L148-L152
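The linked lines add `self.pos_embed` to the repeated queries inside the resampler; under multi-GPU sharding the two tensors can land on different devices. A minimal sketch of the usual remedy, moving the positional embedding to the query tensor's device before the addition (tensor names and shapes here are illustrative, not the exact `visual.py` code, and this may differ from the actual change in this PR):

```python
import torch

# Illustrative stand-ins for the tensors in Resampler.forward:
q = torch.randn(4, 1, 8)        # repeated queries, e.g. on cuda:6
pos_embed = torch.randn(4, 8)   # may end up on cuda:7 under device_map sharding

# Align devices before the addition that raised the RuntimeError;
# .to() is a no-op when the tensors already share a device.
out = q + pos_embed.to(q.device).unsqueeze(1)
```

On a single device `.to(q.device)` costs nothing, so the guard is safe to apply unconditionally.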