InternLM / lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
https://lmdeploy.readthedocs.io/en/latest/
Apache License 2.0

[Bug] lmdeploy pytorch backend slora cannot use TP=2 #1512

Closed. jjjjohnson closed this issue 4 months ago.

jjjjohnson commented 4 months ago

Describe the bug

It looks like, in the PyTorch backend, S-LoRA and TP cannot be enabled at the same time...


Reproduction

lmdeploy serve api_server \
    /path/to/qwen14bchat \
    --model-name qwen-14b \
    --server-port 23333 \
    --backend pytorch \
    --cache-max-entry-count 0.95 \
    --max-batch-size 256 --tp 2 \
    --adapters adap1=path2adap1 

Environment

lmdeploy 0.3.0

Error traceback

No response

grimoire commented 4 months ago

We have not supported S-LoRA TP on all models, since different models have different qkv linear projections. If you can provide the adapter, we might be able to add the support you need.
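
For context, a minimal sketch of why the fused layout is the obstacle (shapes and names are illustrative, not lmdeploy's code): with separated q/k/v projections, each linear, and its lora_B, can be column-sharded independently, while a naive shard of a fused qkv weight crosses projection boundaries.

import torch

hidden, tp_size = 8, 2
# Qwen-style fused c_attn: [q; k; v] packed along the output dim
c_attn_weight = torch.randn(3 * hidden, hidden)

# naive column-parallel shard: rank 0 gets all of q plus half of k,
# so the q/k/v boundaries (and any per-projection LoRA B) are broken
naive_shard = c_attn_weight.chunk(tp_size, dim=0)[0]
print(naive_shard.shape)  # torch.Size([12, 8])

# separated projections shard cleanly, one projection at a time
q, k, v = c_attn_weight.split(hidden, dim=0)
clean_shard = torch.cat([t.chunk(tp_size, dim=0)[0] for t in (q, k, v)], dim=0)
print(clean_shard.shape)  # torch.Size([12, 8])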

jjjjohnson commented 4 months ago

Hi @grimoire, can you show which model types currently support S-LoRA TP? I will try to solve the issue and submit a PR.

jjjjohnson commented 4 months ago

Qwen14b chat: [screenshot]

Qwen14b chat with lora: [screenshot]

And in qwen.py, mod.register_parameter(name, dist_param) is not able to handle the nested c_attn: [screenshot]
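
If the failure is PyTorch's rule that register_parameter rejects dotted names, a small reproduction (module names are stand-ins, not Qwen's actual structure) looks like this; the owning submodule has to be resolved first:

import torch
import torch.nn as nn

root = nn.Module()
root.attn = nn.Module()
root.attn.c_attn = nn.Linear(8, 24)  # stand-in for Qwen's fused projection
dist_param = nn.Parameter(torch.zeros(24, 8))

# register_parameter only accepts flat names; a nested name raises KeyError
try:
    root.register_parameter("attn.c_attn.weight", dist_param)
except KeyError as err:
    print(err)  # parameter name can't contain "."

# resolve the owner of the nested parameter first, then register on it
owner = root.get_submodule("attn.c_attn")
owner.register_parameter("weight", dist_param)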

grimoire commented 4 months ago

All models with separated q, k, v projections support TP. chatglm2 has a similar pattern to Qwen:

https://github.com/InternLM/lmdeploy/blob/89fc8504d537f8b02cb368dcd808438f55f13b55/lmdeploy/pytorch/models/chatglm2.py#L124-L137
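
Roughly, the pattern referenced there amounts to splitting the packed weight into its q/k/v sections before sharding, so each rank keeps whole projections. A hedged sketch, assuming equal q/k/v sizes and not lmdeploy's actual API:

import torch

def shard_packed_qkv(weight, hidden, tp_size, rank):
    # split per projection first so the q/k/v boundaries stay intact,
    # then re-pack the per-rank shards in the fused layout
    q, k, v = weight.split(hidden, dim=0)
    return torch.cat([t.chunk(tp_size, dim=0)[rank] for t in (q, k, v)], dim=0)

w = torch.randn(3 * 8, 8)
print(shard_packed_qkv(w, hidden=8, tp_size=2, rank=0).shape)  # torch.Size([12, 8])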

jjjjohnson commented 4 months ago

Thanks @grimoire. Qwen has a combined q,k,v projection called c_attn. It looks like Baichuan is also very similar to Qwen.