InternLM / lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
https://lmdeploy.readthedocs.io/en/latest/
Apache License 2.0

[Bug] lmdeploy has other questions about server for lora_merge_model #2036

Closed Volta-lemon closed 2 months ago

Volta-lemon commented 2 months ago

Describe the bug

I promise I will not raise two questions in one issue next time; I am very sorry.

My question is: the merged Qwen model (merged via the transformers library; the merge code is below) can be called directly through the transformers library, as shown in the screenshot below, but cannot be served with lmdeploy serve api_server. More precisely, the server does start on port 23333, but once a request is sent it fails with TypeError: TextEncodeInput must be Union[TextInputSequence, Tuple[InputSequence, InputSequence]].

It should be noted that:

  1. Serving un-merged models with the same command, e.g. Qwen2-7B-Instruct or internlm2-chat-7b, works smoothly.
  2. Loading the LoRA adapter directly with pipeline works after deleting layer_replication, as in #2020.
  3. I then re-merged and ran lmdeploy serve api_server /root/tianchi/chi/mymodel/merged_model/q7b_del --server-port 23333, but it reported the same error as before: TextEncodeInput must be Union[TextInputSequence, Tuple[InputSequence, InputSequence]].
  4. Loading the merged model with transformers and running inference works: [image]

Reproduction

  1. exec: lmdeploy serve api_server /root/tianchi/chi/mymodel/merged_model/q7b --server-port 23333

  2. Client code for calling the API (the sample code from the docs):

    from openai import OpenAI

    client = OpenAI(
        api_key='YOUR_API_KEY',
        base_url="http://0.0.0.0:23333/v1"
    )
    model_name = client.models.list().data[0].id
    response = client.chat.completions.create(
        model=model_name,
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": " provide three suggestions about time management"},
        ],
        temperature=0.8,
        top_p=0.8
    )
    print(response)
  3. The error reported by the api_server:

INFO:     127.0.0.1:36544 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 399, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 70, in __call__
    return await self.app(scope, receive, send)
  File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/starlette/applications.py", line 123, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/starlette/middleware/errors.py", line 186, in __call__
    raise exc
  File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/starlette/middleware/errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/starlette/middleware/cors.py", line 85, in __call__
    await self.app(scope, receive, send)
  File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 65, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/starlette/routing.py", line 756, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/starlette/routing.py", line 776, in app
    await route.handle(scope, receive, send)
  File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/starlette/routing.py", line 297, in handle
    await self.app(scope, receive, send)
  File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/starlette/routing.py", line 77, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/starlette/routing.py", line 72, in app
    response = await func(request)
  File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/fastapi/routing.py", line 278, in app
    raw_response = await run_endpoint_function(
  File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
    return await dependant.call(**values)
  File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/serve/openai/api_server.py", line 495, in chat_completions_v1
    async for res in result_generator:
  File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/serve/async_engine.py", line 563, in generate
    prompt_input = await self._get_prompt_input(prompt, do_preprocess,
  File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/serve/async_engine.py", line 517, in _get_prompt_input
    input_ids = self.tokenizer.encode(prompt, add_bos=sequence_start)
  File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/tokenizer.py", line 575, in encode
    return self.model.encode(s, add_bos, add_special_tokens, **kwargs)
  File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/tokenizer.py", line 341, in encode
    encoded = self.model.encode(s,
  File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2654, in encode
    encoded_inputs = self.encode_plus(
  File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 3062, in encode_plus
    return self._encode_plus(
  File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 583, in _encode_plus
    batched_output = self._batch_encode_plus(
  File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 511, in _batch_encode_plus
    encodings = self._tokenizer.encode_batch(
TypeError: TextEncodeInput must be Union[TextInputSequence, Tuple[InputSequence, InputSequence]]
  4. The error on the client side when running test.py:
Traceback (most recent call last):
  File "/root/tianchi/chi/test.py", line 29, in <module>
    response = client.chat.completions.create(
  File "/root/.conda/envs/tianchi/lib/python3.10/site-packages/openai/_utils/_utils.py", line 277, in wrapper
    return func(*args, **kwargs)
  File "/root/.conda/envs/tianchi/lib/python3.10/site-packages/openai/resources/chat/completions.py", line 643, in create
    return self._post(
  File "/root/.conda/envs/tianchi/lib/python3.10/site-packages/openai/_base_client.py", line 1261, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
  File "/root/.conda/envs/tianchi/lib/python3.10/site-packages/openai/_base_client.py", line 942, in request
    return self._request(
  File "/root/.conda/envs/tianchi/lib/python3.10/site-packages/openai/_base_client.py", line 1026, in _request
    return self._retry_request(
  File "/root/.conda/envs/tianchi/lib/python3.10/site-packages/openai/_base_client.py", line 1074, in _retry_request
    return self._request(
  File "/root/.conda/envs/tianchi/lib/python3.10/site-packages/openai/_base_client.py", line 1026, in _request
    return self._retry_request(
  File "/root/.conda/envs/tianchi/lib/python3.10/site-packages/openai/_base_client.py", line 1074, in _retry_request
    return self._request(
  File "/root/.conda/envs/tianchi/lib/python3.10/site-packages/openai/_base_client.py", line 1041, in _request
    raise self._make_status_error_from_response(err.response) from None
openai.InternalServerError: Internal Server Error

Environment

sys.platform: linux
Python: 3.10.14 (main, Mar 21 2024, 16:24:04) [GCC 11.2.0]
CUDA available: True
MUSA available: False
numpy_random_seed: 2147483648
GPU 0: NVIDIA A100-SXM4-80GB
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 12.2, V12.2.140
GCC: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
PyTorch: 2.1.2
PyTorch compiling details: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201703
  - Intel(R) oneAPI Math Kernel Library Version 2023.1-Product Build 20230303 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v3.1.1 (Git Hash 64f6bcbcbab628e96f33a62c3e975f8535a7bde4)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX512
  - CUDA Runtime 12.1
  - NVCC architecture flags: -gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90
  - CuDNN 8.9.2
  - Magma 2.6.1
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=12.1, CUDNN_VERSION=8.9.2, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-invalid-partial-specialization -Wno-unused-private-field -Wno-aligned-allocation-unavailable -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_DISABLE_GPU_ASSERTS=ON, TORCH_VERSION=2.1.2, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, 

TorchVision: 0.16.2
LMDeploy: 0.5.0+
transformers: 4.41.2
gradio: 3.50.2
fastapi: 0.111.0
pydantic: 2.8.2
triton: 2.1.0

Error traceback

No response

Volta-lemon commented 2 months ago

Merge model code:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel, LoraConfig, TaskType, get_peft_model
model_name = 'Qwen2-7B-Instruct'
model_path = './mymodel/Qwen2-7B-Instruct'
lora_path = "./lora/"+ model_name + "_lora" + "/final"

max_new_tokens = 2048

config = LoraConfig(
    task_type=TaskType.CAUSAL_LM, 
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    inference_mode=False,  # training mode
    r=8,  # LoRA rank
    lora_alpha=32,  # LoRA alpha; see the LoRA paper for details
    lora_dropout=0.1  # dropout ratio
)

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto",torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(model, model_id=lora_path, config=config)

new_model_directory = "./mymodel/merged_model/q7b_del"
merged_model = model.merge_and_unload()
merged_model.save_pretrained(new_model_directory, max_shard_size="2048MB", safe_serialization=True)

I would also like to add that I manually copied three files from the Qwen2-7B-Instruct directory: tokenizer_config.json, tokenizer.json, and vocab.json.
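
As a side note, here is a minimal sketch (using only standard transformers APIs) that saves the tokenizer next to the merged weights, so those files would not need to be copied by hand:

# Sketch: export the tokenizer together with the merged model so the
# output directory is self-contained (no manual copying of tokenizer files).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('./mymodel/Qwen2-7B-Instruct')
tokenizer.save_pretrained('./mymodel/merged_model/q7b_del')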

lvhan028 commented 2 months ago

The exception was thrown by the tokenizer model. Could you share the related tokenizer files (e.g., the tokenizer model and JSON configs) with us for further debugging?

Volta-lemon commented 2 months ago

I manually copied the three files from Qwen2-7B-Instruct. You can get them here: https://huggingface.co/Qwen/Qwen2-7B-Instruct/tree/main

Volta-lemon commented 2 months ago

I can load it directly like this:

from lmdeploy import pipeline, GenerationConfig, PytorchEngineConfig

backend_config = PytorchEngineConfig(session_len=2048,
                                     adapters=dict(lora_name_1='/root/tianchi/chi/lora/Qwen2-7B-Instruct_lora/final'))
gen_config = GenerationConfig(top_p=0.8,
                              top_k=40,
                              temperature=0.8,
                              max_new_tokens=1024)

pipe = pipeline('/root/tianchi/chi/mymodel/Qwen2-7B-Instruct',
                backend_config=backend_config)

...
response = pipe(prompts, gen_config=gen_config, adapter_name='lora_name_1')

AllentDan commented 2 months ago

[image] What is the result if you replace tokenizer.__call__ with tokenizer.encode in your code?

Volta-lemon commented 2 months ago

[image] What is the result if you replace tokenizer.__call__ with tokenizer.encode in your code?

[image]

AllentDan commented 2 months ago

Could you print the value of prompt at "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/serve/async_engine.py", line 517?

AllentDan commented 2 months ago

How about you try this:

from lmdeploy import Tokenizer
lmdeploy_tokenizer = Tokenizer('your model path')
out = lmdeploy_tokenizer.encode('test')
print(out)

Volta-lemon commented 2 months ago

Could you print the value of prompt at "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/serve/async_engine.py", line 517?

I don't understand what this means

How about you try this:

from lmdeploy import Tokenizer
lmdeploy_tokenizer = Tokenizer('your model path')
out = lmdeploy_tokenizer.encode('test')
print(out)

/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[1944]

AllentDan commented 2 months ago

You haven't set a chat template. Also, please print the value of prompt right before line 517 of /root/.conda/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/serve/async_engine.py.

Volta-lemon commented 2 months ago

How should I set the chat template?

I added a print statement:

    async def _get_prompt_input(self, prompt: str, do_preprocess: bool,
                                sequence_start: bool, adapter_name: str):
        if do_preprocess:
            # use adapter's chat template if possible
            chat_template = self.chat_template
            if adapter_name in MODELS.module_dict:
                chat_template = MODELS.module_dict[adapter_name]()
            prompt = chat_template.messages2prompt(prompt, sequence_start)
        input_ids = self.tokenizer.encode(prompt, add_bos=sequence_start)
        print(f"517_prompt = {prompt}")
        return {'prompt': prompt, 'input_ids': input_ids}

But for this:

lmdeploy_tokenizer = Tokenizer('your model path')
out = lmdeploy_tokenizer.encode('test')
print(out)

the output is the same as before.

AllentDan commented 2 months ago

Put the print before the encode call; encode raised the error, so execution never reaches the print.

Volta-lemon commented 2 months ago

encode didn't raise an error, though; didn't it output [1944]?

Volta-lemon commented 2 months ago

Or, for this part of the code, should I start the server to see the output?

    async def _get_prompt_input(self, prompt: str, do_preprocess: bool,
                                sequence_start: bool, adapter_name: str):
        if do_preprocess:
            # use adapter's chat template if possible
            chat_template = self.chat_template
            if adapter_name in MODELS.module_dict:
                chat_template = MODELS.module_dict[adapter_name]()
            prompt = chat_template.messages2prompt(prompt, sequence_start)
        print(f"517_prompt = {prompt}")
        input_ids = self.tokenizer.encode(prompt, add_bos=sequence_start)
        return {'prompt': prompt, 'input_ids': input_ids}

AllentDan commented 2 months ago

The problem is the assembled prompt; you need to print it in the server to check. Also, you need to set the chat template; see https://lmdeploy.readthedocs.io/zh-cn/latest/advance/chat_template.html

Volta-lemon commented 2 months ago

After starting the server, the printed prompt is: 517_prompt = None

Traceback (most recent call last):
  File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 399, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 70, in __call__
    return await self.app(scope, receive, send)
  File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/starlette/applications.py", line 123, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/starlette/middleware/errors.py", line 186, in __call__
    raise exc
  File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/starlette/middleware/errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/starlette/middleware/cors.py", line 85, in __call__
    await self.app(scope, receive, send)
  File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 65, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/starlette/routing.py", line 756, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/starlette/routing.py", line 776, in app
    await route.handle(scope, receive, send)
  File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/starlette/routing.py", line 297, in handle
    await self.app(scope, receive, send)
  File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/starlette/routing.py", line 77, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/starlette/routing.py", line 72, in app
    response = await func(request)
  File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/fastapi/routing.py", line 278, in app
    raw_response = await run_endpoint_function(
  File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
    return await dependant.call(**values)
  File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/serve/openai/api_server.py", line 495, in chat_completions_v1
    async for res in result_generator:
  File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/serve/async_engine.py", line 564, in generate
    prompt_input = await self._get_prompt_input(prompt, do_preprocess,
  File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/serve/async_engine.py", line 518, in _get_prompt_input
    input_ids = self.tokenizer.encode(prompt, add_bos=sequence_start)
  File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/tokenizer.py", line 575, in encode
    return self.model.encode(s, add_bos, add_special_tokens, **kwargs)
  File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/tokenizer.py", line 341, in encode
    encoded = self.model.encode(s,
  File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2654, in encode
    encoded_inputs = self.encode_plus(
  File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 3062, in encode_plus
    return self._encode_plus(
  File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 583, in _encode_plus
    batched_output = self._batch_encode_plus(
  File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 511, in _batch_encode_plus
    encodings = self._tokenizer.encode_batch(
TypeError: TextEncodeInput must be Union[TextInputSequence, Tuple[InputSequence, InputSequence]]
517_prompt = None

Volta-lemon commented 2 months ago

I ran the example code in Jupyter:

from lmdeploy.model import MODELS, BaseChatTemplate

@MODELS.register_module(name='customized_model')
class CustomizedModel(BaseChatTemplate):
    """A customized chat template."""

    def __init__(self,
                 system='<|im_start|>system\n',
                 meta_instruction='You are a robot developed by LMDeploy.',
                 user='<|im_start|>user\n',
                 assistant='<|im_start|>assistant\n',
                 eosys='<|im_end|>\n',
                 eoh='<|im_end|>\n',
                 eoa='<|im_end|>',
                 separator='\n',
                 stop_words=['<|im_end|>', '<|action_end|>']):
        super().__init__(system=system,
                         meta_instruction=meta_instruction,
                         eosys=eosys,
                         user=user,
                         eoh=eoh,
                         assistant=assistant,
                         eoa=eoa,
                         separator=separator,
                         stop_words=stop_words)

from lmdeploy import ChatTemplateConfig, pipeline

messages = [{'role': 'user', 'content': 'who are you?'}]
pipe = pipeline('./mymodel/merged_model/q7b_del',
                chat_template_config=ChatTemplateConfig('customized_model'))
for response in pipe.stream_infer(messages):
    print(response.text, end='')

The output is:

[WARNING] gemm_config.in is not found; using default GEMM algo
Exception in thread Thread-136 (<lambda>):
Traceback (most recent call last):
  File "/root/.conda/envs/lmdeploy/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/ipykernel/ipkernel.py", line 766, in run_closure
    _threading_Thread_run(self)
  File "/root/.conda/envs/lmdeploy/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/serve/async_engine.py", line 495, in <lambda>
    proc = Thread(target=lambda: loop.run_until_complete(gather()))
  File "/root/.conda/envs/lmdeploy/lib/python3.10/asyncio/base_events.py", line 625, in run_until_complete
    self._check_running()
  File "/root/.conda/envs/lmdeploy/lib/python3.10/asyncio/base_events.py", line 584, in _check_running
    raise RuntimeError('This event loop is already running')
RuntimeError: This event loop is already running
/root/.conda/envs/lmdeploy/lib/python3.10/threading.py:1018: RuntimeWarning: coroutine 'AsyncEngine.stream_infer.<locals>.gather' was never awaited
  self._invoke_excepthook(self)
RuntimeWarning: Enable tracemalloc to get the object allocation traceback

And this cell has been running for about half an hour without finishing.

AllentDan commented 2 months ago

Some users' Jupyter notebooks need nest_asyncio installed. Your error is because the chat template is wrong; set a chat template yourself. Since it ran before you merged the weights, one of lmdeploy's built-in chat templates should fit; find it and pass it in via --model-name.
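
For reference, a minimal sketch of the nest_asyncio workaround mentioned above, reusing the pipeline call from the earlier comment (the 'qwen' template name is an assumption, not verified in this thread):

import nest_asyncio
nest_asyncio.apply()  # allow lmdeploy's event loop to run inside Jupyter's already-running loop

from lmdeploy import ChatTemplateConfig, pipeline

# Assumption: pick a built-in template that matches the merged Qwen2 model.
pipe = pipeline('./mymodel/merged_model/q7b_del',
                chat_template_config=ChatTemplateConfig('qwen'))
messages = [{'role': 'user', 'content': 'who are you?'}]
for response in pipe.stream_infer(messages):
    print(response.text, end='')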

Volta-lemon commented 2 months ago

I learned in the community that the chat template is matched automatically from the model directory name; renaming the directory to the original model name solved the problem. It seems --model-name would also work, but I'm not going to try it; I'll deal with any further issues later.

The fix, i.e. changing: lmdeploy serve api_server /root/tianchi/chi/mymodel/merged_model/q7b_del --server-port 23333 to: lmdeploy serve api_server /root/tianchi/chi/mymodel/merged_model/Qwen2-7B-Instruct --server-port 23333

The developers said automatic matching should be added later, so maybe the next version won't have to worry about this problem.
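
For completeness, the alternative AllentDan suggested would look like this (not tested in this thread; it assumes --model-name is accepted by this lmdeploy version):

lmdeploy serve api_server /root/tianchi/chi/mymodel/merged_model/q7b_del --server-port 23333 --model-name Qwen2-7B-Instruct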

datalee commented 3 weeks ago

Ah, it really was this problem. My own fault for fiddling with it.