Closed · Volta-lemon closed this 2 months ago
Merge model code:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel, LoraConfig, TaskType, get_peft_model
model_name = 'Qwen2-7B-Instruct'
model_path = './mymodel/Qwen2-7B-Instruct'
lora_path = "./lora/"+ model_name + "_lora" + "/final"
max_new_tokens = 2048
config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    inference_mode=False,  # training mode
    r=8,  # LoRA rank
    lora_alpha=32,  # LoRA alpha; see the LoRA paper for what it does
    lora_dropout=0.1  # dropout ratio
)
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto", torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(model, model_id=lora_path, config=config)
new_model_directory = "./mymodel/merged_model/q7b_del"
merged_model = model.merge_and_unload()
merged_model.save_pretrained(new_model_directory, max_shard_size="2048MB", safe_serialization=True)
I would also like to add that I manually copied three files from the Qwen2-7B-Instruct directory: tokenizer_config.json, tokenizer.json, and vocab.json.
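A side note: instead of copying those files by hand, saving the tokenizer next to the merged weights should give the same result. A minimal sketch, assuming the merge script above has just run:
# Save the tokenizer alongside the merged weights so the merged
# directory is self-contained (sketch, not what was actually run here).
tokenizer.save_pretrained(new_model_directory)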
The exception was thrown by the tokenizer model. Can you share the related tokenizer files with us for further debugging, such as the tokenizer model and JSON config?
I manually copied the three files from Qwen2-7B-Instruct. You can get them here: https://huggingface.co/Qwen/Qwen2-7B-Instruct/tree/main
I can load it directly like this:
from lmdeploy import pipeline, GenerationConfig, PytorchEngineConfig
backend_config = PytorchEngineConfig(
    session_len=2048,
    adapters=dict(lora_name_1='/root/tianchi/chi/lora/Qwen2-7B-Instruct_lora/final'))
gen_config = GenerationConfig(
    top_p=0.8,
    top_k=40,
    temperature=0.8,
    max_new_tokens=1024)
pipe = pipeline('/root/tianchi/chi/mymodel/Qwen2-7B-Instruct',
                backend_config=backend_config)
...
response = pipe(prompts, gen_config=gen_config, adapter_name='lora_name_1')
What is the result if you replace tokenizer.__call__ with tokenizer.encode in your code?
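For context, the two calls differ in their return shape; a minimal sketch, assuming the stock Qwen2-7B-Instruct tokenizer:
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained('./mymodel/Qwen2-7B-Instruct')

# __call__ returns a dict with input_ids and attention_mask;
# encode returns just the list of token ids.
print(tok('test'))        # {'input_ids': [1944], 'attention_mask': [1]}
print(tok.encode('test')) # [1944]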
Could you print the value of prompt at "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/serve/async_engine.py", line 517?
How about you try this:
from lmdeploy import Tokenizer
lmdeploy_tokenizer = Tokenizer('your model path')
out = lmdeploy_tokenizer.encode('test')
print(out)
> Could you print the value of prompt at "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/serve/async_engine.py", line 517?

I don't understand what this means.
> How about you try this:
>
> from lmdeploy import Tokenizer
> lmdeploy_tokenizer = Tokenizer('your model path')
> out = lmdeploy_tokenizer.encode('test')
> print(out)
/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[1944]
You haven't set a chat template. Also, print the value of prompt just before line 517 of /root/.conda/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/serve/async_engine.py.
How do I set the chat template?
I added the print:
async def _get_prompt_input(self, prompt: str, do_preprocess: bool,
                            sequence_start: bool, adapter_name: str):
    if do_preprocess:
        # use adapter's chat template if possible
        chat_template = self.chat_template
        if adapter_name in MODELS.module_dict:
            chat_template = MODELS.module_dict[adapter_name]()
        prompt = chat_template.messages2prompt(prompt, sequence_start)
    input_ids = self.tokenizer.encode(prompt, add_bos=sequence_start)
    print(f"517_prompt = {prompt}")
    return {'prompt': prompt, 'input_ids': input_ids}
But for

lmdeploy_tokenizer = Tokenizer('your model path')
out = lmdeploy_tokenizer.encode('test')
print(out)

the output is the same as before.
Put the print before the encode call. encode raised the error, so execution never reaches the print.
encode didn't raise an error; didn't it output [1944]?
Or, for this part of the code, should I start the server to see the output?
async def _get_prompt_input(self, prompt: str, do_preprocess: bool,
                            sequence_start: bool, adapter_name: str):
    if do_preprocess:
        # use adapter's chat template if possible
        chat_template = self.chat_template
        if adapter_name in MODELS.module_dict:
            chat_template = MODELS.module_dict[adapter_name]()
        prompt = chat_template.messages2prompt(prompt, sequence_start)
    print(f"517_prompt = {prompt}")
    input_ids = self.tokenizer.encode(prompt, add_bos=sequence_start)
    return {'prompt': prompt, 'input_ids': input_ids}
The problem is the assembled prompt; you need to print it in the server to check. And you need to set a chat template; see https://lmdeploy.readthedocs.io/zh-cn/latest/advance/chat_template.html
After starting the server, the printed prompt is: 517_prompt = None
Traceback (most recent call last):
File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 399, in run_asgi
result = await app( # type: ignore[func-returns-value]
File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 70, in __call__
return await self.app(scope, receive, send)
File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/fastapi/applications.py", line 1054, in __call__
await super().__call__(scope, receive, send)
File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/starlette/applications.py", line 123, in __call__
await self.middleware_stack(scope, receive, send)
File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/starlette/middleware/errors.py", line 186, in __call__
raise exc
File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/starlette/middleware/errors.py", line 164, in __call__
await self.app(scope, receive, _send)
File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/starlette/middleware/cors.py", line 85, in __call__
await self.app(scope, receive, send)
File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 65, in __call__
await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/starlette/routing.py", line 756, in __call__
await self.middleware_stack(scope, receive, send)
File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/starlette/routing.py", line 776, in app
await route.handle(scope, receive, send)
File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/starlette/routing.py", line 297, in handle
await self.app(scope, receive, send)
File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/starlette/routing.py", line 77, in app
await wrap_app_handling_exceptions(app, request)(scope, receive, send)
File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/starlette/routing.py", line 72, in app
response = await func(request)
File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/fastapi/routing.py", line 278, in app
raw_response = await run_endpoint_function(
File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
return await dependant.call(**values)
File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/serve/openai/api_server.py", line 495, in chat_completions_v1
async for res in result_generator:
File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/serve/async_engine.py", line 564, in generate
prompt_input = await self._get_prompt_input(prompt, do_preprocess,
File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/serve/async_engine.py", line 518, in _get_prompt_input
input_ids = self.tokenizer.encode(prompt, add_bos=sequence_start)
File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/tokenizer.py", line 575, in encode
return self.model.encode(s, add_bos, add_special_tokens, **kwargs)
File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/tokenizer.py", line 341, in encode
encoded = self.model.encode(s,
File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2654, in encode
encoded_inputs = self.encode_plus(
File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 3062, in encode_plus
return self._encode_plus(
File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 583, in _encode_plus
batched_output = self._batch_encode_plus(
File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 511, in _batch_encode_plus
encodings = self._tokenizer.encode_batch(
TypeError: TextEncodeInput must be Union[TextInputSequence, Tuple[InputSequence, InputSequence]]
517_prompt = None
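This matches the printed value: prompt is None by the time it reaches encode, and passing None into a Hugging Face fast tokenizer's encode raises exactly this TypeError. A minimal sketch reproducing it, assuming the same tokenizer:
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained('./mymodel/Qwen2-7B-Instruct')
# None reaches the underlying encode_batch (see the traceback above) and raises:
# TypeError: TextEncodeInput must be Union[TextInputSequence, Tuple[InputSequence, InputSequence]]
tok.encode(None)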
I ran the sample code in Jupyter:
from lmdeploy.model import MODELS, BaseChatTemplate

@MODELS.register_module(name='customized_model')
class CustomizedModel(BaseChatTemplate):
    """A customized chat template."""

    def __init__(self,
                 system='<|im_start|>system\n',
                 meta_instruction='You are a robot developed by LMDeploy.',
                 user='<|im_start|>user\n',
                 assistant='<|im_start|>assistant\n',
                 eosys='<|im_end|>\n',
                 eoh='<|im_end|>\n',
                 eoa='<|im_end|>',
                 separator='\n',
                 stop_words=['<|im_end|>', '<|action_end|>']):
        super().__init__(system=system,
                         meta_instruction=meta_instruction,
                         eosys=eosys,
                         user=user,
                         eoh=eoh,
                         assistant=assistant,
                         eoa=eoa,
                         separator=separator,
                         stop_words=stop_words)

from lmdeploy import ChatTemplateConfig, pipeline

messages = [{'role': 'user', 'content': 'who are you?'}]
pipe = pipeline('./mymodel/merged_model/q7b_del',
                chat_template_config=ChatTemplateConfig('customized_model'))
for response in pipe.stream_infer(messages):
    print(response.text, end='')
The output is:
[WARNING] gemm_config.in is not found; using default GEMM algo
Exception in thread Thread-136 (<lambda>):
Traceback (most recent call last):
File "/root/.conda/envs/lmdeploy/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
self.run()
File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/ipykernel/ipkernel.py", line 766, in run_closure
_threading_Thread_run(self)
File "/root/.conda/envs/lmdeploy/lib/python3.10/threading.py", line 953, in run
self._target(*self._args, **self._kwargs)
File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/serve/async_engine.py", line 495, in <lambda>
proc = Thread(target=lambda: loop.run_until_complete(gather()))
File "/root/.conda/envs/lmdeploy/lib/python3.10/asyncio/base_events.py", line 625, in run_until_complete
self._check_running()
File "/root/.conda/envs/lmdeploy/lib/python3.10/asyncio/base_events.py", line 584, in _check_running
raise RuntimeError('This event loop is already running')
RuntimeError: This event loop is already running
/root/.conda/envs/lmdeploy/lib/python3.10/threading.py:1018: RuntimeWarning: coroutine 'AsyncEngine.stream_infer.<locals>.gather' was never awaited
self._invoke_excepthook(self)
RuntimeWarning: Enable tracemalloc to get the object allocation traceback
Also, this cell has been running for about half an hour without finishing.
Some users' Jupyter notebooks need nest_asyncio installed. Your error is caused by a wrong chat template; set a chat template yourself. Given that it ran before you merged the weights, it should be one of lmdeploy's built-in chat templates. Look it up and pass it in via --model-name.
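For the "event loop is already running" error in Jupyter, a minimal sketch of the nest_asyncio workaround mentioned above (run it in the notebook before creating the pipeline):
# pip install nest_asyncio
import nest_asyncio
nest_asyncio.apply()  # let lmdeploy's internal event loop nest inside Jupyter's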
I learned from the community that the chat template is matched automatically from the model directory name, so renaming it back to the original name fixed it. --model-name probably works too, but I didn't get around to trying it; I'll deal with any further problems later.
The fix, i.e. changing
lmdeploy serve api_server /root/tianchi/chi/mymodel/merged_model/q7b_del --server-port 23333
to
lmdeploy serve api_server /root/tianchi/chi/mymodel/merged_model/Qwen2-7B-Instruct --server-port 23333
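Untested here, but following the maintainer's suggestion above, the template could presumably also be matched without renaming the directory by passing the original name explicitly:
lmdeploy serve api_server /root/tianchi/chi/mymodel/merged_model/q7b_del --server-port 23333 --model-name Qwen2-7B-Instruct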
The developers say automatic matching should be added later, so perhaps the next version won't have this problem.
Oops, it really was this problem. My bad for renaming the directory.
Checklist
Describe the bug
I promise I will not raise two questions in one issue next time; I am very sorry.
My question is: the merged Qwen model (merged through the transformers library; the merge code is below) can be called directly through the transformers library, as shown in the screenshot below, but cannot be served directly with lmdeploy serve api_server. To be precise, the server does start on port 23333, but once a request is made it raises TextEncodeInput must be Union[TextInputSequence, Tuple[InputSequence, InputSequence]].
It should be noted that it can be used after removing layer_replication, as in #2020.
Reproduction
exec:
lmdeploy serve api_server /root/tianchi/chi/mymodel/merged_model/q7b --server-port 23333
Calling the interface (sample code from the documentation):
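The original client snippet is not preserved in this dump; a request against lmdeploy's OpenAI-compatible endpoint would look roughly like the sketch below (the model name "q7b" is assumed for illustration):
import requests

# POST a chat completion to the api_server started above.
resp = requests.post('http://localhost:23333/v1/chat/completions',
                     json={'model': 'q7b',
                           'messages': [{'role': 'user', 'content': 'hello'}]})
print(resp.json())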
The error raised by api_server:
Environment
Error traceback
No response