InternLM / lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
https://lmdeploy.readthedocs.io/en/latest/
Apache License 2.0

Can't pipeline take the path of a locally fine-tuned model? #1081

Closed · Crystalxd closed this issue 9 months ago

Crystalxd commented 9 months ago

As the title asks. The code that raises the error:

from lmdeploy import pipeline, GenerationConfig, TurbomindEngineConfig

backend_config = TurbomindEngineConfig(rope_scaling_factor=2.0, session_len=160000)
model_path = "/output/inter2-20b-sft/checkpoint-1000/"
pipe = pipeline(model_path, backend_config=backend_config)

Error message:

model_source: hf_model
Please input a model_name for hf_model
Traceback (most recent call last):
  File "/server/test_long.py", line 5, in <module>
    pipe = pipeline(model_path, backend_config=backend_config)
  File "/opt/conda/lib/python3.10/site-packages/lmdeploy/api.py", line 61, in pipeline
    return AsyncEngine(model_path,
  File "/opt/conda/lib/python3.10/site-packages/lmdeploy/serve/async_engine.py", line 67, in __init__
    self._build_turbomind(model_path=model_path,
  File "/opt/conda/lib/python3.10/site-packages/lmdeploy/serve/async_engine.py", line 108, in _build_turbomind
    self.engine = tm.TurboMind.from_pretrained(
  File "/opt/conda/lib/python3.10/site-packages/lmdeploy/turbomind/turbomind.py", line 426, in from_pretrained
    return cls(model_path=local_path,
  File "/opt/conda/lib/python3.10/site-packages/lmdeploy/turbomind/turbomind.py", line 182, in __init__
    self.model_comm = self._from_hf(model_source=model_source,
  File "/opt/conda/lib/python3.10/site-packages/lmdeploy/turbomind/turbomind.py", line 265, in _from_hf
    assert engine_config.model_name in MODELS.module_dict.keys(), \
AssertionError: 'None' is not supported. The supported models are: dict_keys(['base', 'llama', 'internlm', 'vicuna', 'wizardlm', 'internlm-chat-7b', 'internlm-chat', 'internlm-chat-7b-8k', 'internlm-chat-20b', 'internlm-20b', 'internlm2-7b', 'internlm2-20b', 'internlm2-chat-7b', 'internlm2-chat-20b', 'baichuan-7b', 'baichuan2-7b', 'puyu', 'llama2', 'llama-2', 'llama-2-chat', 'qwen-7b', 'qwen-14b', 'codellama', 'falcon', 'chatglm2-6b', 'solar', 'solar-70b', 'ultralm', 'ultracm', 'yi', 'yi-chat', 'yi-200k', 'yi-34b'])
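
The assertion shows that no model_name could be inferred from the local checkpoint directory, so the lookup against MODELS.module_dict fails. A quick sketch for listing the names the registry accepts (assuming the registry is importable from lmdeploy.model in this lmdeploy version):

from lmdeploy.model import MODELS

# Print the chat-template names that TurbomindEngineConfig.model_name accepts
print(sorted(MODELS.module_dict.keys()))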
irexyc commented 9 months ago

backend_config = TurbomindEngineConfig(rope_scaling_factor=2.0, session_len=160000, model_name='internlm2-chat-7b')

Use this for now; the logic here will be fixed in the next release.

Crystalxd commented 9 months ago

OK, thanks for the reply. It's solved now.

from lmdeploy import pipeline, GenerationConfig, TurbomindEngineConfig

model_path = "/output/inter2-20b-sft/checkpoint-1000/"
# backend_config = TurbomindEngineConfig(rope_scaling_factor=2.0, session_len=160000)
# model_name selects the built-in chat template; cache_max_entry_count caps the k/v cache's share of GPU memory
backend_config = TurbomindEngineConfig(rope_scaling_factor=2.0, session_len=160000, model_name='internlm2-chat-20b', cache_max_entry_count=0.2)

pipe = pipeline(model_path=model_path, backend_config=backend_config)
prompt = 'Use a long prompt to replace this sentence'
gen_config = GenerationConfig(top_p=0.8,
                              top_k=40,
                              temperature=0.8,
                              max_new_tokens=1024)
response = pipe(prompt, gen_config=gen_config)
print(response)

Output:

model_source: hf_model
model_config:

[llama]
model_name = internlm2-chat-20b
tensor_para_size = 1
head_num = 48
kv_head_num = 8
vocab_size = 92544
num_layer = 48
inter_size = 16384
norm_eps = 1e-05
attn_bias = 0
start_id = 1
end_id = 2
session_len = 160000
weight_type = bf16
rotary_embedding = 128
rope_theta = 1000000.0
size_per_head = 128
group_size = 0
max_batch_size = 128
max_context_token_num = 1
step_length = 1
cache_max_entry_count = 0.2
cache_block_seq_len = 128
cache_chunk_size = -1
num_tokens_per_iter = 0
max_prefill_iters = 1
extra_tokens_per_iter = 0
use_context_fmha = 1
quant_policy = 0
max_position_embeddings = 32768
rope_scaling_factor = 2.0
use_logn_attn = 0

get 387 model params
[WARNING] gemm_config.in is not found; using default GEMM algo                                                                                                                                               
Response(text='This is a placeholder sentence that can be replaced with a longer prompt.', generate_token_len=14, finish_reason='stop')
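
For completeness, the same pipeline also accepts a list of prompts and returns one Response per prompt; a minimal sketch under the same config (prompt texts are placeholders):

responses = pipe(['First placeholder prompt', 'Second placeholder prompt'],
                 gen_config=gen_config)
for r in responses:
    # Each Response carries the generated text, token count, and finish reason
    print(r.text, r.generate_token_len, r.finish_reason)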