backend_config = TurbomindEngineConfig(rope_scaling_factor=2.0, session_len=160000, model_name='internlm2-chat-7b')
This will do for now; the logic here will be fixed in the next release.
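For context, the requested session_len is well beyond the model's native context window (max_position_embeddings = 32768 in the dump below), which is why a RoPE scaling factor is needed at all. A quick back-of-the-envelope check (plain arithmetic, not an LMDeploy API):

native_ctx = 32768    # max_position_embeddings reported by the engine
target_ctx = 160000   # requested session_len
print(target_ctx / native_ctx)  # ~4.88x the native window is being requested
# With dynamic NTK-style scaling the effective reach is not simply
# rope_scaling_factor * native_ctx; treat the factor as a tuning knob.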
OK, thanks for the reply. Issue resolved.
from lmdeploy import pipeline, GenerationConfig, TurbomindEngineConfig

model_path = "/output/inter2-20b-sft/checkpoint-1000/"
# backend_config = TurbomindEngineConfig(rope_scaling_factor=2.0, session_len=160000)
backend_config = TurbomindEngineConfig(rope_scaling_factor=2.0,
                                       session_len=160000,
                                       model_name='internlm2-chat-20b',
                                       cache_max_entry_count=0.2)  # fraction of GPU memory reserved for the k/v cache
pipe = pipeline(model_path=model_path, backend_config=backend_config)

prompt = 'Use a long prompt to replace this sentence'
gen_config = GenerationConfig(top_p=0.8,
                              top_k=40,
                              temperature=0.8,
                              max_new_tokens=1024)
response = pipe(prompt, gen_config=gen_config)
print(response)
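As a side note, the same pipeline object also accepts a batch of prompts and returns one Response per prompt; a minimal sketch reusing pipe and gen_config from above:

# Batched inference: pass a list of prompts instead of a single string.
responses = pipe(['first prompt', 'second prompt'], gen_config=gen_config)
for r in responses:
    print(r.text)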
Output:
model_source: hf_model
model_config:
[llama]
model_name = internlm2-chat-20b
tensor_para_size = 1
head_num = 48
kv_head_num = 8
vocab_size = 92544
num_layer = 48
inter_size = 16384
norm_eps = 1e-05
attn_bias = 0
start_id = 1
end_id = 2
session_len = 160000
weight_type = bf16
rotary_embedding = 128
rope_theta = 1000000.0
size_per_head = 128
group_size = 0
max_batch_size = 128
max_context_token_num = 1
step_length = 1
cache_max_entry_count = 0.2
cache_block_seq_len = 128
cache_chunk_size = -1
num_tokens_per_iter = 0
max_prefill_iters = 1
extra_tokens_per_iter = 0
use_context_fmha = 1
quant_policy = 0
max_position_embeddings = 32768
rope_scaling_factor = 2.0
use_logn_attn = 0
get 387 model params
[WARNING] gemm_config.in is not found; using default GEMM algo
Response(text='This is a placeholder sentence that can be replaced with a longer prompt.', generate_token_len=14, finish_reason='stop')
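For reference, the printed model_config makes it easy to estimate the k/v-cache footprint that cache_max_entry_count=0.2 has to cover. A back-of-the-envelope sketch (my own arithmetic from the dumped values, not an LMDeploy utility; it ignores cache_block_seq_len block granularity and assumes quant_policy=0, i.e. no k/v quantization):

# k/v-cache bytes per token, from the dumped config above:
# 2 tensors (K and V) * num_layer * kv_head_num * size_per_head * 2 bytes (bf16)
num_layer, kv_head_num, size_per_head = 48, 8, 128
bytes_per_token = 2 * num_layer * kv_head_num * size_per_head * 2
print(bytes_per_token)                   # 196608 bytes, i.e. 192 KiB per token
print(160000 * bytes_per_token / 2**30)  # ~29.3 GiB for a full 160000-token session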
As stated in the title. The line that raises the error:
Error message: