suwenzhuo opened 2 weeks ago
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.29.feed_forward.w2.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.29.attention_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.30.attention_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.30.ffn_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.30.attention.w_qkv.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.30.attention.wo.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.30.feed_forward.w1.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.30.feed_forward.w3.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.30.feed_forward.w2.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.30.feed_forward.w2.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.30.feed_forward.w1.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.30.attention.wo.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.30.attention.w_qkv.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.30.ffn_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.30.feed_forward.w3.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.30.attention_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.31.attention_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.31.ffn_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.31.attention.w_qkv.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.31.attention.wo.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.31.feed_forward.w1.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.31.feed_forward.w3.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.31.feed_forward.w2.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.31.feed_forward.w2.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.31.feed_forward.w3.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.31.feed_forward.w1.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.31.attention.wo.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.31.attention.w_qkv.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.31.ffn_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.31.attention_norm.weight
2024-09-02 10:29:26,825 - lmdeploy - WARNING - get 227 model params
2024-09-02 10:29:43,154 - lmdeploy - INFO - updated backend_config=TurbomindEngineConfig(model_name=None, model_format=None, tp=1, session_len=None, max_batch_size=128, cache_max_entry_count=0.8, cache_block_seq_len=64, enable_prefix_caching=False, quant_policy=8, rope_scaling_factor=0.0, use_logn_attn=False, download_dir=None, revision=None, max_prefill_token_num=8192, num_tokens_per_iter=0, max_prefill_iters=1)
[TM][DEBUG] Set logger level by DEBUG
[TM][DEBUG] turbomind::Allocator
session_len (131080) truncated to 28480.
[TM][DEBUG] void turbomind::LlamaBatch
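The "session_len (131080) truncated to 28480" warning above means TurboMind shrank the session length to fit the KV cache budget. A hedged workaround sketch (assuming the reproduction setup below; the value 8192 is an arbitrary example, not a recommended setting) is to pin `session_len` explicitly instead of letting it default to the model's full context window:

```python
from lmdeploy import TurbomindEngineConfig

# Sketch: cap session_len up front so TurboMind does not have to
# truncate it at startup to fit the KV-cache memory budget.
engine_config = TurbomindEngineConfig(quant_policy=8, session_len=8192)
```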
Describe the bug
Environment: A100, lmdeploy==0.5.3, model: Llama3.1-8B-Chinese-Chat
Aborted (core dumped)
Reproduction
The problem occurs when I run the following code:

from lmdeploy import pipeline, TurbomindEngineConfig

engine_config = TurbomindEngineConfig(quant_policy=8)
pipe = pipeline("/root/model/Llama3.1-8B-Chinese-Chat",
                backend_config=engine_config,
                log_level='DEBUG')
response = pipe(["Hi, pls intro yourself", "Shanghai is"])
print(response)
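For context, `quant_policy=8` enables TurboMind's online 8-bit KV-cache quantization. As a minimal illustration of the general technique (symmetric per-tensor int8 quantization; this is NOT lmdeploy's actual kernel, just a NumPy sketch of the idea):

```python
import numpy as np

def quantize_int8(x):
    # Symmetric per-tensor quantization: scale so the largest
    # absolute value maps to 127, then round to int8.
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover an approximation of the original float values.
    return q.astype(np.float32) * scale

kv = np.random.randn(4, 8).astype(np.float32)
q, s = quantize_int8(kv)
recon = dequantize(q, s)
# Max reconstruction error is bounded by half the scale step.
print(np.abs(kv - recon).max())
```

Quantizing the KV cache this way roughly halves its memory footprint versus fp16, which is why it interacts with `cache_max_entry_count` and the session-length budget in the logs above.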
Environment
Error traceback