[Bug] Aborted (core dumped)

Checklist

[x] 1. I have searched related issues but cannot get the expected help.
[ ] 2. The bug has not been fixed in the latest version.
[ ] 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.

Describe the bug

Environment: A100, lmdeploy==0.5.3, model Llama3.1-8B-Chinese-Chat

Aborted (core dumped)

Reproduction

The problem occurs when I run the following code:

from lmdeploy import pipeline, TurbomindEngineConfig

engine_config = TurbomindEngineConfig(quant_policy=8)
pipe = pipeline("/root/model/Llama3.1-8B-Chinese-Chat", backend_config=engine_config, log_level='DEBUG')
response = pipe(["Hi, pls intro yourself", "Shanghai is"])
print(response)

Environment

A100
lmdeploy==0.5.3
Model: Llama3.1-8B-Chinese-Chat

Error traceback

Problem:
2024-09-02 10:29:22,080 - lmdeploy - INFO - Using turbomind engine
2024-09-02 10:29:22,080 - lmdeploy - INFO - input backend=turbomind, backend_config=TurbomindEngineConfig(model_name=None, model_format=None, tp=1, session_len=None, max_batch_size=128, cache_max_entry_count=0.8, cache_block_seq_len=64, enable_prefix_caching=False, quant_policy=8, rope_scaling_factor=0.0, use_logn_attn=False, download_dir=None, revision=None, max_prefill_token_num=8192, num_tokens_per_iter=0, max_prefill_iters=1)
2024-09-02 10:29:22,080 - lmdeploy - INFO - input chat_template_config=None
2024-09-02 10:29:24,541 - lmdeploy - INFO - updated chat_template_onfig=ChatTemplateConfig(model_name='llama3_1', system=None, meta_instruction=None, eosys=None, user=None, eoh=None, assistant=None, eoa=None, separator=None, capability=None, stop_words=None)
2024-09-02 10:29:24,541 - lmdeploy - INFO - model_source: hf_model
2024-09-02 10:29:26,787 - lmdeploy - INFO - model_config:

[llama]
model_name = llama3_1
model_arch = LlamaForCausalLM
tensor_para_size = 1
head_num = 32
kv_head_num = 8
vocab_size = 128256
num_layer = 32
inter_size = 14336
norm_eps = 1e-05
attn_bias = 0
start_id = 128000
end_id = 128009
session_len = 131080
weight_type = bf16
rotary_embedding = 128
rope_theta = 500000.0
size_per_head = 128
group_size = 0
max_batch_size = 128
max_prefill_token_num = 8192
max_context_token_num = 1
step_length = 1
cache_max_entry_count = 0.8
cache_block_seq_len = 64
cache_chunk_size = -1
enable_prefix_caching = False
num_tokens_per_iter = 8192
max_prefill_iters = 17
use_context_fmha = 1
quant_policy = 8
max_position_embeddings = 131072
original_max_position_embeddings = 8192
rope_scaling_type = llama3
rope_scaling_factor = 8.0
use_dynamic_ntk = 0
low_freq_factor = 1.0
high_freq_factor = 4.0
use_logn_attn = 0
lora_policy = 
lora_r = 0
lora_scale = 0.0
lora_max_wo_r = 0
lora_rank_pattern = 
lora_scale_pattern = 

[TM][DEBUG] Set logger level by DEBUG
[TM][WARNING] [LlamaTritonModel] `max_context_token_num` = 131080.
[TM][INFO] Model: 
head_num: 32
kv_head_num: 8
size_per_head: 128
inter_size: 14336
num_layer: 32
vocab_size: 128256
attn_bias: 0
max_batch_size: 128
max_prefill_token_num: 8192
max_context_token_num: 131080
session_len: 131080
step_length: 1
cache_max_entry_count: 0.8
cache_block_seq_len: 64
cache_chunk_size: -1
enable_prefix_caching: 0
use_context_fmha: 1
start_id: 128000
tensor_para_size: 1
pipeline_para_size: 1
enable_custom_all_reduce: 0
model_name: llama3_1
model_dir: 
quant_policy: 8
group_size: 0

[TM][DEBUG] Set logger level by DEBUG
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: tok_embeddings.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: output.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.0.attention_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.0.ffn_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.0.attention.w_qkv.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.0.attention.wo.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.0.feed_forward.w1.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.0.feed_forward.w3.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.0.feed_forward.w2.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.0.feed_forward.w1.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.0.attention.wo.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.0.feed_forward.w2.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.0.feed_forward.w3.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.0.attention.w_qkv.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.0.ffn_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.0.attention_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.1.attention_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.1.ffn_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.1.attention.w_qkv.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.1.attention.wo.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.1.feed_forward.w1.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.1.feed_forward.w3.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.1.feed_forward.w2.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.1.feed_forward.w2.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.1.feed_forward.w3.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.1.feed_forward.w1.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.1.attention.wo.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.1.attention.w_qkv.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.1.ffn_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.1.attention_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.2.attention_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.2.ffn_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.2.attention.w_qkv.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.2.attention.wo.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.2.feed_forward.w1.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.2.feed_forward.w3.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.2.feed_forward.w2.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.2.feed_forward.w2.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.2.feed_forward.w1.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.2.attention.wo.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.2.attention.w_qkv.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.2.ffn_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.2.feed_forward.w3.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.2.attention_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.3.attention_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.3.ffn_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.3.attention.w_qkv.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.3.attention.wo.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.3.feed_forward.w1.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.3.feed_forward.w3.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.3.feed_forward.w2.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.3.feed_forward.w2.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.3.feed_forward.w3.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.3.feed_forward.w1.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.3.attention.wo.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.3.ffn_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.3.attention.w_qkv.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.3.attention_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.4.attention_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.4.ffn_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.4.attention.w_qkv.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.4.attention.wo.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.4.feed_forward.w1.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.4.feed_forward.w3.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.4.feed_forward.w2.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.4.feed_forward.w2.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.4.feed_forward.w3.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.4.feed_forward.w1.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.4.attention.w_qkv.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.4.ffn_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.4.attention.wo.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.4.attention_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.5.attention_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.5.ffn_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.5.attention.w_qkv.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.5.attention.wo.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.5.feed_forward.w1.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.5.feed_forward.w3.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.5.feed_forward.w2.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.5.feed_forward.w2.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.5.feed_forward.w3.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.5.feed_forward.w1.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.5.attention.w_qkv.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.5.attention.wo.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.5.ffn_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.5.attention_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.6.attention_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.6.ffn_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.6.attention.w_qkv.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.6.attention.wo.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.6.feed_forward.w1.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.6.feed_forward.w3.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.6.feed_forward.w2.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.6.feed_forward.w2.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.6.feed_forward.w3.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.6.feed_forward.w1.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.6.attention.wo.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.6.attention.w_qkv.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.6.ffn_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.6.attention_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.7.attention_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.7.ffn_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.7.attention.w_qkv.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.7.attention.wo.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.7.feed_forward.w1.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.7.feed_forward.w3.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.7.feed_forward.w2.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.7.feed_forward.w3.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.7.feed_forward.w1.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.7.feed_forward.w2.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.7.attention.w_qkv.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.7.ffn_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.7.attention.wo.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.7.attention_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.8.attention_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.8.ffn_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.8.attention.w_qkv.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.8.attention.wo.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.8.feed_forward.w1.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.8.feed_forward.w3.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.8.feed_forward.w2.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.8.feed_forward.w3.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.8.feed_forward.w1.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.8.feed_forward.w2.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.8.attention.wo.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.8.attention.w_qkv.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.8.ffn_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.8.attention_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.9.attention_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.9.ffn_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.9.attention.w_qkv.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.9.attention.wo.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.9.feed_forward.w1.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.9.feed_forward.w3.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.9.feed_forward.w2.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.9.feed_forward.w2.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.9.feed_forward.w3.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.9.feed_forward.w1.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.9.attention.w_qkv.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.9.attention.wo.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.9.ffn_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.9.attention_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.10.attention_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.10.ffn_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.10.attention.w_qkv.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.10.attention.wo.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.10.feed_forward.w1.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.10.feed_forward.w3.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.10.feed_forward.w2.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.10.feed_forward.w2.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.10.feed_forward.w3.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.10.feed_forward.w1.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.10.attention.wo.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.10.attention.w_qkv.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.10.ffn_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.10.attention_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.11.attention_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.11.ffn_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.11.attention.w_qkv.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.11.attention.wo.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.11.feed_forward.w1.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.11.feed_forward.w3.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.11.feed_forward.w2.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.11.feed_forward.w2.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.11.feed_forward.w3.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.11.feed_forward.w1.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.11.attention.w_qkv.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.11.ffn_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.11.attention.wo.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.11.attention_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.12.attention_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.12.ffn_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.12.attention.w_qkv.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.12.attention.wo.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.12.feed_forward.w1.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.12.feed_forward.w3.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.12.feed_forward.w2.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.12.feed_forward.w3.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.12.feed_forward.w2.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.12.feed_forward.w1.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.12.attention.wo.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.12.attention.w_qkv.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.12.ffn_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.12.attention_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.13.attention_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.13.ffn_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.13.attention.w_qkv.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.13.attention.wo.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.13.feed_forward.w1.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.13.feed_forward.w3.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.13.feed_forward.w2.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.13.feed_forward.w3.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.13.attention.w_qkv.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.13.feed_forward.w1.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.13.attention.wo.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.13.ffn_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.13.feed_forward.w2.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.13.attention_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.14.attention_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.14.ffn_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.14.attention.w_qkv.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.14.attention.wo.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.14.feed_forward.w1.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.14.feed_forward.w3.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.14.feed_forward.w2.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.14.feed_forward.w2.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.14.feed_forward.w3.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.14.feed_forward.w1.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.14.attention.wo.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.14.attention.w_qkv.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.14.ffn_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.14.attention_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.15.attention_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.15.ffn_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.15.attention.w_qkv.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.15.attention.wo.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.15.feed_forward.w1.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.15.feed_forward.w3.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.15.feed_forward.w2.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.15.feed_forward.w2.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.15.feed_forward.w3.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.15.feed_forward.w1.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.15.ffn_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.15.attention.wo.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.15.attention.w_qkv.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.15.attention_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.16.attention_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.16.ffn_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.16.attention.w_qkv.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.16.attention.wo.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.16.feed_forward.w1.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.16.feed_forward.w3.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.16.feed_forward.w2.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.16.feed_forward.w1.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.16.attention.wo.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.16.feed_forward.w2.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.16.attention.w_qkv.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.16.feed_forward.w3.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.16.ffn_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.16.attention_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.17.attention_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.17.ffn_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.17.attention.w_qkv.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.17.attention.wo.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.17.feed_forward.w1.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.17.feed_forward.w3.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.17.feed_forward.w2.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.17.feed_forward.w2.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.17.feed_forward.w3.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.17.attention.w_qkv.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.17.ffn_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.17.feed_forward.w1.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.17.attention.wo.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.17.attention_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.18.attention_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.18.ffn_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.18.attention.w_qkv.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.18.attention.wo.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.18.feed_forward.w1.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.18.feed_forward.w3.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.18.feed_forward.w2.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.18.feed_forward.w2.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.18.feed_forward.w1.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.18.feed_forward.w3.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.18.attention.wo.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.18.ffn_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.18.attention.w_qkv.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.18.attention_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.19.attention_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.19.ffn_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.19.attention.w_qkv.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.19.attention.wo.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.19.feed_forward.w1.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.19.feed_forward.w3.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.19.feed_forward.w2.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.19.feed_forward.w3.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.19.feed_forward.w2.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.19.feed_forward.w1.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.19.attention.wo.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.19.attention.w_qkv.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.19.ffn_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.19.attention_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.20.attention_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.20.ffn_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.20.attention.w_qkv.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.20.attention.wo.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.20.feed_forward.w1.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.20.feed_forward.w3.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.20.feed_forward.w2.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.20.feed_forward.w3.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.20.feed_forward.w2.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.20.attention.wo.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.20.attention.w_qkv.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.20.ffn_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.20.feed_forward.w1.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.20.attention_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.21.attention_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.21.ffn_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.21.attention.w_qkv.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.21.attention.wo.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.21.feed_forward.w1.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.21.feed_forward.w3.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.21.feed_forward.w2.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.21.feed_forward.w3.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.21.attention.wo.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.21.attention.w_qkv.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.21.ffn_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.21.feed_forward.w2.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.21.feed_forward.w1.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.21.attention_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.22.attention_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.22.ffn_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.22.attention.w_qkv.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.22.attention.wo.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.22.feed_forward.w1.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.22.feed_forward.w3.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.22.feed_forward.w2.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.22.feed_forward.w2.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.22.feed_forward.w3.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.22.feed_forward.w1.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.22.attention.wo.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.22.attention.w_qkv.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.22.ffn_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.22.attention_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.23.attention_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.23.ffn_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.23.attention.w_qkv.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.23.attention.wo.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.23.feed_forward.w1.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.23.feed_forward.w3.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.23.feed_forward.w2.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.23.feed_forward.w2.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.23.feed_forward.w3.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.23.attention.wo.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.23.attention.w_qkv.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.23.ffn_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.23.feed_forward.w1.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.23.attention_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.24.attention_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.24.ffn_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.24.attention.w_qkv.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.24.attention.wo.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.24.feed_forward.w1.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.24.feed_forward.w3.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.24.feed_forward.w2.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.24.feed_forward.w2.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.24.feed_forward.w3.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.24.feed_forward.w1.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.24.attention.wo.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.24.attention.w_qkv.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.24.ffn_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.24.attention_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.25.attention_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.25.ffn_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.25.attention.w_qkv.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.25.attention.wo.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.25.feed_forward.w1.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.25.feed_forward.w3.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.25.feed_forward.w2.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.25.feed_forward.w2.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.25.feed_forward.w3.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.25.attention.w_qkv.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.25.attention.wo.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.25.ffn_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.25.feed_forward.w1.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.25.attention_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.26.attention_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.26.ffn_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.26.attention.w_qkv.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.26.attention.wo.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.26.feed_forward.w1.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.26.feed_forward.w3.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.26.feed_forward.w2.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.26.feed_forward.w2.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.26.feed_forward.w3.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.26.attention.wo.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.26.attention.w_qkv.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.26.ffn_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.26.feed_forward.w1.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.26.attention_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.27.attention_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.27.ffn_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.27.attention.w_qkv.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.27.attention.wo.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.27.feed_forward.w1.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.27.feed_forward.w3.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.27.feed_forward.w2.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.27.feed_forward.w3.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.27.feed_forward.w1.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.27.attention.wo.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.27.attention.w_qkv.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.27.ffn_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.27.feed_forward.w2.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.27.attention_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.28.attention_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.28.ffn_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.28.attention.w_qkv.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.28.attention.wo.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.28.feed_forward.w1.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.28.feed_forward.w3.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.28.feed_forward.w2.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.28.feed_forward.w3.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.28.attention.wo.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.28.feed_forward.w1.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.28.attention.w_qkv.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.28.ffn_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.28.feed_forward.w2.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.28.attention_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.29.attention_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.29.ffn_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.29.attention.w_qkv.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.29.attention.wo.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.29.feed_forward.w1.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.29.feed_forward.w3.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.29.feed_forward.w2.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.29.feed_forward.w3.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.29.feed_forward.w1.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.29.attention.wo.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.29.attention.w_qkv.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.29.ffn_norm.weight

[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.29.feed_forward.w2.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.29.attention_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.30.attention_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.30.ffn_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.30.attention.w_qkv.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.30.attention.wo.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.30.feed_forward.w1.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.30.feed_forward.w3.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.30.feed_forward.w2.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.30.feed_forward.w2.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.30.feed_forward.w1.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.30.attention.wo.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.30.attention.w_qkv.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.30.ffn_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.30.feed_forward.w3.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.30.attention_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.31.attention_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.31.ffn_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.31.attention.w_qkv.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.31.attention.wo.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.31.feed_forward.w1.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.31.feed_forward.w3.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.31.feed_forward.w2.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.31.feed_forward.w2.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.31.feed_forward.w3.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.31.feed_forward.w1.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.31.attention.wo.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.31.attention.w_qkv.0.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.31.ffn_norm.weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layers.31.attention_norm.weight
2024-09-02 10:29:26,825 - lmdeploy - WARNING - get 227 model params
2024-09-02 10:29:43,154 - lmdeploy - INFO - updated backend_config=TurbomindEngineConfig(model_name=None, model_format=None, tp=1, session_len=None, max_batch_size=128, cache_max_entry_count=0.8, cache_block_seq_len=64, enable_prefix_caching=False, quant_policy=8, rope_scaling_factor=0.0, use_logn_attn=False, download_dir=None, revision=None, max_prefill_token_num=8192, num_tokens_per_iter=0, max_prefill_iters=1)
[TM][DEBUG] Set logger level by DEBUG
[TM][DEBUG] turbomind::Allocator::Allocator(int, bool)
[TM][DEBUG] turbomind::Allocator::Allocator(int, bool)
[WARNING] gemm_config.in is not found; using default GEMM algo
[TM][DEBUG] turbomind::cublasMMWrapper::cublasMMWrapper(cublasHandle_t, cublasLtHandle_t, cudaStream_t, turbomind::cublasAlgoMap, std::mutex, turbomind::IAllocator)
[TM][DEBUG] void turbomind::IAllocator::reMalloc(T, size_t, bool, bool) [with T = void; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void turbomind::Allocator::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x302000000 with size 33554432
[TM][DEBUG] turbomind::LlamaV2::LlamaV2(size_t, size_t, size_t, size_t, size_t, size_t, float, const turbomind::LlamaAttentionParams&, int, int, int, int, bool, const turbomind::EngineParams&, const turbomind::LoraParams&, std::shared_ptr<turbomind::LlamaV2::SharedState>, turbomind::LlamaWeight, turbomind::NcclParam, cudaStream_t, turbomind::cublasMMWrapper, turbomind::IAllocator, turbomind::IAllocator, bool, cudaDeviceProp) [with T = __nv_bfloat16; size_t = long unsigned int; cudaStream_t = CUstream_st]
[TM][INFO] NCCL group_id = 0
[TM][INFO] [BlockManager] block_size = 4 MB
[TM][INFO] [BlockManager] max_block_count = 445
[TM][INFO] [BlockManager] chunk_size = 445
[TM][DEBUG] virtual void turbomind::Allocator::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x304000000 with size 1924792320
[TM][WARNING] No enough blocks for session_len (131080), session_len truncated to 28480.
[TM][DEBUG] void turbomind::LlamaBatch::AllocateBuffer(size_t, size_t, int) [with T = __nv_bfloat16; size_t = long unsigned int]
[TM][DEBUG] void turbomind::IAllocator::reMalloc(T, size_t, bool, bool) [with T = __nv_bfloat16; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void turbomind::Allocator::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x12e2000000 with size 68157440
[TM][DEBUG] void turbomind::IAllocator::reMalloc(T, size_t, bool, bool) [with T = __nv_bfloat16; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void turbomind::Allocator::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x376ba0000 with size 68157440
[TM][DEBUG] void turbomind::IAllocator::reMalloc(T, size_t, bool, bool) [with T = int; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void turbomind::Allocator::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x37aca0000 with size 33280
[TM][DEBUG] void turbomind::IAllocator::reMalloc(T, size_t, bool, bool) [with T = __nv_bfloat16; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void turbomind::Allocator::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x37aca8200 with size 1048576
[TM][DEBUG] void turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = __nv_bfloat16; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void turbomind::Allocator::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x37ada8200 with size 1048576
[TM][DEBUG] void turbomind::IAllocator::reMalloc(T, size_t, bool, bool) [with T = int; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void turbomind::Allocator::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x37aea8200 with size 14581760
[TM][DEBUG] void turbomind::IAllocator::reMalloc(T, size_t, bool, bool) [with T = int; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void turbomind::Allocator::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x37bc90200 with size 512
[TM][DEBUG] void turbomind::IAllocator::reMalloc(T, size_t, bool, bool) [with T = int; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void turbomind::Allocator::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x37bc90400 with size 512
[TM][DEBUG] void turbomind::IAllocator::reMalloc(T, size_t, bool, bool) [with T = int; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void turbomind::Allocator::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x37bc90600 with size 512
[TM][DEBUG] void turbomind::IAllocator::reMalloc(T, size_t, bool, bool) [with T = int; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void turbomind::Allocator::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x37bc90800 with size 512
[TM][DEBUG] void turbomind::IAllocator::reMalloc(T, size_t, bool, bool) [with T = int; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void turbomind::Allocator::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x37bc90a00 with size 544
[TM][DEBUG] void turbomind::IAllocator::reMalloc(T, size_t, bool, bool) [with T = long unsigned int; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void turbomind::Allocator::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x37bc90e00 with size 455712
[TM][DEBUG] void turbomind::IAllocator::reMalloc(T, size_t, bool, bool) [with T = float; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void turbomind::Allocator::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x37bd00400 with size 65667072
[TM][DEBUG] void turbomind::IAllocator::reMalloc(T, size_t, bool, bool) [with T = float; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void turbomind::Allocator::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x12e6100000 with size 65667072
[TM][DEBUG] void turbomind::IAllocator::reMalloc(T, size_t, bool, bool) [with T = float; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void turbomind::Allocator::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x37fba0400 with size 524288
[TM][DEBUG] void turbomind::IAllocator::reMalloc(T, size_t, bool, bool) [with T = unsigned int; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void turbomind::Allocator::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x37fc20400 with size 524288
[TM][DEBUG] void turbomind::IAllocator::reMalloc(T, size_t, bool, bool) [with T = unsigned int; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void turbomind::Allocator::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x37fca0400 with size 512
[TM][DEBUG] void turbomind::IAllocator::reMalloc(T, size_t, bool, bool) [with T = int; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void turbomind::Allocator::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x37fca0600 with size 29163520
[TM][DEBUG] void turbomind::IAllocator::reMalloc(T, size_t, bool, bool) [with T = bool; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void turbomind::Allocator::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x381870600 with size 128
[TM][DEBUG] void turbomind::IAllocator::reMalloc(T, size_t, bool, bool) [with T = unsigned int; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void turbomind::Allocator::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x381870800 with size 512
[TM][DEBUG] void turbomind::IAllocator::reMalloc(T, size_t, bool, bool) [with T = float; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void turbomind::Allocator::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x381870a00 with size 512
[TM][DEBUG] void turbomind::IAllocator::reMalloc(T, size_t, bool, bool) [with T = int; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void turbomind::Allocator::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x381870c00 with size 32768
[TM][DEBUG] void turbomind::IAllocator::reMalloc(T, size_t, bool, bool) [with T = int; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void turbomind::Allocator::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x381878c00 with size 32768
[TM][DEBUG] void turbomind::IAllocator::reMalloc(T, size_t, bool, bool) [with T = int; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void turbomind::Allocator::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x7fdf7d000000 with size 32768
[TM][DEBUG] void turbomind::IAllocator::reMalloc(T, size_t, bool, bool) [with T = int; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void turbomind::Allocator::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x7fdf7d008000 with size 32768
[TM][DEBUG] void turbomind::IAllocator::reMalloc(T, size_t, bool, bool) [with T = int; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void turbomind::Allocator::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x7fdf7d010000 with size 512
[TM][DEBUG] void turbomind::IAllocator::reMalloc(T, size_t, bool, bool) [with T = int; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void turbomind::Allocator::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x7fdf7d010200 with size 512
[TM][DEBUG] void turbomind::IAllocator::reMalloc(T, size_t, bool, bool) [with T = float; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void turbomind::Allocator::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x7fdf7d010400 with size 512
[TM][DEBUG] void turbomind::IAllocator::reMalloc(T, size_t, bool, bool) [with T = float; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void turbomind::Allocator::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x7fdf7d010600 with size 512
[TM][DEBUG] void turbomind::IAllocator::reMalloc(T, size_t, bool, bool) [with T = float; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void turbomind::Allocator::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x7fdf7d010800 with size 512
[TM][DEBUG] void turbomind::IAllocator::reMalloc(T, size_t, bool, bool) [with T = long long unsigned int; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void turbomind::Allocator::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x7fdf7d010a00 with size 1024
[TM][DEBUG] void turbomind::IAllocator::reMalloc(T, size_t, bool, bool) [with T = long long unsigned int; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void turbomind::Allocator::malloc(size_t, bool, bool) [TM][DEBUG] malloc buffer 0x381880c00 with size 1024 [TM][DEBUG] void turbomind::IAllocator::reMalloc(T, size_t, bool, bool) [with T = curandStateXORWOW; size_t = long unsigned int] [TM][DEBUG] Cannot find buffer (nil), mallocing new one. [TM][DEBUG] virtual void turbomind::Allocator::malloc(size_t, bool, bool) [TM][DEBUG] malloc buffer 0x7fdf7d010e00 with size 6144 [TM][DEBUG] void turbomind::IAllocator::reMalloc(T, size_t, bool, bool) [with T = curandStateXORWOW; size_t = long unsigned int] [TM][DEBUG] Cannot find buffer (nil), mallocing new one. [TM][DEBUG] virtual void turbomind::Allocator::malloc(size_t, bool, bool) [TM][DEBUG] malloc buffer 0x381881000 with size 6144 [TM][DEBUG] void turbomind::IAllocator::reMalloc(T, size_t, bool, bool) [with T = int; size_t = long unsigned int] [TM][DEBUG] Cannot find buffer (nil), mallocing new one. [TM][DEBUG] virtual void turbomind::Allocator::malloc(size_t, bool, bool) [TM][DEBUG] malloc buffer 0x381882800 with size 512 [TM][DEBUG] void turbomind::IAllocator::reMalloc(T, size_t, bool, bool) [with T = int; size_t = long unsigned int] [TM][DEBUG] Cannot find buffer (nil), mallocing new one. [TM][DEBUG] virtual void turbomind::Allocator::malloc(size_t, bool, bool) [TM][DEBUG] malloc buffer 0x7fdf7d012600 with size 512 [TM][DEBUG] void turbomind::IAllocator::reMalloc(T, size_t, bool, bool) [with T = int; size_t = long unsigned int] [TM][DEBUG] Cannot find buffer (nil), mallocing new one. [TM][DEBUG] virtual void turbomind::Allocator::malloc(size_t, bool, bool) [TM][DEBUG] malloc buffer 0x381882a00 with size 14581760 [TM][DEBUG] void turbomind::IAllocator::reMalloc(T, size_t, bool, bool) [with T = curandStateXORWOW; size_t = long unsigned int] [TM][DEBUG] Cannot find buffer (nil), mallocing new one. 
[TM][DEBUG] virtual void turbomind::Allocator::malloc(size_t, bool, bool) [TM][DEBUG] malloc buffer 0x38266aa00 with size 6144 [TM][DEBUG] void turbomind::IAllocator::reMalloc(T, size_t, bool, bool) [with T = int; size_t = long unsigned int] [TM][DEBUG] Cannot find buffer (nil), mallocing new one. [TM][DEBUG] virtual void turbomind::Allocator::malloc(size_t, bool, bool) [TM][DEBUG] malloc buffer 0x38266c200 with size 14581760 [TM][DEBUG] void turbomind::IAllocator::reMalloc(T, size_t, bool, bool) [with T = curandStateXORWOW; size_t = long unsigned int] [TM][DEBUG] Cannot find buffer (nil), mallocing new one. [TM][DEBUG] virtual void turbomind::Allocator::malloc(size_t, bool, bool) [TM][DEBUG] malloc buffer 0x383454200 with size 6144 [TM][DEBUG] void turbomind::IAllocator::reMalloc(T, size_t, bool, bool) [with T = int; size_t = long unsigned int] [TM][DEBUG] Cannot find buffer (nil), mallocing new one. [TM][DEBUG] virtual void turbomind::Allocator::malloc(size_t, bool, bool) [TM][DEBUG] malloc buffer 0x383455a00 with size 14581760 [TM][DEBUG] void turbomind::IAllocator::reMalloc(T, size_t, bool, bool) [with T = curandStateXORWOW; size_t = long unsigned int] [TM][DEBUG] Cannot find buffer (nil), mallocing new one. [TM][DEBUG] virtual void turbomind::Allocator::malloc(size_t, bool, bool) [TM][DEBUG] malloc buffer 0x38423da00 with size 6144 [TM][DEBUG] void turbomind::IAllocator::reMalloc(T, size_t, bool, bool) [with T = int; size_t = long unsigned int] [TM][DEBUG] Cannot find buffer (nil), mallocing new one. [TM][DEBUG] virtual void turbomind::Allocator::malloc(size_t, bool, bool) [TM][DEBUG] malloc buffer 0x7fdf7d200000 with size 14581760 [TM][DEBUG] void turbomind::IAllocator::reMalloc(T, size_t, bool, bool) [with T = int; size_t = long unsigned int] [TM][DEBUG] Cannot find buffer (nil), mallocing new one. 
[TM][DEBUG] virtual void turbomind::Allocator::malloc(size_t, bool, bool) [TM][DEBUG] malloc buffer 0x7fdf7d012800 with size 512 [TM][DEBUG] void turbomind::IAllocator::reMalloc(T, size_t, bool, bool) [with T = int; size_t = long unsigned int] [TM][DEBUG] Cannot find buffer (nil), mallocing new one. [TM][DEBUG] virtual void turbomind::Allocator::malloc(size_t, bool, bool) [TM][DEBUG] malloc buffer 0x7fdf7d012a00 with size 544 [TM][DEBUG] void turbomind::IAllocator::reMalloc(T, size_t, bool, bool) [with T = long unsigned int; size_t = long unsigned int] [TM][DEBUG] Cannot find buffer (nil), mallocing new one. [TM][DEBUG] virtual void turbomind::Allocator::malloc(size_t, bool, bool) [TM][DEBUG] malloc buffer 0x7fdf7d012e00 with size 455680 [TM][DEBUG] void turbomind::IAllocator::reMalloc(T, size_t, bool, bool) [with T = int; size_t = long unsigned int] [TM][DEBUG] Cannot find buffer (nil), mallocing new one. [TM][DEBUG] virtual void turbomind::Allocator::malloc(size_t, bool, bool) [TM][DEBUG] malloc buffer 0x7fdf7d082200 with size 512 [TM][DEBUG] void turbomind::IAllocator::reMalloc(T, size_t, bool, bool) [with T = int; size_t = long unsigned int] [TM][DEBUG] Cannot find buffer (nil), mallocing new one. [TM][DEBUG] virtual void turbomind::Allocator::malloc(size_t, bool, bool) [TM][DEBUG] malloc buffer 0x7fdf7d082400 with size 512 [TM][DEBUG] void turbomind::IAllocator::reMalloc(T, size_t, bool, bool) [with T = bool; size_t = long unsigned int] [TM][DEBUG] Cannot find buffer (nil), mallocing new one. [TM][DEBUG] virtual void turbomind::Allocator::malloc(size_t, bool, bool) [TM][DEBUG] malloc buffer 0x7fdf7d082600 with size 256 [TM][DEBUG] void turbomind::IAllocator::reMalloc(T, size_t, bool, bool) [with T = float; size_t = long unsigned int] [TM][DEBUG] Cannot find buffer (nil), mallocing new one. 
[TM][DEBUG] virtual void turbomind::Allocator::malloc(size_t, bool, bool) [TM][DEBUG] malloc buffer 0x7fdf7d082800 with size 512 [TM][DEBUG] void turbomind::IAllocator::reMalloc(T, size_t, bool, bool) [with T = int; size_t = long unsigned int] [TM][DEBUG] Cannot find buffer (nil), mallocing new one. [TM][DEBUG] virtual void turbomind::Allocator::malloc(size_t, bool, bool) [TM][DEBUG] malloc buffer 0x7fdf7d082a00 with size 512 [TM][DEBUG] void turbomind::IAllocator::reMalloc(T, size_t, bool, bool) [with T = int; size_t = long unsigned int] [TM][DEBUG] Cannot find buffer (nil), mallocing new one. [TM][DEBUG] virtual void turbomind::Allocator::malloc(size_t, bool, bool) [TM][DEBUG] malloc buffer 0x7fdf7d082c00 with size 512 [TM][DEBUG] void turbomind::IAllocator::reMalloc(T, size_t, bool, bool) [with T = bool; size_t = long unsigned int] [TM][DEBUG] Cannot find buffer (nil), mallocing new one. [TM][DEBUG] virtual void turbomind::Allocator::malloc(size_t, bool, bool) [TM][DEBUG] malloc buffer 0x7fdf7d082e00 with size 256 [TM][DEBUG] void turbomind::IAllocator::reMalloc(T, size_t, bool, bool) [with T = float; size_t = long unsigned int] [TM][DEBUG] Cannot find buffer (nil), mallocing new one. [TM][DEBUG] virtual void turbomind::Allocator::malloc(size_t, bool, bool) [TM][DEBUG] malloc buffer 0x7fdf7d083000 with size 512 [TM][DEBUG] void turbomind::IAllocator::reMalloc(T, size_t, bool, bool) [with T = int; size_t = long unsigned int] [TM][DEBUG] Cannot find buffer (nil), mallocing new one. [TM][DEBUG] virtual void turbomind::Allocator::malloc(size_t, bool, bool) [TM][DEBUG] malloc buffer 0x7fdf7d083200 with size 512 [TM][DEBUG] void turbomind::IAllocator::reMalloc(T, size_t, bool, bool) [with T = int; size_t = long unsigned int] [TM][DEBUG] Cannot find buffer (nil), mallocing new one. 
[TM][DEBUG] virtual void turbomind::Allocator::malloc(size_t, bool, bool) [TM][DEBUG] malloc buffer 0x7fdf7d083400 with size 512 [TM][DEBUG] void turbomind::IAllocator::reMalloc(T, size_t, bool, bool) [with T = bool; size_t = long unsigned int] [TM][DEBUG] Cannot find buffer (nil), mallocing new one. [TM][DEBUG] virtual void turbomind::Allocator::malloc(size_t, bool, bool) [TM][DEBUG] malloc buffer 0x7fdf7d083600 with size 256 [TM][DEBUG] void turbomind::IAllocator::reMalloc(T, size_t, bool, bool) [with T = float; size_t = long unsigned int] [TM][DEBUG] Cannot find buffer (nil), mallocing new one. [TM][DEBUG] virtual void turbomind::Allocator::malloc(size_t, bool, bool) [TM][DEBUG] malloc buffer 0x7fdf7d083800 with size 512 [TM][DEBUG] void turbomind::IAllocator::reMalloc(T, size_t, bool, bool) [with T = unsigned int; size_t = long unsigned int] [TM][DEBUG] Cannot find buffer (nil), mallocing new one. [TM][DEBUG] virtual void turbomind::Allocator::malloc(size_t, bool, bool) [TM][DEBUG] malloc buffer 0x7fdf7d083a00 with size 512 [TM][DEBUG] void turbomind::IAllocator::reMalloc(T, size_t, bool, bool) [with T = int; size_t = long unsigned int] [TM][DEBUG] Cannot find buffer (nil), mallocing new one. [TM][DEBUG] virtual void turbomind::Allocator::malloc(size_t, bool, bool) [TM][DEBUG] malloc buffer 0x7fdf85000000 with size 14581760 [TM][DEBUG] void turbomind::ftNcclStreamSynchronize(turbomind::NcclParam, turbomind::NcclParam, cudaStream_t) start [TM][DEBUG] void turbomind::IAllocator::reMalloc(T, size_t, bool, bool) [with T = float; size_t = long unsigned int] [TM][DEBUG] Cannot find buffer (nil), mallocing new one. [TM][DEBUG] virtual void turbomind::Allocator::malloc(size_t, bool, bool) [TM][DEBUG] malloc buffer 0x7fdf7d083c00 with size 524288 [TM][DEBUG] void turbomind::IAllocator::reMalloc(T, size_t, bool, bool) [with T = unsigned int; size_t = long unsigned int] [TM][DEBUG] Cannot find buffer (nil), mallocing new one. 
[TM][DEBUG] virtual void turbomind::Allocator::malloc(size_t, bool, bool) [TM][DEBUG] malloc buffer 0x7fdf7d103c00 with size 524288 [TM][DEBUG] void turbomind::IAllocator::reMalloc(T, size_t, bool, bool) [with T = unsigned int; size_t = long unsigned int] [TM][DEBUG] Cannot find buffer (nil), mallocing new one. [TM][DEBUG] virtual void turbomind::Allocator::malloc(size_t, bool, bool) [TM][DEBUG] malloc buffer 0x7fdf7d183c00 with size 512 [TM][DEBUG] void turbomind::LlamaV2::initialize(const turbomind::LlamaAttentionParams&, size_t, bool, int, int) [with T = nv_bfloat16; size_t = long unsigned int] [TM][DEBUG] void turbomind::UnifiedAttentionLayer::allocateWorkspace() [with T = nv_bfloat16] [TM][DEBUG] virtual void turbomind::Allocator::malloc(size_t, bool, bool) [TM][DEBUG] malloc buffer 0x38423f200 with size 524288 [TM][DEBUG] virtual void turbomind::Allocator::malloc(size_t, bool, bool) [TM][DEBUG] malloc buffer 0x3842bf200 with size 524288 [TM][DEBUG] virtual void turbomind::Allocator::malloc(size_t, bool, bool) [TM][DEBUG] malloc buffer 0x38433f200 with size 67108864 [TM][DEBUG] virtual void turbomind::Allocator::malloc(size_t, bool, bool) [TM][DEBUG] malloc buffer 0x38833f200 with size 16384 [TM][DEBUG] virtual void turbomind::Allocator::malloc(size_t, bool, bool) [TM][DEBUG] malloc buffer 0x388343200 with size 524288 [TM][DEBUG] turbomind::DynamicDecodeLayer::DynamicDecodeLayer(size_t, size_t, int, cudaStream_t, turbomind::cublasMMWrapper, turbomind::IAllocator, bool, cudaDeviceProp) [with T = float; size_t = long unsigned int; cudaStream_t = CUstream_st] [TM][DEBUG] void turbomind::DynamicDecodeLayer::initialize() [with T = float] [TM][DEBUG] void turbomind::DynamicDecodeLayer::allocateBuffer() [with T = float] [TM][DEBUG] void turbomind::IAllocator::reMalloc(T, size_t, bool, bool) [with T = int; size_t = long unsigned int] [TM][DEBUG] Cannot find buffer (nil), mallocing new one. 
[TM][DEBUG] virtual void turbomind::Allocator::malloc(size_t, bool, bool) [TM][DEBUG] malloc buffer 0x7fdf7d183e00 with size 32 [TM][DEBUG] void turbomind::UnifiedDecoder::allocateBuffer(size_t) [with T = __nv_bfloat16; size_t = long unsigned int] [TM][DEBUG] void turbomind::IAllocator::reMalloc(T, size_t, bool, bool) [with T = int; size_t = long unsigned int] [TM][DEBUG] Cannot find buffer (nil), mallocing new one. [TM][DEBUG] virtual void turbomind::Allocator::malloc(size_t, bool, bool) [TM][DEBUG] malloc buffer 0x3883c3200 with size 1056 [TM][DEBUG] void turbomind::IAllocator::reMalloc(T, size_t, bool, bool) [with T = int; size_t = long unsigned int] [TM][DEBUG] Cannot find buffer (nil), mallocing new one. [TM][DEBUG] virtual void* turbomind::Allocator::malloc(size_t, bool, bool) [TM][DEBUG] malloc buffer 0x7fdf7d184000 with size 1056 [TM][INFO] LlamaBatch::Start() [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG 
[TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] 
turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) 
[TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG 
[TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] 
turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) [TM][DEBUG] Set logger level by DEBUG [TM][DEBUG] turbomind::Allocator::Allocator(int, bool) 
[TM][DEBUG] Set logger level by DEBUG
[TM][DEBUG] turbomind::Allocator::Allocator(int, bool)
[TM][DEBUG] Set logger level by DEBUG
[TM][DEBUG] turbomind::Allocator::Allocator(int, bool)
[TM][DEBUG] Set logger level by DEBUG
[TM][DEBUG] turbomind::Allocator::Allocator(int, bool)
[TM][DEBUG] Set logger level by DEBUG
[TM][DEBUG] turbomind::Allocator::Allocator(int, bool)
[TM][DEBUG] Set logger level by DEBUG
[TM][DEBUG] turbomind::Allocator::Allocator(int, bool)
[TM][DEBUG] Set logger level by DEBUG
[TM][DEBUG] turbomind::Allocator::Allocator(int, bool)
[TM][DEBUG] Set logger level by DEBUG
[TM][DEBUG] turbomind::Allocator::Allocator(int, bool)
[TM][DEBUG] Set logger level by DEBUG
[TM][DEBUG] turbomind::Allocator::Allocator(int, bool)
[TM][DEBUG] Set logger level by DEBUG
[TM][DEBUG] turbomind::Allocator::Allocator(int, bool)
2024-09-02 10:29:44,283 - lmdeploy - INFO - prompt='<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nYou are a helpful assistant.<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nHi, pls intro yourself<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n', gen_config=EngineGenerationConfig(n=1, max_new_tokens=512, top_p=0.8, top_k=40, temperature=0.8, repetition_penalty=1.0, ignore_eos=False, random_seed=2462795913259709772, stop_words=[128009, 128001, 128008], bad_words=None, min_new_tokens=None, skip_special_tokens=True, logprobs=None), prompt_token_id=[128000, 128006, 9125, 128007, 271, 2675, 527, 264, 11190, 18328, 13, 128009, 128006, 882, 128007, 271, 13347, 11, 87705, 20285, 6261, 128009, 128006, 78191, 128007, 271], adapter_name=None.
2024-09-02 10:29:44,283 - lmdeploy - INFO - session_id=0, history_tokens=0, input_tokens=26, max_new_tokens=512, seq_start=True, seq_end=True, step=0, prep=True
2024-09-02 10:29:44,283 - lmdeploy - INFO - Register stream callback for 0
[TM][DEBUG] Set logger level by DEBUG
[TM][DEBUG] std::shared_ptr<std::unordered_map<std::basic_string, triton::Tensor> > LlamaTritonModelInstance::forward(std::shared_ptr<std::unordered_map<std::basic_string, triton::Tensor> >, turbomind::AbstractInstanceComm) [with T = __nv_bfloat16]
[TM][DEBUG] std::unordered_map<std::basic_string, turbomind::Tensor> LlamaTritonModelInstance::convert_inputs(std::shared_ptr<std::unordered_map<std::basic_string, triton::Tensor> >) [with T = __nv_bfloat16]
[TM][INFO] [forward][rank=0] INPUT: STOP [1]
[TM][INFO] [forward][rank=0] INPUT: random_seed [1]
[TM][INFO] [forward][rank=0] INPUT: input_ids [1, 26]
[TM][INFO] [forward][rank=0] INPUT: input_lengths [1]
[TM][INFO] [forward][rank=0] INPUT: CORRID [1]
[TM][INFO] [forward][rank=0] INPUT: request_output_len [1]
[TM][INFO] [forward][rank=0] INPUT: runtime_top_k [1]
[TM][INFO] [forward][rank=0] INPUT: runtime_top_p [1]
[TM][INFO] [forward][rank=0] INPUT: END [1]
[TM][INFO] [forward][rank=0] INPUT: temperature [1]
[TM][INFO] [forward][rank=0] INPUT: repetition_penalty [1]
[TM][INFO] [forward][rank=0] INPUT: stop_words_list [1, 2, 3]
[TM][INFO] [forward][rank=0] INPUT: step [1]
[TM][INFO] [forward][rank=0] INPUT: START [1]
[TM][INFO] [forward][rank=0] OUTPUT: sequence_length [1, 1]
[TM][INFO] [forward][rank=0] OUTPUT: output_ids [1, 1, 131080]
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: CORRID
[TM][DEBUG] T turbomind::Tensor::getVal() const [with T = long unsigned int] start
[TM][DEBUG] getVal with type x, but data type is: u8
[TM][DEBUG] T turbomind::Tensor::getVal(size_t) const [with T = long unsigned int; size_t = long unsigned int] start
[TM][DEBUG] getVal with type x, but data type is: u8
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: START
[TM][DEBUG] T turbomind::Tensor::getVal() const [with T = int] start
[TM][DEBUG] T turbomind::Tensor::getVal(size_t) const [with T = int; size_t = long unsigned int] start
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: END
[TM][DEBUG] T turbomind::Tensor::getVal() const [with T = int] start
[TM][DEBUG] T turbomind::Tensor::getVal(size_t) const [with T = int; size_t = long unsigned int] start
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: STOP
[TM][DEBUG] T turbomind::Tensor::getVal() const [with T = int] start
[TM][DEBUG] T turbomind::Tensor::getVal(size_t) const [with T = int; size_t = long unsigned int] start
[TM][INFO] [forward] Enqueue requests
[TM][INFO] [forward] Wait for requests to complete ...
[TM][DEBUG] Set logger level by DEBUG
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: input_lengths
[TM][DEBUG] T turbomind::Tensor::getVal() const [with T = int] start
[TM][DEBUG] T turbomind::Tensor::getVal(size_t) const [with T = int; size_t = long unsigned int] start
[TM][INFO] [ProcessInferRequests] Request for 0 received.
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: step [TM][DEBUG] T turbomind::Tensor::getVal() const [with T = int] start [TM][DEBUG] T turbomind::Tensor::getVal(size_t) const [with T = int; size_t = long unsigned int] start [TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: input_lengths [TM][DEBUG] T turbomind::Tensor::getVal() const [with T = int] start [TM][DEBUG] T turbomind::Tensor::getVal(size_t) const [with T = int; size_t = long unsigned int] start [TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: input_ids [TM][DEBUG] T turbomind::Tensor::getPtr() const [with T = int] start [TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: input_embedding_ranges [TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: input_embedding_ranges [TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: request_output_len [TM][DEBUG] T turbomind::Tensor::getVal() const [with T = int] start [TM][DEBUG] getVal with type i4, but data type is: u4 [TM][DEBUG] T turbomind::Tensor::getVal(size_t) const [with T = int; size_t = long unsigned int] start [TM][DEBUG] getVal with type i4, but data type is: u4 [TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: random_seed [TM][DEBUG] T turbomind::Tensor::getVal() const [with T = long long unsigned int] start [TM][DEBUG] T turbomind::Tensor::getVal(size_t) const [with T = long long unsigned int; size_t = long unsigned int] start [TM][DEBUG] run syncAndCheck at /lmdeploy/src/turbomind/models/llama/LlamaBatch.cc:415 2024-09-02 10:29:44,285 - lmdeploy - INFO - prompt='<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nYou are a helpful assistant.<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nShanghai is<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n', gen_config=EngineGenerationConfig(n=1, max_new_tokens=512, top_p=0.8, top_k=40, 
temperature=0.8, repetition_penalty=1.0, ignore_eos=False, random_seed=2462795913259709772, stop_words=[128009, 128001, 128008], bad_words=None, min_new_tokens=None, skip_special_tokens=True, logprobs=None), prompt_token_id=[128000, 128006, 9125, 128007, 271, 2675, 527, 264, 11190, 18328, 13, 128009, 128006, 882, 128007, 271, 2059, 31170, 374, 128009, 128006, 78191, 128007, 271], adapter_name=None.
2024-09-02 10:29:44,285 - lmdeploy - INFO - session_id=1, history_tokens=0, input_tokens=24, max_new_tokens=512, seq_start=True, seq_end=True, step=0, prep=True
[TM][DEBUG] run syncAndCheck at /lmdeploy/src/turbomind/models/llama/LlamaBatch.h:166
[TM][DEBUG] run syncAndCheck at /lmdeploy/src/turbomind/models/llama/copy.h:24
[TM][INFO] [Forward] [0, 1), dc=0, pf=1, sum_q=26, sum_k=26, max_q=26, max_k=26
2024-09-02 10:29:44,285 - lmdeploy - INFO - Register stream callback for 1
[TM][DEBUG] void turbomind::LlamaV2::forwardUnified(T, T, T*, void*, const int, const int, const int, const int, const float, const bool, size_t, int, int, int, const turbomind::Sequence) [with T = __nv_bfloat16; size_t = long unsigned int]
[TM][DEBUG] void turbomind::LlamaV2::updateEmbedding(T, int, const int, const turbomind::Sequence, int, int, bool) [with T = __nv_bfloat16]
[TM][DEBUG] run syncAndCheck at /lmdeploy/src/turbomind/models/llama/LlamaV2.cc:233
[TM][DEBUG] run syncAndCheck at /lmdeploy/src/turbomind/models/llama/LlamaV2.cc:279
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: decoder_input
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: output_norm_weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: h_q_len
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: h_k_len
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: finished
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: dc_batch_size
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: pf_batch_size
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: rope_theta
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: cu_block_counts
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: decoder_output
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: block_ptrs
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: last_token_hidden_units
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: decoder_input
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: pf_batch_size
[TM][DEBUG] T turbomind::Tensor::getVal() const [with T = int] start
[TM][DEBUG] T turbomind::Tensor::getVal(size_t) const [with T = int; size_t = long unsigned int] start
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: dc_batch_size
[TM][DEBUG] T turbomind::Tensor::getVal() const [with T = int] start
[TM][DEBUG] T turbomind::Tensor::getVal(size_t) const [with T = int; size_t = long unsigned int] start
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: h_q_len
[TM][DEBUG] T turbomind::Tensor::getPtr() const [with T = int] start
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: h_k_len
[TM][DEBUG] T turbomind::Tensor::getPtr() const [with T = int] start
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: decoder_input
[TM][DEBUG] T* turbomind::Tensor::getPtr() const [with T = __nv_bfloat16] start
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: decoder_output
[TM][DEBUG] T turbomind::Tensor::getPtr() const [with T = __nv_bfloat16] start
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: last_token_hidden_units
[TM][DEBUG] T turbomind::Tensor::getPtr() const [with T = __nv_bfloat16] start
[TM][DEBUG] run syncAndCheck at /lmdeploy/src/turbomind/models/llama/unified_decoder.cc:163
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: input_query
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layer_id
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: cu_q_len
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: cu_k_len
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: h_cu_q_len
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: h_cu_k_len
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: hidden_features
[TM][DEBUG] void turbomind::UnifiedAttentionLayer::forward(turbomind::TensorMap, const turbomind::TensorMap, const WeightType*) [with T = __nv_bfloat16; turbomind::UnifiedAttentionLayer::WeightType = turbomind::LlamaAttentionWeight<__nv_bfloat16>]
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: input_query
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layer_id
[TM][DEBUG] T turbomind::Tensor::getVal() const [with T = int] start
[TM][DEBUG] T turbomind::Tensor::getVal(size_t) const [with T = int; size_t = long unsigned int] start
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: dc_batch_size
[TM][DEBUG] T turbomind::Tensor::getVal() const [with T = int] start
[TM][DEBUG] T turbomind::Tensor::getVal(size_t) const [with T = int; size_t = long unsigned int] start
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: pf_batch_size
[TM][DEBUG] T turbomind::Tensor::getVal() const [with T = int] start
[TM][DEBUG] T turbomind::Tensor::getVal(size_t) const [with T = int; size_t = long unsigned int] start
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: h_q_len
[TM][DEBUG] T turbomind::Tensor::getPtr() const [with T = int] start
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: h_k_len
[TM][DEBUG] T turbomind::Tensor::getPtr() const [with T = int] start
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: cu_q_len
[TM][DEBUG] T turbomind::Tensor::getPtr() const [with T = int] start
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: cu_k_len
[TM][DEBUG] T turbomind::Tensor::getPtr() const [with T = int] start
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: h_cu_q_len
[TM][DEBUG] T turbomind::Tensor::getPtr() const [with T = int] start
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: h_cu_k_len
[TM][DEBUG] T turbomind::Tensor::getPtr() const [with T = int] start
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: finished
[TM][DEBUG] T turbomind::Tensor::getPtr() const [with T = bool] start
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: rope_theta
[TM][DEBUG] T turbomind::Tensor::getPtr() const [with T = float] start
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: block_ptrs
[TM][DEBUG] T turbomind::Tensor::getPtr() const [with T = void] start
[TM][DEBUG] getPtr with type x, but data type is: u8
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: cu_block_counts
[TM][DEBUG] T turbomind::Tensor::getPtr() const [with T = int] start
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: input_query
[TM][DEBUG] T turbomind::Tensor::getPtr() const [with T = __nv_bfloat16] start
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: hidden_features
[TM][DEBUG] T* turbomind::Tensor::getPtr() const [with T = __nv_bfloat16] start
[TM][DEBUG] void turbomind::UnifiedAttentionLayer::allocateBuffer(size_t, size_t, size_t, const WeightType) [with T = __nv_bfloat16; size_t = long unsigned int; turbomind::UnifiedAttentionLayer::WeightType = turbomind::LlamaAttentionWeight<__nv_bfloat16>]
[TM][DEBUG] void turbomind::IAllocator::reMalloc(T, size_t, bool, bool) [with T = __nv_bfloat16; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void turbomind::Allocator::malloc(size_t, bool, bool)
[TM][DEBUG] Set logger level by DEBUG
[TM][DEBUG] malloc buffer 0x3883c3800 with size 319488
[TM][DEBUG] void turbomind::IAllocator::reMalloc(T, size_t, bool, bool) [with T = __nv_bfloat16; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void turbomind::Allocator::malloc(size_t, bool, bool)
[TM][DEBUG] std::shared_ptr<std::unordered_map<std::basic_string, triton::Tensor> > LlamaTritonModelInstance::forward(std::shared_ptr<std::unordered_map<std::basic_string, triton::Tensor> >, turbomind::AbstractInstanceComm) [with T = __nv_bfloat16]
[TM][DEBUG] malloc buffer 0x388411800 with size 212992
[TM][DEBUG] void turbomind::IAllocator::reMalloc(T, size_t, bool, bool) [with T = __nv_bfloat16; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void turbomind::Allocator::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x388445800 with size 368640
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: lora_mask
[TM][DEBUG] T turbomind::Tensor::getPtr() const [with T = int] start
[TM][DEBUG] getPtr with type i4, but data type is: x
[TM][DEBUG] std::unordered_map<std::basic_string, turbomind::Tensor> LlamaTritonModelInstance::convert_inputs(std::shared_ptr<std::unordered_map<std::basic_string, triton::Tensor> >) [with T = __nv_bfloat16]
[TM][DEBUG] void turbomind::cublasMMWrapper::Gemm(cublasOperation_t, cublasOperation_t, int, int, int, const void, int, const void, int, void*, int, float, float)
[TM][INFO] [forward][rank=0] INPUT: STOP [1]
[TM][INFO] [forward][rank=0] INPUT: random_seed [1]
[TM][INFO] [forward][rank=0] INPUT: input_ids [1, 24]
[TM][INFO] [forward][rank=0] INPUT: input_lengths [1]
[TM][INFO] [forward][rank=0] INPUT: CORRID [1]
[TM][INFO] [forward][rank=0] INPUT: request_output_len [1]
[TM][INFO] [forward][rank=0] INPUT: runtime_top_k [1]
[TM][INFO] [forward][rank=0] INPUT: runtime_top_p [1]
[TM][INFO] [forward][rank=0] INPUT: END [1]
[TM][INFO] [forward][rank=0] INPUT: temperature [1]
[TM][INFO] [forward][rank=0] INPUT: repetition_penalty [1]
[TM][INFO] [forward][rank=0] INPUT: stop_words_list [1, 2, 3]
[TM][INFO] [forward][rank=0] INPUT: step [1]
[TM][INFO] [forward][rank=0] INPUT: START [1]
[TM][INFO] [forward][rank=0] OUTPUT: sequence_length [1, 1]
[TM][INFO] [forward][rank=0] OUTPUT: output_ids [1, 1, 131080]
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: CORRID
[TM][DEBUG] T turbomind::Tensor::getVal() const [with T = long unsigned int] start
[TM][DEBUG] getVal with type x, but data type is: u8
[TM][DEBUG] T turbomind::Tensor::getVal(size_t) const [with T = long unsigned int; size_t = long unsigned int] start
[TM][DEBUG] getVal with type x, but data type is: u8
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: START
[TM][DEBUG] T turbomind::Tensor::getVal() const [with T = int] start
[TM][DEBUG] T turbomind::Tensor::getVal(size_t) const [with T = int; size_t = long unsigned int] start
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: END
[TM][DEBUG] T turbomind::Tensor::getVal() const [with T = int] start
[TM][DEBUG] T turbomind::Tensor::getVal(size_t) const [with T = int; size_t = long unsigned int] start
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: STOP
[TM][DEBUG] T turbomind::Tensor::getVal() const [with T = int] start
[TM][DEBUG] T turbomind::Tensor::getVal(size_t) const [with T = int; size_t = long unsigned int] start
[TM][INFO] [forward] Enqueue requests
[TM][INFO] [forward] Wait for requests to complete ...
[TM][DEBUG] run syncAndCheck at /lmdeploy/src/turbomind/utils/cublasMMWrapper.cc:326
[TM][DEBUG] run syncAndCheck at /lmdeploy/src/turbomind/models/llama/LlamaLinear.h:103
[TM][DEBUG] run syncAndCheck at /lmdeploy/src/turbomind/models/llama/unified_attention_layer.cc:289
[TM][DEBUG] run syncAndCheck at /lmdeploy/src/turbomind/models/llama/unified_attention_layer.cc:293
Aborted (core dumped)
