NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

Segmentation fault (signal code 1) when running LLaMA with fp16 #997

Closed sharlynxy closed 16 minutes ago

sharlynxy commented 9 months ago

Reproduction

  1. I installed TensorRT-LLM with pip3 install tensorrt_llm -U --pre --extra-index-url https://pypi.nvidia.com

  2. I built the engine for the model llama--Llama-2-7b-chat-hf with the following command:

    python build.py --model_dir $PATH_TO_LLAMA2_CHAT_HF \
        --dtype float16 \
        --remove_input_padding \
        --use_gpt_attention_plugin float16 \
        --enable_context_fmha \
        --use_gemm_plugin float16 \
        --max_batch_size 1 \
        --max_input_len 60 \
        --max_output_len 60 \
        --output_dir $PATH_TO_ENGINE

  3. Then I ran the example script with the command below, which ended in the segmentation fault (signal code 1) named in the title:

    python3 ../run.py --max_output_len=20 \
        --tokenizer_dir $PATH_TO_LLAMA2_CHAT_HF \
        --engine_dir=$PATH_TO_ENGINE
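
For anyone trying to reproduce this, the two commands above can be assembled in a small dry-run script. This is only a sketch: it prints the commands rather than executing them, it uses only the flags quoted in the report, and it assumes the two paths come from environment variables named PATH_TO_LLAMA2_CHAT_HF and PATH_TO_ENGINE (falling back to the literal placeholders when they are unset). The last value mirrors the fallback the build log mentions for max_num_tokens, namely max_batch_size*max_input_len.

```python
import os
import shlex

# Placeholder locations; the report only refers to them as
# $PATH_TO_LLAMA2_CHAT_HF and $PATH_TO_ENGINE (assumed env variables here).
model_dir = os.environ.get("PATH_TO_LLAMA2_CHAT_HF", "$PATH_TO_LLAMA2_CHAT_HF")
engine_dir = os.environ.get("PATH_TO_ENGINE", "$PATH_TO_ENGINE")

# Limits taken from the build command in step 2.
max_batch_size = 1
max_input_len = 60
max_output_len = 60

# Step 2: engine build, using only the flags quoted in the report.
build_cmd = [
    "python", "build.py",
    "--model_dir", model_dir,
    "--dtype", "float16",
    "--remove_input_padding",
    "--use_gpt_attention_plugin", "float16",
    "--enable_context_fmha",
    "--use_gemm_plugin", "float16",
    "--max_batch_size", str(max_batch_size),
    "--max_input_len", str(max_input_len),
    "--max_output_len", str(max_output_len),
    "--output_dir", engine_dir,
]

# Step 3: run the example script against the built engine.
run_cmd = [
    "python3", "../run.py",
    "--max_output_len=20",
    "--tokenizer_dir", model_dir,
    f"--engine_dir={engine_dir}",
]

# Value the build log says it falls back to when max_num_tokens is not set.
default_max_num_tokens = max_batch_size * max_input_len

# Dry run: print shell-quoted commands instead of executing them.
print(shlex.join(build_cmd))
print(shlex.join(run_cmd))
print("default max_num_tokens:", default_max_num_tokens)
```

Printing the commands first makes it easy to diff them against the ones in the report before running anything on a GPU machine.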

Expected behavior

I expect TensorRT-LLM/examples/run.py to run successfully.

Actual behavior

  1. Output of building the engine:

    [01/29/2024-04:28:50] [TRT-LLM] [W] remove_input_padding is enabled, while max_num_tokens is not set, setting to max_batch_size*max_input_len. 
    It may not be optimal to set max_num_tokens=max_batch_size*max_input_len when remove_input_padding is enabled, because the number of packed input tokens are very likely to be smaller, we strongly recommend to set max_num_tokens according to your workloads.
    [01/29/2024-04:28:50] [TRT-LLM] [I] Serially build TensorRT engines.
    [01/29/2024-04:28:50] [TRT] [I] [MemUsageChange] Init CUDA: CPU +13, GPU +0, now: CPU 128, GPU 391 (MiB)
    [01/29/2024-04:28:52] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +1809, GPU +316, now: CPU 2073, GPU 707 (MiB)
    [01/29/2024-04:28:52] [TRT-LLM] [W] Invalid timing cache, using freshly created one
    [01/29/2024-04:28:52] [TRT-LLM] [I] [MemUsage] Rank 0 Engine build starts - Allocated Memory: Host 3.5387 (GiB) Device 0.6909 (GiB)
    [01/29/2024-04:28:52] [TRT-LLM] [I] Loading HF LLaMA ... from /data/1/huggingface/hub/models--meta-llama--Llama-2-7b-chat-hf/snapshots/c1b0db933684edbfe29a06fa47eb19cc48025e93
    Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 16.86it/s]
    [01/29/2024-04:28:52] [TRT-LLM] [I] Loading weights from HF LLaMA...
    [01/29/2024-04:28:52] [TRT-LLM] [I] Weights loaded. Total time: 00:00:00
    [01/29/2024-04:28:52] [TRT-LLM] [I] HF LLaMA loaded. Total time: 00:00:00
    [01/29/2024-04:28:52] [TRT-LLM] [I] [MemUsage] Rank 0 model weight loaded. - Allocated Memory: Host 9.8050 (GiB) Device 0.6909 (GiB)
    [01/29/2024-04:28:52] [TRT-LLM] [I] Context FMHA Enabled
    [01/29/2024-04:28:52] [TRT-LLM] [I] Optimized Generation MHA kernels (XQA) Enabled
    [01/29/2024-04:28:52] [TRT-LLM] [I] Remove Padding Enabled
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/vocab_embedding/GATHER_0_output_0 and LLaMAForCausalLM/layers/0/input_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/0/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/0/input_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/0/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/layers/0/post_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/0/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/0/post_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/0/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/layers/1/input_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/1/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/1/input_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/1/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/layers/1/post_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/1/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/1/post_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/1/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/layers/2/input_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/2/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/2/input_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/2/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/layers/2/post_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/2/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/2/post_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/2/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/layers/3/input_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/3/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/3/input_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/3/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/layers/3/post_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/3/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/3/post_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/3/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/layers/4/input_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/4/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/4/input_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/4/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/layers/4/post_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/4/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/4/post_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/4/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/layers/5/input_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/5/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/5/input_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/5/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/layers/5/post_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/5/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/5/post_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/5/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/layers/6/input_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/6/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/6/input_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/6/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/layers/6/post_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/6/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/6/post_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/6/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/layers/7/input_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/7/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/7/input_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/7/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/layers/7/post_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/7/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/7/post_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/7/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/layers/8/input_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/8/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/8/input_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/8/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/layers/8/post_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/8/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/8/post_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/8/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/layers/9/input_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/9/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/9/input_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/9/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/layers/9/post_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/9/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/9/post_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/9/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/layers/10/input_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/10/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/10/input_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/10/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/layers/10/post_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/10/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/10/post_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/10/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/layers/11/input_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/11/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/11/input_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/11/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/layers/11/post_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/11/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/11/post_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/11/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/layers/12/input_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/12/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/12/input_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/12/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/layers/12/post_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/12/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/12/post_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/12/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/layers/13/input_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/13/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/13/input_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/13/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/layers/13/post_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/13/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/13/post_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/13/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/layers/14/input_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/14/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/14/input_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/14/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/layers/14/post_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/14/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/14/post_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/14/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/layers/15/input_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/15/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/15/input_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/15/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/layers/15/post_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/15/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/15/post_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/15/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/layers/16/input_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/16/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/16/input_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/16/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/layers/16/post_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/16/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/16/post_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/16/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/layers/17/input_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/17/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/17/input_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/17/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/layers/17/post_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/17/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/17/post_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/17/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/layers/18/input_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/18/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/18/input_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/18/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/layers/18/post_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/18/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/18/post_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/18/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/layers/19/input_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/19/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/19/input_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/19/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/layers/19/post_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/19/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/19/post_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/19/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/layers/20/input_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/20/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/20/input_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/20/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/layers/20/post_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/20/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/20/post_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/20/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/layers/21/input_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/21/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/21/input_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/21/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/layers/21/post_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/21/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/21/post_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/21/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/layers/22/input_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/22/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/22/input_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/22/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/layers/22/post_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:52] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/22/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/22/post_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:53] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/22/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/layers/23/input_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:53] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/23/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/23/input_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:53] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/23/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/layers/23/post_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:53] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/23/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/23/post_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:53] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/23/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/layers/24/input_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:53] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/24/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/24/input_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:53] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/24/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/layers/24/post_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:53] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/24/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/24/post_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:53] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/24/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/layers/25/input_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:53] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/25/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/25/input_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:53] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/25/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/layers/25/post_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:53] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/25/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/25/post_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:53] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/25/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/layers/26/input_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:53] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/26/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/26/input_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:53] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/26/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/layers/26/post_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:53] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/26/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/26/post_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:53] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/26/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/layers/27/input_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:53] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/27/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/27/input_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:53] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/27/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/layers/27/post_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:53] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/27/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/27/post_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:53] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/27/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/layers/28/input_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:53] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/28/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/28/input_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:53] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/28/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/layers/28/post_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:53] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/28/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/28/post_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:53] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/28/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/layers/29/input_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:53] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/29/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/29/input_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:53] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/29/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/layers/29/post_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:53] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/29/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/29/post_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:53] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/29/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/layers/30/input_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:53] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/30/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/30/input_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:53] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/30/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/layers/30/post_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:53] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/30/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/30/post_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:53] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/30/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/layers/31/input_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:53] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/31/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/31/input_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:53] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/31/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/layers/31/post_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:53] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/31/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/31/post_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:53] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/31/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/ln_f/SHUFFLE_0_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:53] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/ln_f/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/ln_f/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
    [01/29/2024-04:28:56] [TRT-LLM] [I] Build TensorRT engine llama_float16_tp1_rank0.engine
    [01/29/2024-04:28:56] [TRT] [W] Unused Input: position_ids
    [01/29/2024-04:28:56] [TRT] [W] Detected layernorm nodes in FP16.
    [01/29/2024-04:28:56] [TRT] [W] Running layernorm after self-attention in FP16 may cause overflow. Exporting the model to the latest available ONNX opset (later than opset 17) to use the INormalizationLayer, or forcing layernorm layers to run in FP32 precision can help with preserving accuracy.
    [01/29/2024-04:28:56] [TRT] [W] [RemoveDeadLayers] Input Tensor position_ids is unused or used only at compile-time, but is not being removed.
    [01/29/2024-04:28:56] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 5256, GPU 735 (MiB)
    [01/29/2024-04:28:56] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +1, GPU +10, now: CPU 5257, GPU 745 (MiB)
    [01/29/2024-04:28:56] [TRT] [W] TensorRT was linked against cuDNN 8.9.6 but loaded cuDNN 8.9.2
    [01/29/2024-04:28:56] [TRT] [I] Global timing cache in use. Profiling results in this builder pass will be stored.
    [01/29/2024-04:29:04] [TRT] [I] [GraphReduction] The approximate region cut reduction algorithm is called.
    [01/29/2024-04:29:04] [TRT] [I] Detected 74 inputs and 33 output network tensors.
    [01/29/2024-04:29:08] [TRT] [I] Total Host Persistent Memory: 63184
    [01/29/2024-04:29:08] [TRT] [I] Total Device Persistent Memory: 0
    [01/29/2024-04:29:08] [TRT] [I] Total Scratch Memory: 33554944
    [01/29/2024-04:29:08] [TRT] [I] [BlockAssignment] Started assigning block shifts. This will take 554 steps to complete.
    [01/29/2024-04:29:08] [TRT] [I] [BlockAssignment] Algorithm ShiftNTopDown took 19.6214ms to assign 12 blocks to 554 nodes requiring 38166528 bytes.
    [01/29/2024-04:29:08] [TRT] [I] Total Activation Memory: 38166528
    [01/29/2024-04:29:08] [TRT] [I] Total Weights Memory: 13476831232
    [01/29/2024-04:29:08] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 5433, GPU 13617 (MiB)
    [01/29/2024-04:29:08] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +0, GPU +10, now: CPU 5433, GPU 13627 (MiB)
    [01/29/2024-04:29:08] [TRT] [W] TensorRT was linked against cuDNN 8.9.6 but loaded cuDNN 8.9.2
    [01/29/2024-04:29:08] [TRT] [I] Engine generation completed in 12.4135 seconds.
    [01/29/2024-04:29:08] [TRT] [I] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 500 MiB, GPU 12853 MiB
    [01/29/2024-04:29:08] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +0, GPU +12853, now: CPU 0, GPU 12853 (MiB)
    [01/29/2024-04:29:15] [TRT] [I] [MemUsageStats] Peak memory usage during Engine building and serialization: CPU: 32795 MiB
    [01/29/2024-04:29:15] [TRT-LLM] [I] Total time of building llama_float16_tp1_rank0.engine: 00:00:18
    [01/29/2024-04:29:15] [TRT-LLM] [I] Config saved to /data/1/models/trt_engines/llama-2-7b-chat-hf/fp16/1-gpu/config.json.
    [01/29/2024-04:29:15] [TRT] [I] Loaded engine size: 12855 MiB
    [01/29/2024-04:29:17] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +1, GPU +8, now: CPU 15143, GPU 13601 (MiB)
    [01/29/2024-04:29:17] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 15143, GPU 13609 (MiB)
    [01/29/2024-04:29:17] [TRT] [W] TensorRT was linked against cuDNN 8.9.6 but loaded cuDNN 8.9.2
    [01/29/2024-04:29:17] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +12852, now: CPU 0, GPU 12852 (MiB)
    [01/29/2024-04:29:17] [TRT-LLM] [I] Activation memory size: 36.40 MiB
    [01/29/2024-04:29:17] [TRT-LLM] [I] Weights memory size: 12855.63 MiB
    [01/29/2024-04:29:17] [TRT-LLM] [I] Max KV Cache memory size: 60.00 MiB
    [01/29/2024-04:29:17] [TRT-LLM] [I] Estimated max memory usage on runtime: 12952.03 MiB
    [01/29/2024-04:29:17] [TRT-LLM] [I] Serializing engine to /data/1/models/trt_engines/llama-2-7b-chat-hf/fp16/1-gpu/llama_float16_tp1_rank0.engine...
    [01/29/2024-04:29:46] [TRT-LLM] [I] Engine serialized. Total time: 00:00:29
    [01/29/2024-04:29:47] [TRT-LLM] [I] [MemUsage] Rank 0 Engine serialized - Allocated Memory: Host 3.9102 (GiB) Device 0.7124 (GiB)
    [01/29/2024-04:29:47] [TRT-LLM] [I] Rank 0 Engine build time: 00:00:54 - 54.94335222244263 (sec)
    [01/29/2024-04:29:47] [TRT] [I] Serialized 59 bytes of code generator cache.
    [01/29/2024-04:29:47] [TRT] [I] Serialized 159814 bytes of compilation cache.
    [01/29/2024-04:29:47] [TRT] [I] Serialized 9 timing cache entries
    [01/29/2024-04:29:47] [TRT-LLM] [I] Timing cache serialized to /data/1/models/trt_engines/llama-2-7b-chat-hf/fp16/1-gpu/model.cache
    [01/29/2024-04:29:47] [TRT-LLM] [I] Total time of building all 1 engines: 00:00:57
  2. outputs of running the engine

    [ic5:133017] *** Process received signal ***
    [ic5:133017] Signal: Segmentation fault (11)
    [ic5:133017] Signal code: Address not mapped (1)
    [ic5:133017] Failing at address: 0x440000e9
    [ic5:133017] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7f978f9e4520]
    [ic5:133017] [ 1] /lib/x86_64-linux-gnu/libmpi.so.40(PMPI_Comm_set_errhandler+0x47)[0x7f961ed836b7]
    [ic5:133017] [ 2] /data/1/conda/envs/trt/lib/python3.10/site-packages/mpi4py/MPI.cpython-310-x86_64-linux-gnu.so(+0x9abf0)[0x7f95b338ebf0]
    [ic5:133017] [ 3] /data/1/conda/envs/trt/lib/python3.10/site-packages/mpi4py/MPI.cpython-310-x86_64-linux-gnu.so(+0x2decf)[0x7f95b3321ecf]
    [ic5:133017] [ 4] python3(PyModule_ExecDef+0x70)[0x597be0]
    [ic5:133017] [ 5] python3[0x598f69]
    [ic5:133017] [ 6] python3[0x4fcf3b]
    [ic5:133017] [ 7] python3(_PyEval_EvalFrameDefault+0x5a35)[0x4f3375]
    [ic5:133017] [ 8] python3(_PyFunction_Vectorcall+0x6f)[0x4fd90f]
    [ic5:133017] [ 9] python3(_PyEval_EvalFrameDefault+0x4b26)[0x4f2466]
    [ic5:133017] [10] python3(_PyFunction_Vectorcall+0x6f)[0x4fd90f]
    [ic5:133017] [11] python3(_PyEval_EvalFrameDefault+0x731)[0x4ee071]
    [ic5:133017] [12] python3(_PyFunction_Vectorcall+0x6f)[0x4fd90f]
    [ic5:133017] [13] python3(_PyEval_EvalFrameDefault+0x31f)[0x4edc5f]
    [ic5:133017] [14] python3(_PyFunction_Vectorcall+0x6f)[0x4fd90f]
    [ic5:133017] [15] python3(_PyEval_EvalFrameDefault+0x31f)[0x4edc5f]
    [ic5:133017] [16] python3(_PyFunction_Vectorcall+0x6f)[0x4fd90f]
    [ic5:133017] [17] python3[0x4fd0d4]
    [ic5:133017] [18] python3(_PyObject_CallMethodIdObjArgs+0x137)[0x50be37]
    [ic5:133017] [19] python3(PyImport_ImportModuleLevelObject+0x525)[0x50b195]
    [ic5:133017] [20] python3[0x516f44]
    [ic5:133017] [21] python3[0x4fd4c7]
    [ic5:133017] [22] python3(PyObject_Call+0x209)[0x509d69]
    [ic5:133017] [23] python3(_PyEval_EvalFrameDefault+0x5a35)[0x4f3375]
    [ic5:133017] [24] python3(_PyFunction_Vectorcall+0x6f)[0x4fd90f]
    [ic5:133017] [25] python3(_PyEval_EvalFrameDefault+0x31f)[0x4edc5f]
    [ic5:133017] [26] python3(_PyFunction_Vectorcall+0x6f)[0x4fd90f]
    [ic5:133017] [27] python3[0x4fd0d4]
    [ic5:133017] [28] python3(_PyObject_CallMethodIdObjArgs+0x137)[0x50be37]
    [ic5:133017] [29] python3(PyImport_ImportModuleLevelObject+0x9da)[0x50b64a]
    [ic5:133017] *** End of error message ***
    Segmentation fault (core dumped)
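
The backtrace above mixes a `libmpi.so.40` loaded from `/lib/x86_64-linux-gnu` with an `mpi4py` extension module from a conda environment. When comparing traces like this one, it can help to list the shared objects that appear in the native frames. The following is a minimal sketch, not part of the original report; `parse_frames` and `shared_objects` are hypothetical helper names, and the sample lines are copied from the trace above.

```python
import re

# Matches native frames such as:
#   [host:pid] [ 1] /path/lib.so(symbol+0x47)[0x7f...]
#   [host:pid] [ 5] python3[0x598f69]
FRAME_RE = re.compile(r"\[\s*(\d+)\]\s+(\S+?)(?:\((\S*?)\))?\[(0x[0-9a-f]+)\]")

SAMPLE = """\
[ic5:133017] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7f978f9e4520]
[ic5:133017] [ 1] /lib/x86_64-linux-gnu/libmpi.so.40(PMPI_Comm_set_errhandler+0x47)[0x7f961ed836b7]
[ic5:133017] [ 2] /data/1/conda/envs/trt/lib/python3.10/site-packages/mpi4py/MPI.cpython-310-x86_64-linux-gnu.so(+0x9abf0)[0x7f95b338ebf0]
[ic5:133017] [ 3] /data/1/conda/envs/trt/lib/python3.10/site-packages/mpi4py/MPI.cpython-310-x86_64-linux-gnu.so(+0x2decf)[0x7f95b3321ecf]
[ic5:133017] [ 4] python3(PyModule_ExecDef+0x70)[0x597be0]
"""


def parse_frames(trace):
    """Return (index, module, symbol) tuples for each native frame line."""
    frames = []
    for line in trace.splitlines():
        match = FRAME_RE.search(line)
        if match:
            index, module, symbol, _address = match.groups()
            frames.append((int(index), module, symbol or ""))
    return frames


def shared_objects(frames):
    """Return the distinct absolute library paths, in order of appearance."""
    seen = []
    for _index, module, _symbol in frames:
        if module.startswith("/") and module not in seen:
            seen.append(module)
    return seen


if __name__ == "__main__":
    for path in shared_objects(parse_frames(SAMPLE)):
        print(path)
```

Running it on the sample prints the three library paths involved in the top frames, which makes it easy to see which MPI library was actually loaded at the point of the crash.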

additional notes

When I run the code below, I get the same problem.

from tensorrt_llm import LLM, ModelConfig

model_path = "trt_engines/llama-2-7b-chat-hf/fp16/1-gpu/"
config = ModelConfig(model_path)
llm = LLM(config)
prompts = ["hi"]
for o in llm.generate(prompts):
    print(o)
sugar5727 commented 7 months ago

I had the same problem:


Authorization required, but no authorization protocol specified
Authorization required, but no authorization protocol specified
Authorization required, but no authorization protocol specified
[ubuntu-Precision-7960-Tower:16603] *** Process received signal ***
[ubuntu-Precision-7960-Tower:16603] Signal: Segmentation fault (11)
[ubuntu-Precision-7960-Tower:16603] Signal code: Address not mapped (1)
[ubuntu-Precision-7960-Tower:16603] Failing at address: 0x440000e9
[ubuntu-Precision-7960-Tower:16603] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7b630b242520]
[ubuntu-Precision-7960-Tower:16603] [ 1] /lib/x86_64-linux-gnu/libmpi.so.40(PMPI_Comm_set_errhandler+0x47)[0x7b61e87366b7]
[ubuntu-Precision-7960-Tower:16603] [ 2] /home/ubuntu/anaconda3/envs/trtllm/lib/python3.10/site-packages/mpi4py/MPI.cpython-310-x86_64-linux-gnu.so(+0x9b550)[0x7b60d2f3c550]
[ubuntu-Precision-7960-Tower:16603] [ 3] /home/ubuntu/anaconda3/envs/trtllm/lib/python3.10/site-packages/mpi4py/MPI.cpython-310-x86_64-linux-gnu.so(+0x2e41f)[0x7b60d2ecf41f]
[ubuntu-Precision-7960-Tower:16603] [ 4] python(PyModule_ExecDef+0x70)[0x597d40]
[ubuntu-Precision-7960-Tower:16603] [ 5] python[0x5990c9]
[ubuntu-Precision-7960-Tower:16603] [ 6] python[0x4fd37b]
[ubuntu-Precision-7960-Tower:16603] [ 7] python(_PyEval_EvalFrameDefault+0x5a74)[0x4f37a4]
[ubuntu-Precision-7960-Tower:16603] [ 8] python(_PyFunction_Vectorcall+0x6f)[0x4fdd4f]
[ubuntu-Precision-7960-Tower:16603] [ 9] python(_PyEval_EvalFrameDefault+0x4b26)[0x4f2856]
[ubuntu-Precision-7960-Tower:16603] [10] python(_PyFunction_Vectorcall+0x6f)[0x4fdd4f]
[ubuntu-Precision-7960-Tower:16603] [11] python(_PyEval_EvalFrameDefault+0x731)[0x4ee461]
[ubuntu-Precision-7960-Tower:16603] [12] python(_PyFunction_Vectorcall+0x6f)[0x4fdd4f]
[ubuntu-Precision-7960-Tower:16603] [13] python(_PyEval_EvalFrameDefault+0x31f)[0x4ee04f]
[ubuntu-Precision-7960-Tower:16603] [14] python(_PyFunction_Vectorcall+0x6f)[0x4fdd4f]
[ubuntu-Precision-7960-Tower:16603] [15] python(_PyEval_EvalFrameDefault+0x31f)[0x4ee04f]
[ubuntu-Precision-7960-Tower:16603] [16] python(_PyFunction_Vectorcall+0x6f)[0x4fdd4f]
[ubuntu-Precision-7960-Tower:16603] [17] python[0x4fd514]
[ubuntu-Precision-7960-Tower:16603] [18] python(_PyObject_CallMethodIdObjArgs+0x137)[0x50c327]
[ubuntu-Precision-7960-Tower:16603] [19] python(PyImport_ImportModuleLevelObject+0x525)[0x50b685]
[ubuntu-Precision-7960-Tower:16603] [20] python[0x517454]
[ubuntu-Precision-7960-Tower:16603] [21] python[0x4fd907]
[ubuntu-Precision-7960-Tower:16603] [22] python(PyObject_Call+0x209)[0x50a259]
[ubuntu-Precision-7960-Tower:16603] [23] python(_PyEval_EvalFrameDefault+0x5a74)[0x4f37a4]
[ubuntu-Precision-7960-Tower:16603] [24] python(_PyFunction_Vectorcall+0x6f)[0x4fdd4f]
[ubuntu-Precision-7960-Tower:16603] [25] python(_PyEval_EvalFrameDefault+0x31f)[0x4ee04f]
[ubuntu-Precision-7960-Tower:16603] [26] python(_PyFunction_Vectorcall+0x6f)[0x4fdd4f]
[ubuntu-Precision-7960-Tower:16603] [27] python[0x4fd514]
[ubuntu-Precision-7960-Tower:16603] [28] python(_PyObject_CallMethodIdObjArgs+0x137)[0x50c327]
[ubuntu-Precision-7960-Tower:16603] [29] python(PyImport_ImportModuleLevelObject+0x9da)[0x50bb3a]
[ubuntu-Precision-7960-Tower:16603] *** End of error message ***
[1]    16603 segmentation fault (core dumped)  python app.py
sugar5727 commented 7 months ago

I then added

import faulthandler
faulthandler.enable()

and got the following error:


Current thread 0x000076ef939e8740 (most recent call first):
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 1184 in exec_module
  File "<frozen importlib._bootstrap>", line 688 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1006 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1027 in _find_and_load
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1078 in _handle_fromlist
  File "/home/ubuntu/anaconda3/envs/trtllm/lib/python3.10/site-packages/tensorrt_llm/_utils.py", line 216 in mpi_comm
  File "/home/ubuntu/anaconda3/envs/trtllm/lib/python3.10/site-packages/tensorrt_llm/_utils.py", line 221 in mpi_rank
  File "/home/ubuntu/TensorRt_LLM/trt-llm-rag-linux-master/trt_llama_api.py", line 106 in __init__
  File "/home/ubuntu/TensorRt_LLM/trt-llm-rag-linux-master/app.py", line 104 in <module>

Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, torch._C, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special, gmpy2.gmpy2, cython.cimports.libc.math, scipy._lib._ccallback_c, numpy.linalg.lapack_lite, scipy.sparse._sparsetools, _csparsetools, scipy.sparse._csparsetools, scipy.linalg._fblas, scipy.linalg._flapack, scipy.linalg.cython_lapack, scipy.linalg._cythonized_array_utils, scipy.linalg._solve_toeplitz, scipy.linalg._flinalg, scipy.linalg._decomp_lu_cython, scipy.linalg._matfuncs_sqrtm_triu, scipy.linalg.cython_blas, scipy.linalg._matfuncs_expm, scipy.linalg._decomp_update, scipy.sparse.linalg._dsolve._superlu, scipy.sparse.linalg._eigen.arpack._arpack, scipy.sparse.csgraph._tools, scipy.sparse.csgraph._shortest_path, scipy.sparse.csgraph._traversal, scipy.sparse.csgraph._min_spanning_tree, scipy.sparse.csgraph._flow, scipy.sparse.csgraph._matching, scipy.sparse.csgraph._reordering, scipy.spatial._ckdtree, scipy._lib.messagestream, scipy.spatial._qhull, scipy.spatial._voronoi, scipy.spatial._distance_wrap, scipy.spatial._hausdorff, scipy.special._ufuncs_cxx, scipy.special._ufuncs, scipy.special._specfun, scipy.special._comb, scipy.special._ellip_harm_2, scipy.spatial.transform._rotation, scipy.ndimage._nd_image, _ni_label, scipy.ndimage._ni_label, scipy.optimize._minpack2, scipy.optimize._group_columns, scipy.optimize._trlib._trlib, scipy.optimize._lbfgsb, _moduleTNC, scipy.optimize._moduleTNC, scipy.optimize._cobyla, scipy.optimize._slsqp, scipy.optimize._minpack, scipy.optimize._lsq.givens_elimination, scipy.optimize._zeros, scipy.optimize._highs.cython.src._highs_wrapper, scipy.optimize._highs._highs_wrapper, 
scipy.optimize._highs.cython.src._highs_constants, scipy.optimize._highs._highs_constants, scipy.linalg._interpolative, scipy.optimize._bglu_dense, scipy.optimize._lsap, scipy.optimize._direct, scipy.integrate._odepack, scipy.integrate._quadpack, scipy.integrate._vode, scipy.integrate._dop, scipy.integrate._lsoda, scipy.special.cython_special, scipy.stats._stats, scipy.stats.beta_ufunc, scipy.stats._boost.beta_ufunc, scipy.stats.binom_ufunc, scipy.stats._boost.binom_ufunc, scipy.stats.nbinom_ufunc, scipy.stats._boost.nbinom_ufunc, scipy.stats.hypergeom_ufunc, scipy.stats._boost.hypergeom_ufunc, scipy.stats.ncf_ufunc, scipy.stats._boost.ncf_ufunc, scipy.stats.ncx2_ufunc, scipy.stats._boost.ncx2_ufunc, scipy.stats.nct_ufunc, scipy.stats._boost.nct_ufunc, scipy.stats.skewnorm_ufunc, scipy.stats._boost.skewnorm_ufunc, scipy.stats.invgauss_ufunc, scipy.stats._boost.invgauss_ufunc, scipy.interpolate._fitpack, scipy.interpolate.dfitpack, scipy.interpolate._bspl, scipy.interpolate._ppoly, scipy.interpolate.interpnd, scipy.interpolate._rbfinterp_pythran, scipy.interpolate._rgi_cython, scipy.stats._biasedurn, scipy.stats._levy_stable.levyst, scipy.stats._stats_pythran, scipy._lib._uarray._uarray, scipy.stats._ansari_swilk_statistics, scipy.stats._sobol, scipy.stats._qmc_cy, scipy.stats._mvn, scipy.stats._rcont.rcont, scipy.stats._unuran.unuran_wrapper, regex._regex, sklearn.__check_build._check_build, psutil._psutil_linux, psutil._psutil_posix, sklearn.utils._isfinite, sklearn.utils.murmurhash, sklearn.utils._openmp_helpers, sklearn.utils.sparsefuncs_fast, sklearn.preprocessing._csr_polynomial_expansion, sklearn.preprocessing._target_encoder_fast, sklearn.utils._vector_sentinel, sklearn.feature_extraction._hashing_fast, sklearn.utils._random, sklearn.utils._seq_dataset, sklearn.metrics.cluster._expected_mutual_info_fast, sklearn.metrics._dist_metrics, sklearn.metrics._pairwise_distances_reduction._datasets_pair, sklearn.utils._cython_blas, 
sklearn.metrics._pairwise_distances_reduction._base, sklearn.metrics._pairwise_distances_reduction._middle_term_computer, sklearn.utils._heap, sklearn.utils._sorting, sklearn.metrics._pairwise_distances_reduction._argkmin, sklearn.metrics._pairwise_distances_reduction._argkmin_classmode, sklearn.metrics._pairwise_distances_reduction._radius_neighbors, sklearn.metrics._pairwise_distances_reduction._radius_neighbors_classmode, sklearn.metrics._pairwise_fast, sklearn.linear_model._cd_fast, sklearn._loss._loss, sklearn.utils.arrayfuncs, sklearn.svm._liblinear, sklearn.svm._libsvm, sklearn.svm._libsvm_sparse, sklearn.utils._weight_vector, sklearn.linear_model._sgd_fast, sklearn.linear_model._sag_fast, scipy.io.matlab._mio_utils, scipy.io.matlab._streams, scipy.io.matlab._mio5_utils, sklearn.datasets._svmlight_format_fast, _brotli, pandas._libs.tslibs.np_datetime, pandas._libs.tslibs.dtypes, pandas._libs.tslibs.base, pandas._libs.tslibs.nattype, pandas._libs.tslibs.timezones, pandas._libs.tslibs.ccalendar, pandas._libs.tslibs.fields, pandas._libs.tslibs.timedeltas, pandas._libs.tslibs.tzconversion, pandas._libs.tslibs.timestamps, pandas._libs.properties, pandas._libs.tslibs.offsets, pandas._libs.tslibs.strptime, pandas._libs.tslibs.parsing, pandas._libs.tslibs.conversion, pandas._libs.tslibs.period, pandas._libs.tslibs.vectorized, pandas._libs.ops_dispatch, pandas._libs.missing, pandas._libs.hashtable, pandas._libs.algos, pandas._libs.interval, pandas._libs.lib, pandas._libs.hashing, pyarrow.lib, pyarrow._hdfsio, pandas._libs.tslib, pandas._libs.ops, pyarrow._compute, pandas._libs.arrays, pandas._libs.sparse, pandas._libs.reduction, pandas._libs.indexing, pandas._libs.index, pandas._libs.internals, pandas._libs.join, pandas._libs.writers, pandas._libs.window.aggregations, pandas._libs.window.indexers, pandas._libs.reshape, pandas._libs.groupby, pandas._libs.testing, pandas._libs.parsers, pandas._libs.json, sqlalchemy.cyextension.collections, 
sqlalchemy.cyextension.immutabledict, sqlalchemy.cyextension.processors, sqlalchemy.cyextension.resultproxy, sqlalchemy.cyextension.util, greenlet._greenlet, yaml._yaml, sentencepiece._sentencepiece, cuda._lib.utils, cuda._cuda.ccuda, cuda.ccuda, cuda.cuda, cuda._lib.ccudart.utils, cuda._lib.ccudart.ccudart, cuda.ccudart, cuda.cudart, multidict._multidict, yarl._quoting_c, aiohttp._helpers, aiohttp._http_writer, aiohttp._http_parser, aiohttp._websocket, frozenlist._frozenlist, faiss._swigfaiss_avx2, websockets.speedups, PIL._imaging, ujson, matplotlib._c_internal_utils, matplotlib._path, kiwisolver._cext, markupsafe._speedups, matplotlib._image, PIL._imagingmath, PIL._webp, mpi4py.MPI (total: 243)
[1]    16874 segmentation fault (core dumped)  python app.py
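
The faulthandler output above places the crash in the import triggered by `mpi_comm` in `tensorrt_llm/_utils.py`, and `mpi4py.MPI` is the last extension module listed. One way to check whether the crash reproduces without TensorRT-LLM is to import the module on its own in a fresh interpreter. This is a minimal sketch under that assumption; `probe_import` is a hypothetical helper, not part of either library, and the signal check assumes a POSIX system.

```python
import subprocess
import sys


def probe_import(module_name, timeout=60.0):
    """Import `module_name` in a child interpreter with faulthandler enabled."""
    proc = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", f"import {module_name}"],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return {
        "module": module_name,
        "returncode": proc.returncode,
        # On POSIX, a negative return code means the child died from a signal
        # (for example -11 for SIGSEGV).
        "killed_by_signal": proc.returncode < 0,
        "stderr": proc.stderr,
    }


if __name__ == "__main__":
    # Importing the package alone does not load the compiled MPI extension;
    # importing mpi4py.MPI does.
    for name in ("mpi4py", "mpi4py.MPI"):
        result = probe_import(name)
        print(name, result["returncode"], result["killed_by_signal"])
        if result["stderr"]:
            print(result["stderr"])
```

If `mpi4py.MPI` is killed by a signal here as well, the problem lies in the mpi4py and MPI library installation rather than in the engine or the TensorRT-LLM scripts.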