Closed matuszelenak closed 1 month ago
same..
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.107.02 Driver Version: 550.107.02 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 4090 On | 00000000:82:00.0 Off | Off |
| 30% 25C P8 8W / 450W | 1MiB / 24564MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
https://github.com/collabora/WhisperLive/pull/276 should resolve this.
Can you please update ghcr.io/collabora/whisperlive-tensorrt:latest? The same old problems are still there.
> #276 should resolve this.

Unfortunately, it does not seem like it.
(base) whiskas@debian-gpu:~$ docker run -p 9090:9090 --runtime=nvidia --gpus all --entrypoint /bin/bash -it whisper-live-trt:latest
root@7f39b90ea7e6:/app# nvidia-smi
Thu Sep 19 11:45:35 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.03 Driver Version: 560.35.03 CUDA Version: 12.6 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 3090 On | 00000000:01:00.0 Off | N/A |
| 0% 36C P8 12W / 420W | 4MiB / 24576MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
root@7f39b90ea7e6:/app# bash build_whisper_tensorrt.sh /app/TensorRT-LLM-examples small.en
Looking in indexes: https://pypi.org/simple, https://pypi.nvidia.com
Requirement already satisfied: tensorrt_llm==0.10.0 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 2)) (0.10.0)
Requirement already satisfied: tiktoken in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 3)) (0.3.3)
Requirement already satisfied: datasets in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 4)) (3.0.0)
Requirement already satisfied: kaldialign in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 5)) (0.9.1)
Requirement already satisfied: openai-whisper in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 6)) (20231117)
Collecting librosa
Downloading librosa-0.10.2.post1-py3-none-any.whl (260 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 260.1/260.1 KB 1.9 MB/s eta 0:00:00
Requirement already satisfied: soundfile in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 8)) (0.12.1)
Requirement already satisfied: safetensors in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 9)) (0.4.5)
Requirement already satisfied: transformers in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 10)) (4.40.2)
Requirement already satisfied: janus in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 11)) (1.0.0)
Installing collected packages: librosa
Successfully installed librosa-0.10.2.post1
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
Downloading small.en...
--2024-09-19 11:46:00-- https://openaipublic.azureedge.net/main/whisper/models/f953ad0fd29cacd07d5a9eda5624af0f6bcf2258be67c92b79389873d91e0872/small.en.pt
Resolving openaipublic.azureedge.net (openaipublic.azureedge.net)... 13.107.253.67, 2620:1ec:29:1::67
Connecting to openaipublic.azureedge.net (openaipublic.azureedge.net)|13.107.253.67|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 483615683 (461M) [application/octet-stream]
Saving to: 'assets/small.en.pt'
small.en.pt 100%[=======================================================================================================>] 461.21M 16.6MB/s in 20s
2024-09-19 11:46:20 (23.0 MB/s) - 'assets/small.en.pt' saved [483615683/483615683]
Download completed: small.en.pt
whisper_small_en
Running build script for small.en with output directory whisper_small_en
[TensorRT-LLM] TensorRT-LLM version: 0.10.0
[09/19/2024-11:46:22] [TRT-LLM] [I] plugin_arg is None, setting it as float16 automatically.
[09/19/2024-11:46:22] [TRT-LLM] [I] plugin_arg is None, setting it as float16 automatically.
[09/19/2024-11:46:22] [TRT-LLM] [I] plugin_arg is None, setting it as float16 automatically.
[09/19/2024-11:46:23] [TRT] [I] [MemUsageChange] Init CUDA: CPU +14, GPU +0, now: CPU 594, GPU 263 (MiB)
[09/19/2024-11:46:24] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +2132, GPU +396, now: CPU 2882, GPU 659 (MiB)
[09/19/2024-11:46:24] [TRT] [W] profileSharing0806 is on by default in TensorRT 10.0. This flag is deprecated and has no effect.
...
[09/19/2024-11:46:53] [TRT] [I] Total Weights Memory: 386860032 bytes
[09/19/2024-11:46:53] [TRT] [I] Compiler backend is used during engine execution.
[09/19/2024-11:46:53] [TRT] [I] Engine generation completed in 7.84096 seconds.
[09/19/2024-11:46:53] [TRT] [I] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 153 MiB, GPU 1126 MiB
[09/19/2024-11:46:53] [TRT] [I] [MemUsageStats] Peak memory usage during Engine building and serialization: CPU: 5786 MiB
[09/19/2024-11:46:53] [TRT-LLM] [I] Total time of building Unnamed Network 0: 00:00:07
[09/19/2024-11:46:53] [TRT-LLM] [I] Config saved to whisper_small_en/decoder_config.json.
[09/19/2024-11:46:53] [TRT-LLM] [I] Serializing engine to whisper_small_en/whisper_decoder_float16_tp1_rank0.engine...
[09/19/2024-11:46:53] [TRT-LLM] [I] Engine serialized. Total time: 00:00:00
Whisper small.en TensorRT engine built.
=========================================
Model is located at: /app/TensorRT-LLM-examples/whisper/whisper_small_en
root@7f39b90ea7e6:/app# python3 run_server.py --port 9090 \
--backend tensorrt \
--trt_model_path "/app/TensorRT-LLM-examples/whisper/whisper_small_en"
[TensorRT-LLM] TensorRT-LLM version: 0.10.0
--2024-09-19 11:47:44-- https://github.com/snakers4/silero-vad/raw/v4.0/files/silero_vad.onnx
Resolving github.com (github.com)... 140.82.121.3
Connecting to github.com (github.com)|140.82.121.3|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/snakers4/silero-vad/v4.0/files/silero_vad.onnx [following]
--2024-09-19 11:47:44-- https://raw.githubusercontent.com/snakers4/silero-vad/v4.0/files/silero_vad.onnx
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.108.133, 185.199.111.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1807522 (1.7M) [application/octet-stream]
Saving to: ‘/root/.cache/whisper-live/silero_vad.onnx’
/root/.cache/whisper-live/silero_vad.onnx 100%[=======================================================================================================>] 1.72M --.-KB/s in 0.09s
2024-09-19 11:47:45 (19.3 MB/s) - ‘/root/.cache/whisper-live/silero_vad.onnx’ saved [1807522/1807522]
[7f39b90ea7e6:00362] *** Process received signal ***
[7f39b90ea7e6:00362] Signal: Segmentation fault (11)
[7f39b90ea7e6:00362] Signal code: Address not mapped (1)
[7f39b90ea7e6:00362] Failing at address: 0x18
[7f39b90ea7e6:00362] [ 0] /usr/lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7f27331b6520]
[7f39b90ea7e6:00362] [ 1] /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/libth_common.so(_ZN12tensorrt_llm4thop14TorchAllocator6mallocEmb+0x88)[0x7f251c570d58]
[7f39b90ea7e6:00362] [ 2] /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/libtensorrt_llm.so(_ZN12tensorrt_llm6layers18DynamicDecodeLayerI6__halfE14allocateBufferEv+0xd4)[0x7f253570f434]
[7f39b90ea7e6:00362] [ 3] /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/libtensorrt_llm.so(_ZN12tensorrt_llm6layers18DynamicDecodeLayerI6__halfE10initializeEv+0x128)[0x7f2535713108]
[7f39b90ea7e6:00362] [ 4] /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/libtensorrt_llm.so(_ZN12tensorrt_llm6layers18DynamicDecodeLayerI6__halfEC2ERKNS_7runtime12DecodingModeERKNS0_13DecoderDomainEP11CUstream_stSt10shared_ptrINS_6common10IAllocatorEE+0xb1)[0x7f2535713311]
[7f39b90ea7e6:00362] [ 5] /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/libth_common.so(_ZN9torch_ext15FtDynamicDecodeI6__halfEC1Emmmmii+0x270)[0x7f251c550c70]
[7f39b90ea7e6:00362] [ 6] /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/libth_common.so(_ZN9torch_ext15DynamicDecodeOp14createInstanceEv+0x8a)[0x7f251c5340ca]
[7f39b90ea7e6:00362] [ 7] /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/libth_common.so(_ZN9torch_ext15DynamicDecodeOpC1EllllllN3c1010ScalarTypeE+0x84)[0x7f251c534214]
[7f39b90ea7e6:00362] [ 8] /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/libth_common.so(_ZNSt17_Function_handlerIFvRSt6vectorIN3c106IValueESaIS2_EEEZN5torch6class_IN9torch_ext15DynamicDecodeOpEE12defineMethodIZNSB_3defIJllllllNS1_10ScalarTypeEEEERSB_NS7_6detail5typesIvJDpT_EEESsSt16initializer_listINS7_3argEEEUlNS1_14tagged_capsuleISA_EEllllllSE_E_EEPNS7_3jit8FunctionESsT_SsSN_EUlS5_E_E9_M_invokeERKSt9_Any_dataS5_+0xf8)[0x7f251c551058]
[7f39b90ea7e6:00362] [ 9] /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_python.so(+0xa0f34e)[0x7f27312de34e]
[7f39b90ea7e6:00362] [10] /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_python.so(+0xa0c8df)[0x7f27312db8df]
[7f39b90ea7e6:00362] [11] /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_python.so(+0xa0e929)[0x7f27312dd929]
[7f39b90ea7e6:00362] [12] /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_python.so(+0x47de04)[0x7f2730d4ce04]
[7f39b90ea7e6:00362] [13] python3(+0x15cb2e)[0x55af3c364b2e]
[7f39b90ea7e6:00362] [14] python3(_PyObject_MakeTpCall+0x25b)[0x55af3c35b2db]
[7f39b90ea7e6:00362] [15] python3(+0x16b6b0)[0x55af3c3736b0]
[7f39b90ea7e6:00362] [16] python3(+0x2826fb)[0x55af3c48a6fb]
[7f39b90ea7e6:00362] [17] python3(_PyObject_MakeTpCall+0x25b)[0x55af3c35b2db]
[7f39b90ea7e6:00362] [18] python3(_PyEval_EvalFrameDefault+0x6b17)[0x55af3c353d27]
[7f39b90ea7e6:00362] [19] python3(_PyFunction_Vectorcall+0x7c)[0x55af3c36542c]
[7f39b90ea7e6:00362] [20] python3(_PyObject_FastCallDictTstate+0x16d)[0x55af3c35a51d]
[7f39b90ea7e6:00362] [21] python3(+0x1674b4)[0x55af3c36f4b4]
[7f39b90ea7e6:00362] [22] python3(_PyObject_MakeTpCall+0x1fc)[0x55af3c35b27c]
[7f39b90ea7e6:00362] [23] python3(_PyEval_EvalFrameDefault+0x72ea)[0x55af3c3544fa]
[7f39b90ea7e6:00362] [24] python3(_PyFunction_Vectorcall+0x7c)[0x55af3c36542c]
[7f39b90ea7e6:00362] [25] python3(_PyEval_EvalFrameDefault+0x8ab)[0x55af3c34dabb]
[7f39b90ea7e6:00362] [26] python3(_PyFunction_Vectorcall+0x7c)[0x55af3c36542c]
[7f39b90ea7e6:00362] [27] python3(_PyObject_FastCallDictTstate+0x16d)[0x55af3c35a51d]
[7f39b90ea7e6:00362] [28] python3(+0x1674b4)[0x55af3c36f4b4]
[7f39b90ea7e6:00362] [29] python3(_PyObject_MakeTpCall+0x1fc)[0x55af3c35b27c]
[7f39b90ea7e6:00362] *** End of error message ***
Segmentation fault (core dumped)
Docker image updated on ghcr. Let us know if the issue still persists.
it works! thank you :)
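For anyone hitting the same segfault: once the rebuilt image is on ghcr, re-pulling and re-running it is enough to pick up the fix. A minimal sketch, reusing the same flags shown in the logs above (the `latest` tag is an assumption; adjust if you pin a specific tag):

```shell
# Pull the rebuilt image from GitHub Container Registry
docker pull ghcr.io/collabora/whisperlive-tensorrt:latest

# Re-run with GPU access, exposing the server port (same invocation as above)
docker run -p 9090:9090 --runtime=nvidia --gpus all \
  -it ghcr.io/collabora/whisperlive-tensorrt:latest
```

Note that a previously pulled `latest` tag is not refreshed automatically; without the explicit `docker pull`, `docker run` keeps using the stale local image.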
I'm trying to run the TensorRT version of the Docker container according to the instructions, but I get a segfault whenever I attempt to transcribe any audio. The same audio works fine with the faster_whisper backend. This happens both for live transcription and for file submission.
System info: Debian 12 VM with an RTX 3090 passed through to it. Driver version 545.23.06.
Full log: