NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

qwenvl run.py does not support multi-GPU #1444

Open hezeli123 opened 5 months ago

hezeli123 commented 5 months ago

System Info

2× A100 GPUs, TensorRT-LLM 0.8.0

Who can help?

@byshiue @ncomly-nvidia @jun

Reproduction

  1. Build the Qwen-VL (9B) model engines.
  2. run.py runs successfully on a single A100.
  3. run.py fails on 2× A100 when launched with:

         mpirun -n 2 --allow-run-as-root python3 run.py \
             --tokenizer_dir=./Qwen-VL-Chat \
             --qwen_engine_dir=./trt_engines/Qwen-VL-9.8B-Chat-fp16 \
             --vit_engine_dir=./plan \
             --images_path='{"image": "./pics/demo.jpeg"}' \
             --input_dir='{"image": "image.pt"}'

Expected behavior

run.py runs successfully on 2× A100.

Actual behavior

[TensorRT-LLM] TensorRT-LLM version: 0.8.0
[04/12/2024-06:20:36] [TRT-LLM] [I] Loading engine from ./plan/visual_encoder/visual_encoder_fp16.plan
[TensorRT-LLM] TensorRT-LLM version: 0.8.0
[04/12/2024-06:20:36] [TRT-LLM] [I] Loading engine from ./plan/visual_encoder/visual_encoder_fp16.plan
[04/12/2024-06:20:43] [TRT-LLM] [I] Creating session from engine ./plan/visual_encoder/visual_encoder_fp16.plan
[04/12/2024-06:20:43] [TRT-LLM] [I] Creating session from engine ./plan/visual_encoder/visual_encoder_fp16.plan
[04/12/2024-06:20:43] [TRT] [I] Loaded engine size: 3714 MiB
[04/12/2024-06:20:43] [TRT] [I] Loaded engine size: 3714 MiB
[04/12/2024-06:20:43] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +3699, now: CPU 0, GPU 3699 (MiB)
[04/12/2024-06:20:43] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +3699, now: CPU 0, GPU 3699 (MiB)
[04/12/2024-06:20:44] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +190, now: CPU 0, GPU 3889 (MiB)
[04/12/2024-06:20:44] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +190, now: CPU 0, GPU 3889 (MiB)
[04/12/2024-06:20:44] [TRT] [W] Using default stream in enqueue()/enqueueV2()/enqueueV3() may lead to performance issues due to additional cudaDeviceSynchronize() calls by TensorRT to ensure correct synchronizations. Please use non-default stream instead.
[04/12/2024-06:20:44] [TRT] [W] Using default stream in enqueue()/enqueueV2()/enqueueV3() may lead to performance issues due to additional cudaDeviceSynchronize() calls by TensorRT to ensure correct synchronizations. Please use non-default stream instead.
[04/12/2024-06:20:44] [TRT-LLM] [I] TensorRT-LLM ViT latency: 0.014638423919677734 sec
[04/12/2024-06:20:44] [TRT-LLM] [I] TensorRT-LLM ViT latency: 0.019578218460083008 sec
Loading engine from ./trt_engines/Qwen-VL-9.8B-Chat-fp16/qwen_float16_tp2_rank0.engine
Loading engine from ./trt_engines/Qwen-VL-9.8B-Chat-fp16/qwen_float16_tp2_rank1.engine
[04/12/2024-06:21:05] [TRT] [I] Loaded engine size: 7960 MiB
[04/12/2024-06:21:06] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 8214, GPU 8395 (MiB)
[04/12/2024-06:21:06] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +1, GPU +10, now: CPU 8215, GPU 8405 (MiB)
[04/12/2024-06:21:10] [TRT] [I] Loaded engine size: 7960 MiB
[04/12/2024-06:21:12] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 8198, GPU 8857 (MiB)
[04/12/2024-06:21:12] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +1, GPU +10, now: CPU 8199, GPU 8867 (MiB)
[04/12/2024-06:21:21] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +7957, now: CPU 0, GPU 7957 (MiB)
[04/12/2024-06:21:21] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +7957, now: CPU 0, GPU 7957 (MiB)
[04/12/2024-06:21:21] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 8465, GPU 10351 (MiB)
[04/12/2024-06:21:21] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 8480, GPU 9893 (MiB)
[04/12/2024-06:21:21] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 8465, GPU 10359 (MiB)
[04/12/2024-06:21:21] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 8480, GPU 9901 (MiB)
[04/12/2024-06:21:21] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 0, GPU 7957 (MiB)
[04/12/2024-06:21:21] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 8489, GPU 10379 (MiB)
[04/12/2024-06:21:21] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +0, GPU +10, now: CPU 8489, GPU 10389 (MiB)
[04/12/2024-06:21:21] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 0, GPU 7957 (MiB)
[04/12/2024-06:21:21] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 8504, GPU 9921 (MiB)
[04/12/2024-06:21:21] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +1, GPU +10, now: CPU 8505, GPU 9931 (MiB)
[04/12/2024-06:21:21] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 0, GPU 7957 (MiB)
[04/12/2024-06:21:21] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 0, GPU 7957 (MiB)
tensorrt_llm/runtime/generation.py:933: UserWarning: The PyTorch API of nested tensors is in prototype stage and will change in the near future. (Triggered internally at ../aten/src/ATen/NestedTensorImpl.cpp:178.)
  torch.nested.nested_tensor(split_ids_list,
tensorrt_llm/runtime/generation.py:933: UserWarning: The PyTorch API of nested tensors is in prototype stage and will change in the near future. (Triggered internally at ../aten/src/ATen/NestedTensorImpl.cpp:178.)
  torch.nested.nested_tensor(split_ids_list,
[04/12/2024-06:21:22] [TRT] [E] 3: [executionContext.cpp::setInputShape::2278] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2278, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.)
[04/12/2024-06:21:22] [TRT] [E] 3: [executionContext.cpp::setInputShape::2278] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2278, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.)
... (the setInputShape error above is repeated many more times)
[04/12/2024-06:21:22] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
... (the resolveSlots error is repeated many more times, continuing into 06:21:23; the captured log ends mid-message)
(Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) ) [04/12/2024-06:21:23] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) ) [04/12/2024-06:21:23] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) ) [04/12/2024-06:21:23] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) ) [04/12/2024-06:21:23] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) ) [04/12/2024-06:21:23] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) ) [04/12/2024-06:21:23] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) ) [04/12/2024-06:21:23] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) ) [04/12/2024-06:21:23] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: 
runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) ) [04/12/2024-06:21:23] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) ) [04/12/2024-06:21:23] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) ) [04/12/2024-06:21:23] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) ) Traceback (most recent call last): File "tensorrt_llm/examples/qwenvl/run.py", line 516, in qinfer.qwen_infer(image_embeds, File "tensorrt_llm/examples/qwenvl/run.py", line 350, in qwen_infer [04/12/2024-06:21:23] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) ) Traceback (most recent call last): File "tensorrt_llm/examples/qwenvl/run.py", line 516, in output_ids, Qwen_time = self.generate_for_qwenvl( File "tensorrt_llm/examples/qwenvl/run.py", line 294, in generate_for_qwenvl output_ids = self.decoder.decode(input_ids, input_lengths, File "/tensorrt_llm/runtime/generation.py", line 757, in wrapper qinfer.qwen_infer(image_embeds, File "tensorrt_llm/examples/qwenvl/run.py", line 350, in qwen_infer output_ids, Qwen_time = self.generate_for_qwenvl( File "/tensorrt_llm/examples/qwenvl/run.py", line 294, in generate_for_qwenvl ret = func(self, *args, *kwargs) File "/tensorrt_llm/runtime/generation.py", line 2893, in decode output_ids = self.decoder.decode(input_ids, input_lengths, File 
"/tensorrt_llm/runtime/generation.py", line 757, in wrapper ret = func(self, args, **kwargs) File "/tensorrt_llm/runtime/generation.py", line 2893, in decode return self.decode_regular( File "/tensorrt_llm/runtime/generation.py", line 2550, in decode_regular should_stop, next_step_tensors, tasks, context_lengths, host_context_lengths, attention_mask, logits, encoder_input_lengths = self.handle_per_step( File "/tensorrt_llm/runtime/generation.py", line 2233, in handle_per_step return self.decode_regular( File "/tensorrt_llm/runtime/generation.py", line 2550, in decode_regular raise RuntimeError(f"Executing TRT engine failed step={step}!") RuntimeError: Executing TRT engine failed step=0! should_stop, next_step_tensors, tasks, context_lengths, host_context_lengths, attention_mask, logits, encoder_input_lengths = self.handle_per_step( File "/tensorrt_llm/runtime/generation.py", line 2233, in handle_per_step raise RuntimeError(f"Executing TRT engine failed step={step}!") RuntimeError: Executing TRT engine failed step=0!

Primary job terminated normally, but 1 process returned a non-zero exit code. Per user-direction, the job has been aborted.


mpirun detected that one or more processes exited with non-zero status, thus causing the job to be terminated. The first process to do so was:

Process name: [[19060,1],1]
Exit code: 1

additional notes

no

juney-nvidia commented 5 months ago

@hezeli123

Can you share the concrete command used to build the engine?

Thanks June

hezeli123 commented 5 months ago

python3 ../qwen/build.py --hf_model_dir=Qwen-VL-Chat --world_size 2 --tp_size 2 \
    --dtype float16 \
    --max_batch_size 8 \
    --max_input_len 2048 \
    --max_output_len 1024 \
    --remove_input_padding \
    --use_gpt_attention_plugin float16 \
    --use_gemm_plugin float16 \
    --enable_context_fmha \
    --log_level verbose \
    --use_lookup_plugin float16 \
    --max_prompt_embedding_table_size 2048 \
    --output_dir=./trt_engines/Qwen-VL-9.8B-Chat-fp16

bnuzhanyu commented 5 months ago

I got a similar error on a single GPU (A10) with 0.9.0.

To prevent GPU OOM, I modified the export_onnx function to move only model_visual to the GPU:

model_visual = model.transformer.visual
model_visual.eval()
model_visual = model_visual.to(device)

Inference error:

/usr/local/lib/python3.10/dist-packages/torch/nested/__init__.py:166: UserWarning: The PyTorch API of nested tensors is in prototype stage and will change in the near future. (Triggered internally at ../aten/src/ATen/NestedTensorImpl.cpp:177.)
  return _nested.nested_tensor(
[04/16/2024-09:49:52] [TRT] [E] 3: [executionContext.cpp::setInputShape::2309] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2309, condition: satisfyProfile Runtime dimension does not satisfy any optimization profile.)
[04/16/2024-09:49:52] [TRT] [E] 3: [executionContext.cpp::setInputShape::2309] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2309, condition: satisfyProfile Runtime dimension does not satisfy any optimization profile.)
[04/16/2024-09:49:52] [TRT] [E] 3: [executionContext.cpp::setInputShape::2309] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2309, condition: satisfyProfile Runtime dimension does not satisfy any optimization profile.)
[04/16/2024-09:49:52] [TRT] [E] 3: [executionContext.cpp::setInputShape::2309] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2309, condition: satisfyProfile Runtime dimension does not satisfy any optimization profile.)
[04/16/2024-09:49:52] [TRT] [E] 3: [executionContext.cpp::setInputShape::2309] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2309, condition: satisfyProfile Runtime dimension does not satisfy any optimization profile.)
[04/16/2024-09:49:52] [TRT] [E] 3: [executionContext.cpp::setInputShape::2309] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2309, condition: satisfyProfile Runtime dimension does not satisfy any optimization profile.)
[04/16/2024-09:49:52] [TRT] [E] 3: [executionContext.cpp::setInputShape::2309] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2309, condition: satisfyProfile Runtime dimension does not satisfy any optimization profile.)
[04/16/2024-09:49:52] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[04/16/2024-09:49:52] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
Traceback (most recent call last):
  File "/llm/tensorrt-llm/examples/qwenvl/run_chat.py", line 120, in <module>
    output_text = qinfer.qwen_infer(image_embeds, None, query,
  File "/llm/tensorrt-llm/examples/qwenvl/run.py", line 347, in qwen_infer
    output_ids, Qwen_time = self.generate_for_qwenvl(
  File "/llm/tensorrt-llm/examples/qwenvl/run.py", line 299, in generate_for_qwenvl
    output_ids = self.decoder.decode(input_ids, input_lengths,
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/generation.py", line 789, in wrapper
    ret = func(self, *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/generation.py", line 2993, in decode
    return self.decode_regular(
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/generation.py", line 2642, in decode_regular
    should_stop, next_step_tensors, tasks, context_lengths, host_context_lengths, attention_mask, context_logits, generation_logits, encoder_input_lengths = self.handle_per_step(
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/generation.py", line 2334, in handle_per_step
    raise RuntimeError(f"Executing TRT engine failed step={step}!")
RuntimeError: Executing TRT engine failed step=0!
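Both logs fail before any token is generated: TensorRT rejects the runtime input shapes ("Runtime dimension does not satisfy any optimization profile", then "allInputDimensionsSpecified"). As a rough mental model, every runtime dimension must lie between the min and max of some optimization profile baked into the engine at build time. The sketch below is a simplified illustration of that rule, not TensorRT's implementation; `satisfies_profile`, `select_profile`, and the profile bounds are hypothetical, and the real engine's input tensors (e.g. packed inputs with remove_input_padding) and bounds differ.

```python
from typing import Optional, Sequence, Tuple

Shape = Sequence[int]


def satisfies_profile(shape: Shape, min_shape: Shape, max_shape: Shape) -> bool:
    """True if shape has the profile's rank and every dim lies in [min, max]."""
    if not (len(shape) == len(min_shape) == len(max_shape)):
        return False
    return all(lo <= d <= hi for d, lo, hi in zip(shape, min_shape, max_shape))


def select_profile(shape: Shape,
                   profiles: Sequence[Tuple[Shape, Shape]]) -> Optional[int]:
    """Index of the first (min, max) profile accepting shape, else None."""
    for index, (min_shape, max_shape) in enumerate(profiles):
        if satisfies_profile(shape, min_shape, max_shape):
            return index
    return None


# Hypothetical single profile for a [batch, seq_len] input built with
# max_batch_size=2 and max_input_len=50.
profiles = [((1, 1), (2, 50))]
print(select_profile((1, 40), profiles))   # 0
print(select_profile((1, 276), profiles))  # None: seq_len exceeds 50
```

When no profile matches, the shape is never set, which is consistent with the follow-up "allInputDimensionsSpecified" errors in the logs above.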

Build script

MODEL_ROOT_DIR='model_dir'
MAX_BATCH_SIZE=2
HF_MODEL_DIR=$MODEL_ROOT_DIR/Qwen-VL-Chat
ONNX_FILE=$MODEL_ROOT_DIR/visual_encoder/visual_encoder.onnx
PLAN_FILE=$MODEL_ROOT_DIR/plan/visual_encoder/visual_encoder_fp16.plan
CHECKPOINT_DIR=$MODEL_ROOT_DIR/qwen_vl_trt/checkpoint
ENGINE_DIR=$MODEL_ROOT_DIR/qwen_vl_trt/engine
MAX_INPUT_LEN=50
MAX_OUTPUT_LEN=200
MAX_PROMPT_EMBEDDING_TABLE_SIZE=$((MAX_BATCH_SIZE * MAX_INPUT_LEN))

export CUDA_VISIBLE_DEVICES=0,1
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True

if [ ! -f $PLAN_FILE ]; then
    python3 vit_onnx_trt.py --pretrained_model_path $HF_MODEL_DIR \
                --onnxFile $ONNX_FILE --planFile $PLAN_FILE --maxBS $MAX_BATCH_SIZE
fi

python3 ../qwen/convert_checkpoint.py --model_dir=$HF_MODEL_DIR --output_dir=$CHECKPOINT_DIR

trtllm-build --checkpoint_dir=$CHECKPOINT_DIR \
             --gemm_plugin=float16 --gpt_attention_plugin=float16 \
             --lookup_plugin=float16 --max_input_len=$MAX_INPUT_LEN --max_output_len=$MAX_OUTPUT_LEN \
             --max_batch_size=$MAX_BATCH_SIZE --max_prompt_embedding_table_size=$MAX_PROMPT_EMBEDDING_TABLE_SIZE \
             --remove_input_padding=enable \
             --output_dir=$ENGINE_DIR

Run script

HF_MODEL_DIR=$MODEL_ROOT_DIR/Qwen-VL-Chat
ENGINE_DIR=$MODEL_ROOT_DIR/qwen_vl_trt/engine

python3 run.py \
    --tokenizer_dir=$HF_MODEL_DIR \
    --qwen_engine_dir=$ENGINE_DIR \
    --vit_engine_dir=$MODEL_ROOT_DIR/plan \
    --images_path='{"image": "./pics/demo.jpeg"}' \
    --input_dir='{"image": "image.pt"}'

bnuzhanyu commented 5 months ago

Using max_input_len=2048, max_output_len=1024, max_batch_size=8, and max_prompt_embedding_table_size=2048 works. With tp_size=2, should max_prompt_embedding_table_size be halved?

I was confused by this:

NOTE: `max_prompt_embedding_table_size = query_token_num * max_batch_size`, therefore, if you change `max_batch_size`, `--max_prompt_embedding_table_size` must be reset accordingly.

I understand max_input_len, max_output_len, and max_batch_size, but how should I set max_prompt_embedding_table_size? What is the value of query_token_num? Is it fixed for Qwen-VL (e.g. 256)?

sunnyqgg commented 5 months ago

@bnuzhanyu As you found, MAX_INPUT_LEN=50 is too small, which is why the error is reported. query_token_num is fixed at 256 for Qwen-VL.
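The sizing rules in this thread can be checked before running trtllm-build. This is a minimal sketch, assuming query_token_num = 256 as stated above and assuming each image occupies query_token_num positions in the prompt, so max_input_len must cover those positions plus the text tokens. `required_table_size`, `min_input_len`, and `check_build` are hypothetical helpers, not part of TensorRT-LLM, and `text_tokens=20` is a made-up prompt length.

```python
QUERY_TOKEN_NUM = 256  # fixed for Qwen-VL, per the reply above


def required_table_size(max_batch_size: int) -> int:
    """README note: max_prompt_embedding_table_size = query_token_num * max_batch_size."""
    return QUERY_TOKEN_NUM * max_batch_size


def min_input_len(text_tokens: int, images: int = 1) -> int:
    """Assumption: each image occupies QUERY_TOKEN_NUM prompt positions."""
    return images * QUERY_TOKEN_NUM + text_tokens


def check_build(max_batch_size: int, max_input_len: int,
                table_size: int, text_tokens: int) -> list:
    """Return a list of problems with the given build settings (empty if none)."""
    problems = []
    needed_table = required_table_size(max_batch_size)
    if table_size < needed_table:
        problems.append(
            f"max_prompt_embedding_table_size={table_size} < {needed_table}")
    needed_len = min_input_len(text_tokens)
    if max_input_len < needed_len:
        problems.append(f"max_input_len={max_input_len} < {needed_len}")
    return problems


# Failing script: MAX_BATCH_SIZE=2, MAX_INPUT_LEN=50, table = 2 * 50.
print(check_build(max_batch_size=2, max_input_len=50, table_size=2 * 50, text_tokens=20))
# Settings reported to work.
print(check_build(max_batch_size=8, max_input_len=2048, table_size=2048, text_tokens=20))
```

Under these assumptions, the first call reports both problems (a table size of 100 versus the 512 that the README formula gives, and max_input_len=50 versus 276), while the working configuration returns an empty list; 2048 is exactly 256 * 8.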