Open viningz opened 1 year ago
Hi @viningz ,
Can you share the command-line you used to build the engine, please?
Thanks, Julien
Thank you very much for your reply! The input command line I have is:
python build.py --model_version v1_7b \
    --model_dir baichuan-inc/Baichuan-13B-Chat \
    --dtype float16 \
    --use_gemm_plugin float16 \
    --use_gpt_attention_plugin float16 \
    --output_dir ./tmp/baichuan_v1_13b/trt_engines/fp16/1-gpu/ \
    --use_inflight_batching
When I don't use the --use_inflight_batching option, the converted model works fine.
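For reference, in the TensorRT-LLM example scripts of that period, in-flight batching generally required the GPT attention plugin together with packed input and a paged KV cache. A hedged sketch of the same build command with those companion flags enabled explicitly is below; the flag names `--remove_input_padding` and `--paged_kv_cache` are taken from the `examples/baichuan/build.py` of that era and should be verified against your checkout. Note also that the posted command pairs `--model_version v1_7b` with a 13B checkpoint directory, which may be worth double-checking.

```shell
# Sketch only: the original command with the in-flight-batching
# prerequisites (--remove_input_padding, --paged_kv_cache) enabled
# explicitly. Verify these flags exist in your build.py before running.
python build.py --model_version v1_13b \
    --model_dir baichuan-inc/Baichuan-13B-Chat \
    --dtype float16 \
    --use_gemm_plugin float16 \
    --use_gpt_attention_plugin float16 \
    --remove_input_padding \
    --paged_kv_cache \
    --use_inflight_batching \
    --output_dir ./tmp/baichuan_v1_13b/trt_engines/fp16/1-gpu/
```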
same issue for llama_7b
Could you follow this document and try again on the latest main branch?
When I was converting the Baichuan model and wanted to enable in-flight batching, an error occurred. The error message is as follows:
[10/30/2023-09:28:34] [TRT] [W] Unused Input: position_ids
[10/30/2023-09:28:34] [TRT] [W] [RemoveDeadLayers] Input Tensor position_ids is unused or used only at compile-time, but is not being removed.
[10/30/2023-09:28:34] [TRT] [I] Graph optimization time: 0.0630199 seconds.
[10/30/2023-09:28:34] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 15556, GPU 1786 (MiB)
[10/30/2023-09:28:34] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +2, GPU +10, now: CPU 15558, GPU 1796 (MiB)
[10/30/2023-09:28:34] [TRT] [I] Global timing cache in use. Profiling results in this builder pass will be stored.
[10/30/2023-09:28:35] [TRT] [E] 9: Skipping tactic 0x0000000000000000 due to exception PLUGIN_V2 operation not supported within this graph.
[10/30/2023-09:28:36] [TRT] [E] 10: Could not find any implementation for node {ForeignNode[BaichuanForCausalLM/layers/0/attention/PLUGIN_V2_GPTAttention_0]}.
[10/30/2023-09:28:36] [TRT] [E] 10: [optimizer.cpp::computeCosts::4040] Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[BaichuanForCausalLM/layers/0/attention/PLUGIN_V2_GPTAttention_0]}.)
[10/30/2023-09:28:36] [TRT-LLM] [E] Engine building failed, please check the error log.
[10/30/2023-09:28:36] [TRT-LLM] [I] Config saved to /home/kas/models/word_layout/baichuang/lora_v10_merge_2/trt_engines/fp16/1-gpu-page-kv-cache/config.json.
Traceback (most recent call last):
  File "/home/kas/kas_workspace/zhengweining/TensorRT-LLM/examples/baichuan/build.py", line 477, in
    build(0, args)
  File "/home/kas/kas_workspace/zhengweining/TensorRT-LLM/examples/baichuan/build.py", line 449, in build
    assert engine is not None, f'Failed to build engine for rank {cur_rank}'
AssertionError: Failed to build engine for rank 0
Could you please let me know how to resolve this issue?