Hi team,
We have been using 0.7.1 and are now upgrading to 0.10.0. While running the checkpoint conversion step (`convert_checkpoint.py`), I get the error below:
convert_checkpoint.py: error: unrecognized arguments: --remove_input_padding --use_gpt_attention_plugin float16 --use_gemm_plugin float16 --use_rmsnorm_plugin float16 --enable_context_fmha --world_size 8 --use_inflight_batching --max_input_len 4096 --max_output_len 1024 --max_batch_size 8 --paged_kv_cache
These used to be part of `build.py`. Are these options applied by default when building engine files? What are their default values? I am mainly concerned about `max_input_len`, `max_output_len`, `max_batch_size`, and `paged_kv_cache`.

Thanks!
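For context, my understanding is that in 0.10.0 these build-time options moved out of `convert_checkpoint.py` and into the `trtllm-build` command. The invocation I am attempting looks roughly like this (flag names, values, and directory paths are my guesses from the 0.10 docs, not verified):

```shell
# Hypothetical migration of the old build.py flags to trtllm-build (0.10.x).
# Flag names below are assumptions based on the 0.10 docs; please correct me.
trtllm-build \
    --checkpoint_dir ./tllm_checkpoint \
    --output_dir ./engine_output \
    --gpt_attention_plugin float16 \
    --gemm_plugin float16 \
    --remove_input_padding enable \
    --context_fmha enable \
    --paged_kv_cache enable \
    --max_input_len 4096 \
    --max_output_len 1024 \
    --max_batch_size 8
```

Is this the intended replacement, and do the defaults match what `build.py` used in 0.7.1?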