NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

Unable to create Llama Engine with Lora Weights #938

Open rajeevbaalwan opened 7 months ago

rajeevbaalwan commented 7 months ago

System Info

Who can help?

@byshiue please help

Reproduction

I have a PEFT-trained Llama model that I want to run via TensorRT-LLM. I am using the build command below to create the engine.

python build.py --model_dir /mnt/hdd1/rajeevy/nlp/LLM/summarization/llama2-original \
    --dtype bfloat16 \
    --use_gpt_attention_plugin bfloat16 \
    --enable_context_fmha \
    --use_gemm_plugin bfloat16 \
    --use_lora_plugin bfloat16 \
    --output_dir /app/tensorrt_llm/examples/llama/7B/trt_engines/bfp16/2-gpu/ \
    --hf_lora_dir /mnt/hdd1/rajeevy/nlp/LLM/summarization/llama2-16jan/checkpoint-24690 \
    --remove_input_padding \
    --world_size 2 \
    --tp_size 2

Expected behavior

The model should build without any error.

Actual behavior

I am getting the error below:

Traceback (most recent call last):
  File "/app/tensorrt_llm/examples/llama/build.py", line 983, in <module>
    build(0, args)
  File "/app/tensorrt_llm/examples/llama/build.py", line 927, in build
    engine = build_rank_engine(builder, builder_config, engine_name,
  File "/app/tensorrt_llm/examples/llama/build.py", line 836, in build_rank_engine
    inputs = tensorrt_llm_llama.prepare_inputs(
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/llama/model.py", line 513, in prepare_inputs
    model_inputs = self.prepare_basic_inputs(
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/generation_mixin.py", line 521, in prepare_basic_inputs
    for lora_module in lora_target_modules:
TypeError: 'NoneType' object is not iterable
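
For context, the TypeError happens because build.py leaves lora_target_modules as None when no LoRA target modules are specified at build time, and prepare_basic_inputs then iterates over it without a guard. A minimal sketch of the failure mode, using the names from the traceback above:

lora_target_modules = None  # default when --lora_target_modules is not passed to build.py
for lora_module in lora_target_modules:  # raises TypeError: 'NoneType' object is not iterable
    pass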

additional notes

I have built the Docker container using the make -C docker release_build command.

Contents of the llama2-original folder:

-rw-rw-r-- 1 1002 1002  183 Jan 23 07:40 generation_config.json
-rw-rw-r-- 1 1002 1002 4.6G Jan 23 07:40 model-00001-of-00006.safetensors
-rw-rw-r-- 1 1002 1002 4.6G Jan 23 07:41 model-00002-of-00006.safetensors
-rw-rw-r-- 1 1002 1002 4.6G Jan 23 07:41 model-00003-of-00006.safetensors
-rw-rw-r-- 1 1002 1002 4.6G Jan 23 07:41 model-00004-of-00006.safetensors
-rw-rw-r-- 1 1002 1002 4.6G Jan 23 07:41 model-00005-of-00006.safetensors
-rw-rw-r-- 1 1002 1002 2.6G Jan 23 07:41 model-00006-of-00006.safetensors
-rw-rw-r-- 1 1002 1002  24K Jan 23 07:41 model.safetensors.index.json
byshiue commented 7 months ago

You need to set the lora_target_modules when building the engine; please check the documentation of the llama example.
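
For reference, here is the reporter's build command with the missing flag added. This is a sketch assuming the PEFT adapter was trained on the attention q/k/v projections; the values passed to --lora_target_modules must match the target_modules recorded in the adapter's adapter_config.json (for Llama, q_proj/k_proj/v_proj should map to attn_q/attn_k/attn_v).

python build.py --model_dir /mnt/hdd1/rajeevy/nlp/LLM/summarization/llama2-original \
    --dtype bfloat16 \
    --use_gpt_attention_plugin bfloat16 \
    --enable_context_fmha \
    --use_gemm_plugin bfloat16 \
    --use_lora_plugin bfloat16 \
    --lora_target_modules attn_q attn_k attn_v \
    --output_dir /app/tensorrt_llm/examples/llama/7B/trt_engines/bfp16/2-gpu/ \
    --hf_lora_dir /mnt/hdd1/rajeevy/nlp/LLM/summarization/llama2-16jan/checkpoint-24690 \
    --remove_input_padding \
    --world_size 2 \
    --tp_size 2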