NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

Cannot build Nougat model #1088

Open mtenenholtz opened 7 months ago

mtenenholtz commented 7 months ago

System Info

Who can help?

No response

Information

Tasks

Reproduction

Following the instructions for Nougat here: https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/multimodal#nougat

The error happens during the build.py step.

python ../enc_dec/build.py \
    --model_type bart \
    --weight_dir tmp/trt_models/${MODEL_NAME}/tp1 \
    -o trt_engines/${MODEL_NAME}/1-gpu \
    --engine_name $MODEL_NAME \
    --use_bert_attention_plugin \
    --use_gpt_attention_plugin \
    --use_gemm_plugin \
    --dtype bfloat16 \
    --max_beam_width 1 \
    --max_batch_size 1 \
    --nougat \
    --max_output_len 100 \
    --max_multimodal_len 588
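For completeness, the command above assumes `MODEL_NAME` was exported earlier in the walkthrough. The engine path in the log output (`tmp/trt_models/nougat-small/tp1`) suggests a value like:

```shell
# Assumed setup, inferred from the log output path — adjust to the Nougat
# variant you converted (e.g. nougat-small, nougat-base).
export MODEL_NAME="nougat-small"
```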

Expected behavior

The model builds successfully.

Actual behavior

[02/15/2024-19:53:13] [TRT-LLM] [W] Skipping build of encoder for Nougat model
[02/15/2024-19:53:13] [TRT-LLM] [I] Setting model configuration from tmp/trt_models/nougat-small/tp1.
[02/15/2024-19:53:13] [TRT-LLM] [I] use_bert_attention_plugin set, without specifying a value. Using bfloat16 automatically.
[02/15/2024-19:53:13] [TRT-LLM] [I] use_gpt_attention_plugin set, without specifying a value. Using bfloat16 automatically.
[02/15/2024-19:53:13] [TRT-LLM] [I] use_gemm_plugin set, without specifying a value. Using bfloat16 automatically.
[02/15/2024-19:53:13] [TRT-LLM] [W] Forcing max_encoder_input_len equal to max_prompt_embedding_table_size
[02/15/2024-19:53:13] [TRT-LLM] [I] Serially build TensorRT engines.
[02/15/2024-19:53:13] [TRT] [I] [MemUsageChange] Init CUDA: CPU +13, GPU +0, now: CPU 121, GPU 404 (MiB)
[02/15/2024-19:53:14] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +1809, GPU +316, now: CPU 2066, GPU 720 (MiB)
[02/15/2024-19:53:14] [TRT-LLM] [W] Invalid timing cache, using freshly created one
[02/15/2024-19:53:14] [TRT-LLM] [I] Loading weights from binary...
[02/15/2024-19:53:14] [TRT-LLM] [I] Weights loaded. Total time: 00:00:00
Traceback (most recent call last):
  File "/home/mark/projects/searchresearch/TensorRT-LLM/examples/multimodal/../enc_dec/build.py", line 574, in <module>
    run_build(component='decoder')
  File "/home/mark/projects/searchresearch/TensorRT-LLM/examples/multimodal/../enc_dec/build.py", line 565, in run_build
    build(0, args)
  File "/home/mark/projects/searchresearch/TensorRT-LLM/examples/multimodal/../enc_dec/build.py", line 509, in build
    engine = build_rank_engine(builder, builder_config, engine_name,
  File "/home/mark/projects/searchresearch/TensorRT-LLM/examples/multimodal/../enc_dec/build.py", line 402, in build_rank_engine
    network.plugin_config.to_legacy_setting()
AttributeError: 'PluginConfig' object has no attribute 'to_legacy_setting'

Additional notes

It looks like the to_legacy_setting() method doesn't exist on the PluginConfig class.

symphonylyh commented 6 months ago

Hi @mtenenholtz, did you do a full update to the latest main? I do see that the PluginConfig class has this attribute: https://github.com/NVIDIA/TensorRT-LLM/blob/3c373ebc5b5caf7e41198125131a153f3df08f09/tensorrt_llm/plugin/plugin.py#L135
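One way to check whether the installed package actually exposes the attribute the example scripts expect is an introspection check. This is a generic sketch — `has_method` is a hypothetical helper, demonstrated here against the standard library rather than tensorrt_llm (which may not be importable outside the build environment); the same call pattern applies to `("tensorrt_llm.plugin.plugin", "PluginConfig", "to_legacy_setting")`:

```python
import importlib

def has_method(module_name: str, class_name: str, method_name: str) -> bool:
    """Return True if module_name.class_name defines method_name.

    Useful for diagnosing a version mismatch between a pip-installed
    package and a newer source checkout whose example scripts call
    attributes the installed wheel does not yet have.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    cls = getattr(module, class_name, None)
    return cls is not None and callable(getattr(cls, method_name, None))

# Demonstrated against the standard library:
print(has_method("collections", "OrderedDict", "move_to_end"))  # → True
```

If this returns False for `PluginConfig.to_legacy_setting`, the installed tensorrt_llm wheel is older than the example scripts, and reinstalling from the current main should resolve the AttributeError.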