jxchenus opened 19 hours ago
I'm only able to reproduce this using the Triton Server Python backend with ModelRunnerCpp.
But the fix is pretty straightforward: just move the few problematic lines (https://github.com/NVIDIA/TensorRT-LLM/blob/v0.13.0/tensorrt_llm/runtime/model_runner_cpp.py#L795-L800) so they run after vocab_size is assigned.
Here's a patch that fixes the issue:
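The patch itself isn't reproduced above; the snippet below is only a simplified sketch of the reordering described, with hypothetical names (`build_gen_shape`, `logits`) rather than the actual `_fill_output` code:

```python
import torch

def build_gen_shape(logits: torch.Tensor, num_beams: int, max_new_tokens: int):
    # Hypothetical, simplified stand-in for the problematic section; names and
    # structure are illustrative only, not the real TensorRT-LLM code.
    #
    # Buggy order (what the linked lines effectively do): gen_shape is built
    # from vocab_size before vocab_size has been assigned:
    #   gen_shape = (num_beams, max_new_tokens, vocab_size)
    #   vocab_size = logits.shape[-1]
    #
    # Fixed order: take vocab_size from the logits first, then build the shape.
    vocab_size = logits.shape[-1]
    gen_shape = (num_beams, max_new_tokens, vocab_size)
    return gen_shape

# e.g. build_gen_shape(torch.zeros(1, 8, 32000), num_beams=1, max_new_tokens=8)
# returns (1, 8, 32000)
```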
Could you please provide a simple reproducer for this issue?
We would of course be happy to include your fixes and credit you appropriately, but we need to be able to reproduce the issue.
Thank you!
System Info
TensorRT-LLM v0.13.0
Who can help?
No response
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
Reproduction
The error is thrown from: https://github.com/NVIDIA/TensorRT-LLM/blob/v0.13.0/tensorrt_llm/runtime/model_runner_cpp.py#L795-L800
Expected behavior
The code should first take the vocab size from the logits, as is done here.
actual behavior
The error is thrown from: https://github.com/NVIDIA/TensorRT-LLM/blob/v0.13.0/tensorrt_llm/runtime/model_runner_cpp.py#L795-L800
Here's the stack:

Traceback (most recent call last):
  File "/opt/amazon/alexa_triton_inference_engine/lib/python3.10/site-packages/nemort_triton_trtllm_inference_server/models/agm/model.py", line 248, in execute
    outputs = self.runner.generate(
  File "/opt/amazon/alexa_triton_inference_engine/NeMoRT-TensorRT-LLM/tensorrt_llm/runtime/model_runner_cpp.py", line 606, in generate
    return self._initialize_and_fill_output(
  File "/opt/amazon/alexa_triton_inference_engine/NeMoRT-TensorRT-LLM/tensorrt_llm/runtime/model_runner_cpp.py", line 678, in _initialize_and_fill_output
    return self._fill_output(responses, output_ids, end_id, return_dict,
  File "/opt/amazon/alexa_triton_inference_engine/NeMoRT-TensorRT-LLM/tensorrt_llm/runtime/model_runner_cpp.py", line 800, in _fill_output
    gen_shape = (num_beams, max_new_tokens, vocab_size)
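For context, the failure the traceback points at is consistent with Python's local-variable scoping: because `vocab_size` is assigned later in the same function, referencing it at line 800 fails. A minimal standalone illustration (not TensorRT-LLM code, names assumed):

```python
def fill_output_sketch(logits_shape=(1, 8, 32000)):
    # vocab_size is assigned further down in this function, so Python treats it
    # as a local name; reading it before that assignment fails.
    gen_shape = (1, 8, vocab_size)   # fails here
    vocab_size = logits_shape[-1]    # the later assignment that makes it "local"
    return gen_shape

# Under Python 3.10, calling fill_output_sketch() raises:
#   UnboundLocalError: local variable 'vocab_size' referenced before assignment
```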
additional notes
N/A.