NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

Error when running llava on v0.13.0 #2399

Closed zhangts20 closed 2 weeks ago

zhangts20 commented 3 weeks ago

System Info

Who can help?

@ncomly-nvidia @byshiue

Information

Tasks

Reproduction

Steps to reproduce the behavior:

  1. Model: https://huggingface.co/llava-hf/llava-1.5-7b-hf
  2. Build the engines following https://github.com/NVIDIA/TensorRT-LLM/tree/v0.13.0/examples/multimodal#llava-llava-next-and-vila
  3. Run the engines following https://github.com/NVIDIA/TensorRT-LLM/tree/v0.13.0/examples/multimodal#llava-llava-next-and-vila
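For reference, the three steps condense to something like the sketch below. The `$MODEL_DIR` layout matches the run command in the log further down; the specific flag values (e.g. the prompt-table size) are assumptions, so consult the linked README section for the authoritative arguments.

```shell
# Sketch of the v0.13.0 multimodal README workflow for LLaVA-1.5.
# Paths and flag values are assumptions; the linked README is authoritative.
MODEL_DIR=/path/to/models

# 1. Convert the HF checkpoint and build the LLM engine
python examples/llama/convert_checkpoint.py \
    --model_dir "$MODEL_DIR/llava-1.5-7b-hf" \
    --output_dir "$MODEL_DIR/llava-1.5-7b-hf-ckpt"
trtllm-build \
    --checkpoint_dir "$MODEL_DIR/llava-1.5-7b-hf-ckpt" \
    --output_dir "$MODEL_DIR/llava-1.5-7b-hf-trt" \
    --max_prompt_embedding_table_size 576  # assumed: 576 visual tokens per image

# 2. Build the vision-encoder (ViT) engine
python examples/multimodal/build_visual_engine.py \
    --model_type llava \
    --model_path "$MODEL_DIR/llava-1.5-7b-hf" \
    --output_dir "$MODEL_DIR/llava-1.5-7b-hf-trt/vit"

# 3. Run inference with both engines
python examples/multimodal/run.py \
    --hf_model_dir "$MODEL_DIR/llava-1.5-7b-hf" \
    --visual_engine_dir "$MODEL_DIR/llava-1.5-7b-hf-trt/vit" \
    --llm_engine_dir "$MODEL_DIR/llava-1.5-7b-hf-trt"
```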

Expected behavior

The run should complete successfully and print a description of the image.

actual behavior

```
Traceback (most recent call last):
  File "/xxx/projects/TensorRT-LLM/examples/multimodal/run.py", line 132, in <module>
    input_text, output_text = model.run(args.input_text, raw_image,
  File "/xxx/miniconda3/lib/python3.10/site-packages/tensorrt_llm/runtime/multimodal_model_runner.py", line 1163, in run
    output_text = self.generate(pre_prompt,
  File "/xxx/miniconda3/lib/python3.10/site-packages/tensorrt_llm/runtime/multimodal_model_runner.py", line 618, in generate
    output_ids = self.model.generate(
  File "/xxx/miniconda3/lib/python3.10/site-packages/tensorrt_llm/runtime/model_runner.py", line 870, in generate
    outputs = self.session.decode(
  File "/xxx/miniconda3/lib/python3.10/site-packages/tensorrt_llm/runtime/generation.py", line 1056, in wrapper
    ret = func(self, *args, **kwargs)
  File "/xxx/miniconda3/lib/python3.10/site-packages/tensorrt_llm/runtime/generation.py", line 3721, in decode
    self.__setup_decoder(input_ids, scfg, host_context_lengths)
  File "/xxx/miniconda3/lib/python3.10/site-packages/tensorrt_llm/runtime/generation.py", line 1295, in __setup_decoder
    self.dynamic_decoder.setup(
RuntimeError: Tried to cast IValue to custom class but it did not contain a custom class!
corrupted size vs. prev_size
[18297d022429:479071] *** Process received signal ***
[18297d022429:479071] Signal: Aborted (6)
[18297d022429:479071] Signal code:  (-6)
[18297d022429:479071] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7fc6392aa520]
[18297d022429:479071] [ 1] /lib/x86_64-linux-gnu/libc.so.6(pthread_kill+0x12c)[0x7fc6392fe9fc]
[18297d022429:479071] [ 2] /lib/x86_64-linux-gnu/libc.so.6(raise+0x16)[0x7fc6392aa476]
[18297d022429:479071] [ 3] /lib/x86_64-linux-gnu/libc.so.6(abort+0xd3)[0x7fc6392907f3]
[18297d022429:479071] [ 4] /lib/x86_64-linux-gnu/libc.so.6(+0x89676)[0x7fc6392f1676]
[18297d022429:479071] [ 5] /lib/x86_64-linux-gnu/libc.so.6(+0xa0cfc)[0x7fc639308cfc]
[18297d022429:479071] [ 6] /lib/x86_64-linux-gnu/libc.so.6(+0xa17e2)[0x7fc6393097e2]
[18297d022429:479071] [ 7] /lib/x86_64-linux-gnu/libc.so.6(+0xa2d2b)[0x7fc63930ad2b]
[18297d022429:479071] [ 8] /lib/x86_64-linux-gnu/libc.so.6(free+0x73)[0x7fc63930d453]
[18297d022429:479071] [ 9] /xxx/files/TensorRT-10.4.0.26/lib/libnvinfer.so.10(+0x19a013d)[0x7fc5897a013d]
[18297d022429:479071] [10] /xxx/files/TensorRT-10.4.0.26/lib/libnvinfer.so.10(+0x1b96edf)[0x7fc589996edf]
[18297d022429:479071] [11] /xxx/files/TensorRT-10.4.0.26/lib/libnvinfer.so.10(+0x19b6e62)[0x7fc5897b6e62]
[18297d022429:479071] [12] /xxx/files/TensorRT-10.4.0.26/lib/libnvinfer.so.10(+0x16d4f23)[0x7fc5894d4f23]
[18297d022429:479071] [13] /xxx/files/TensorRT-10.4.0.26/lib/libnvinfer.so.10(+0x10d7c90)[0x7fc588ed7c90]
[18297d022429:479071] [14] /xxx/files/TensorRT-10.4.0.26/lib/libnvinfer.so.10(+0x10c6bca)[0x7fc588ec6bca]
[18297d022429:479071] [15] /xxx/miniconda3/lib/python3.10/site-packages/tensorrt/tensorrt.so(+0x70dda)[0x7fc597470dda]
[18297d022429:479071] [16] /xxx/miniconda3/lib/python3.10/site-packages/tensorrt/tensorrt.so(+0x48c66)[0x7fc597448c66]
[18297d022429:479071] [17] /xxx/miniconda3/lib/python3.10/site-packages/tensorrt/tensorrt.so(+0x494fe)[0x7fc5974494fe]
[18297d022429:479071] [18] python[0x4ead66]
[18297d022429:479071] [19] python[0x50e114]
[18297d022429:479071] [20] python[0x4ead66]
[18297d022429:479071] [21] python[0x50e114]
[18297d022429:479071] [22] python[0x4ead66]
[18297d022429:479071] [23] python[0x50e114]
[18297d022429:479071] [24] python[0x4ead66]
[18297d022429:479071] [25] python[0x50e114]
[18297d022429:479071] [26] python[0x4dfcc2]
[18297d022429:479071] [27] python(_PyModule_ClearDict+0x14d)[0x55d1dd]
[18297d022429:479071] [28] python[0x5c7543]
[18297d022429:479071] [29] python(Py_FinalizeEx+0x143)[0x5c5fd3]
[18297d022429:479071] *** End of error message ***
run_infer.sh: line 14: 479071 Aborted                 (core dumped) python run.py --visual_engine_dir "$MODEL_DIR/llava-1.5-7b-hf-trt/vit" --llm_engine_dir "$MODEL_DIR/llava-1.5-7b-hf-trt" --hf_model_dir "$MODEL_DIR/llava-1.5-7b-hf" --input_text "please describe the picture" --image_path "../dit/figs/sample.png"
```

additional notes

None.

amukkara commented 2 weeks ago

@zhangts20 Can you try running this model on the 0.14 release? There was an issue with flash attention in the vision encoder that is fixed in 0.14. Can you also test with the default image and input prompt, to make it easier for us to reproduce the error?
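For what it's worth, trying that suggestion could look roughly like the sketch below. The NVIDIA package index URL and the exact version pin are assumptions based on the usual install instructions; check the installation docs for your platform.

```shell
# Upgrade to the 0.14 release (index URL and version pin are assumptions;
# see the TensorRT-LLM installation docs for the authoritative command).
pip install --upgrade tensorrt_llm==0.14.0 --extra-index-url https://pypi.nvidia.com

# Rebuild both engines with the 0.14 tools, then re-run without
# --input_text/--image_path so the example's defaults are used, as requested.
python examples/multimodal/run.py \
    --hf_model_dir "$MODEL_DIR/llava-1.5-7b-hf" \
    --visual_engine_dir "$MODEL_DIR/llava-1.5-7b-hf-trt/vit" \
    --llm_engine_dir "$MODEL_DIR/llava-1.5-7b-hf-trt"
```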

zhangts20 commented 2 weeks ago

> @zhangts20 Can you try running this model on the 0.14 release? There was an issue with flash attention in the vision encoder that is fixed in 0.14. Can you also test with the default image and input prompt, to make it easier for us to reproduce the error?

Thank you for your reply; I'll give it a try.