NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

Error when running llava on v0.13.0 #2399

Closed zhangts20 closed 2 weeks ago

zhangts20 commented 3 weeks ago

System Info

Who can help?

@ncomly-nvidia @byshiue

Information

Tasks

Reproduction

Steps to reproduce the behavior:

  1. Model: https://huggingface.co/llava-hf/llava-1.5-7b-hf
  2. Build the engines following https://github.com/NVIDIA/TensorRT-LLM/tree/v0.13.0/examples/multimodal#llava-llava-next-and-vila
  3. Run the engines following https://github.com/NVIDIA/TensorRT-LLM/tree/v0.13.0/examples/multimodal#llava-llava-next-and-vila
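For reference, the three steps condense to something like the sketch below. The `$MODEL_DIR` layout matches the run command in the log further down; the specific flag values (e.g. the prompt-table size) are assumptions, so consult the linked README section for the authoritative arguments.

```shell
# Sketch of the v0.13.0 multimodal README workflow for LLaVA-1.5.
# Paths and flag values are assumptions; the linked README is authoritative.
MODEL_DIR=/path/to/models

# 1. Convert the HF checkpoint and build the LLM engine
python examples/llama/convert_checkpoint.py \
    --model_dir "$MODEL_DIR/llava-1.5-7b-hf" \
    --output_dir "$MODEL_DIR/llava-1.5-7b-hf-ckpt"
trtllm-build \
    --checkpoint_dir "$MODEL_DIR/llava-1.5-7b-hf-ckpt" \
    --output_dir "$MODEL_DIR/llava-1.5-7b-hf-trt" \
    --max_prompt_embedding_table_size 576  # assumed: 576 visual tokens per image

# 2. Build the vision-encoder (ViT) engine
python examples/multimodal/build_visual_engine.py \
    --model_type llava \
    --model_path "$MODEL_DIR/llava-1.5-7b-hf" \
    --output_dir "$MODEL_DIR/llava-1.5-7b-hf-trt/vit"

# 3. Run inference with both engines
python examples/multimodal/run.py \
    --hf_model_dir "$MODEL_DIR/llava-1.5-7b-hf" \
    --visual_engine_dir "$MODEL_DIR/llava-1.5-7b-hf-trt/vit" \
    --llm_engine_dir "$MODEL_DIR/llava-1.5-7b-hf-trt"
```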

Expected behavior

The run should complete successfully and print a description of the image.

actual behavior

```
Traceback (most recent call last):
  File "/xxx/projects/TensorRT-LLM/examples/multimodal/run.py", line 132, in <module>
    input_text, output_text = model.run(args.input_text, raw_image,
  File "/xxx/miniconda3/lib/python3.10/site-packages/tensorrt_llm/runtime/multimodal_model_runner.py", line 1163, in run
    output_text = self.generate(pre_prompt,
  File "/xxx/miniconda3/lib/python3.10/site-packages/tensorrt_llm/runtime/multimodal_model_runner.py", line 618, in generate
    output_ids = self.model.generate(
  File "/xxx/miniconda3/lib/python3.10/site-packages/tensorrt_llm/runtime/model_runner.py", line 870, in generate
    outputs = self.session.decode(
  File "/xxx/miniconda3/lib/python3.10/site-packages/tensorrt_llm/runtime/generation.py", line 1056, in wrapper
    ret = func(self, *args, **kwargs)
  File "/xxx/miniconda3/lib/python3.10/site-packages/tensorrt_llm/runtime/generation.py", line 3721, in decode
    self.__setup_decoder(input_ids, scfg, host_context_lengths)
  File "/xxx/miniconda3/lib/python3.10/site-packages/tensorrt_llm/runtime/generation.py", line 1295, in __setup_decoder
    self.dynamic_decoder.setup(
RuntimeError: Tried to cast IValue to custom class but it did not contain a custom class!
corrupted size vs. prev_size
[18297d022429:479071] *** Process received signal ***
[18297d022429:479071] Signal: Aborted (6)
[18297d022429:479071] Signal code:  (-6)
[18297d022429:479071] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7fc6392aa520]
[18297d022429:479071] [ 1] /lib/x86_64-linux-gnu/libc.so.6(pthread_kill+0x12c)[0x7fc6392fe9fc]
[18297d022429:479071] [ 2] /lib/x86_64-linux-gnu/libc.so.6(raise+0x16)[0x7fc6392aa476]
[18297d022429:479071] [ 3] /lib/x86_64-linux-gnu/libc.so.6(abort+0xd3)[0x7fc6392907f3]
[18297d022429:479071] [ 4] /lib/x86_64-linux-gnu/libc.so.6(+0x89676)[0x7fc6392f1676]
[18297d022429:479071] [ 5] /lib/x86_64-linux-gnu/libc.so.6(+0xa0cfc)[0x7fc639308cfc]
[18297d022429:479071] [ 6] /lib/x86_64-linux-gnu/libc.so.6(+0xa17e2)[0x7fc6393097e2]
[18297d022429:479071] [ 7] /lib/x86_64-linux-gnu/libc.so.6(+0xa2d2b)[0x7fc63930ad2b]
[18297d022429:479071] [ 8] /lib/x86_64-linux-gnu/libc.so.6(free+0x73)[0x7fc63930d453]
[18297d022429:479071] [ 9] /xxx/files/TensorRT-10.4.0.26/lib/libnvinfer.so.10(+0x19a013d)[0x7fc5897a013d]
[18297d022429:479071] [10] /xxx/files/TensorRT-10.4.0.26/lib/libnvinfer.so.10(+0x1b96edf)[0x7fc589996edf]
[18297d022429:479071] [11] /xxx/files/TensorRT-10.4.0.26/lib/libnvinfer.so.10(+0x19b6e62)[0x7fc5897b6e62]
[18297d022429:479071] [12] /xxx/files/TensorRT-10.4.0.26/lib/libnvinfer.so.10(+0x16d4f23)[0x7fc5894d4f23]
[18297d022429:479071] [13] /xxx/files/TensorRT-10.4.0.26/lib/libnvinfer.so.10(+0x10d7c90)[0x7fc588ed7c90]
[18297d022429:479071] [14] /xxx/files/TensorRT-10.4.0.26/lib/libnvinfer.so.10(+0x10c6bca)[0x7fc588ec6bca]
[18297d022429:479071] [15] /xxx/miniconda3/lib/python3.10/site-packages/tensorrt/tensorrt.so(+0x70dda)[0x7fc597470dda]
[18297d022429:479071] [16] /xxx/miniconda3/lib/python3.10/site-packages/tensorrt/tensorrt.so(+0x48c66)[0x7fc597448c66]
[18297d022429:479071] [17] /xxx/miniconda3/lib/python3.10/site-packages/tensorrt/tensorrt.so(+0x494fe)[0x7fc5974494fe]
[18297d022429:479071] [18] python[0x4ead66]
[18297d022429:479071] [19] python[0x50e114]
[18297d022429:479071] [20] python[0x4ead66]
[18297d022429:479071] [21] python[0x50e114]
[18297d022429:479071] [22] python[0x4ead66]
[18297d022429:479071] [23] python[0x50e114]
[18297d022429:479071] [24] python[0x4ead66]
[18297d022429:479071] [25] python[0x50e114]
[18297d022429:479071] [26] python[0x4dfcc2]
[18297d022429:479071] [27] python(_PyModule_ClearDict+0x14d)[0x55d1dd]
[18297d022429:479071] [28] python[0x5c7543]
[18297d022429:479071] [29] python(Py_FinalizeEx+0x143)[0x5c5fd3]
[18297d022429:479071] *** End of error message ***
run_infer.sh: line 14: 479071 Aborted                 (core dumped) python run.py --visual_engine_dir "$MODEL_DIR/llava-1.5-7b-hf-trt/vit" --llm_engine_dir "$MODEL_DIR/llava-1.5-7b-hf-trt" --hf_model_dir "$MODEL_DIR/llava-1.5-7b-hf" --input_text "please describe the picture" --image_path "../dit/figs/sample.png"
```

additional notes

None.

amukkara commented 2 weeks ago

@zhangts20 Can you try running this model on the 0.14 release? There was an issue with flash attention in the vision encoder that is fixed in 0.14. Can you also test with the default image and input prompt, to make it easier for us to reproduce the error?
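For what it's worth, trying that suggestion could look roughly like the sketch below. The NVIDIA package index URL and the exact version pin are assumptions based on the usual install instructions; check the installation docs for your platform.

```shell
# Upgrade to the 0.14 release (index URL and version pin are assumptions;
# see the TensorRT-LLM installation docs for the authoritative command).
pip install --upgrade tensorrt_llm==0.14.0 --extra-index-url https://pypi.nvidia.com

# Rebuild both engines with the 0.14 tools, then re-run without
# --input_text/--image_path so the example's defaults are used, as requested.
python examples/multimodal/run.py \
    --hf_model_dir "$MODEL_DIR/llava-1.5-7b-hf" \
    --visual_engine_dir "$MODEL_DIR/llava-1.5-7b-hf-trt/vit" \
    --llm_engine_dir "$MODEL_DIR/llava-1.5-7b-hf-trt"
```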

zhangts20 commented 2 weeks ago

> @zhangts20 Can you try running this model on the 0.14 release? There was an issue with flash attention in the vision encoder that is fixed in 0.14. Can you also test with the default image and input prompt, to make it easier for us to reproduce the error?

Thank you for your reply; I'll give it a try.