NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

Can a T4 GPU Run LLava? #1024

Open zyan-repository opened 9 months ago

zyan-repository commented 9 months ago

Hello,

I have a question regarding the compatibility and performance of the T4 GPU with LLaVA. I've been attempting to run the LLaVA example on a T4 but hit an out-of-memory error. From my experience running it on other machines, the T4 should have enough memory for this task. I've already disabled ECC to maximize the available memory.

The specific error I encountered is as follows:

```
[TensorRT-LLM] TensorRT-LLM version: 0.8.0.dev2024012301
Traceback (most recent call last):
  File "/TensorRT-LLM/examples/multimodal/run.py", line 365, in <module>
    model = MultiModalModel(args)
  File "/TensorRT-LLM/examples/multimodal/run.py", line 88, in __init__
    self.init_llm()
  File "/TensorRT-LLM/examples/multimodal/run.py", line 113, in init_llm
    self.model = ModelRunner.from_dir(self.args.llm_engine_dir,
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/model_runner.py", line 450, in from_dir
    session = session_cls(model_config,
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/generation.py", line 491, in __init__
    self.runtime = _Runtime(engine_buffer, mapping)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/generation.py", line 155, in __init__
    self.__prepare(mapping, engine_buffer)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/generation.py", line 178, in __prepare
    address = CUASSERT(cudart.cudaMalloc(self.engine.device_memory_size))[0]
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/generation.py", line 104, in CUASSERT
    raise RuntimeError(
RuntimeError: CUDA ERROR: 2, error code reference: https://nvidia.github.io/cuda-python/module/cudart.html#cuda.cudart.cudaError_t
Exception ignored in: <function _Runtime.__del__ at 0x7fbf2895a3b0>
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/generation.py", line 282, in __del__
    cudart.cudaFree(self.address)  # FIXME: cudaFree is None??
AttributeError: '_Runtime' object has no attribute 'address'
```
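For reference, CUDA runtime error code 2 is `cudaErrorMemoryAllocation`: the `cudaMalloc` call for the engine's `device_memory_size` could not be satisfied, so this is an out-of-memory failure rather than a T4 incompatibility. A minimal sketch of decoding the numeric code (`decode_cuda_error` and its table are illustrative, covering only a few common codes from the `cudaError_t` enum; this is not the cuda-python API):

```python
# Illustrative decoder for a few common CUDA runtime error codes
# (a small subset of the cudaError_t enum from the CUDA Runtime API).
CUDA_ERRORS = {
    0: "cudaSuccess",
    1: "cudaErrorInvalidValue",
    2: "cudaErrorMemoryAllocation",   # cudaMalloc ran out of device memory
    3: "cudaErrorInitializationError",
}

def decode_cuda_error(code: int) -> str:
    """Map a numeric CUDA runtime error code to its enum name."""
    return CUDA_ERRORS.get(code, f"unknown cudaError_t ({code})")

# The traceback above reports "CUDA ERROR: 2":
print(decode_cuda_error(2))  # cudaErrorMemoryAllocation
```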

Could you please help me understand what might be causing this issue? Is there a specific memory requirement or a known compatibility issue with the T4 GPU when running LLaVA?

Thank you for your assistance.
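One quick way to check whether the engine's allocation simply exceeds what the T4 has free is to query the device directly. A sketch using cuda-python's `cudart.cudaMemGetInfo`, which returns the error code followed by free and total bytes (the `report_gpu_memory` helper is hypothetical, and the fallback assumes cuda-python may not be installed):

```python
# Sketch: report free/total device memory with cuda-python
# (assumption: the same `cuda.cudart` bindings TensorRT-LLM uses are available).
def report_gpu_memory() -> str:
    try:
        from cuda import cudart
    except ImportError:
        # cuda-python not installed; `nvidia-smi` shows the same numbers.
        return "cuda-python not installed; try `nvidia-smi` instead"
    err, free, total = cudart.cudaMemGetInfo()
    if err != cudart.cudaError_t.cudaSuccess:
        return f"cudaMemGetInfo failed with error {err}"
    return f"{free / 2**30:.2f} GiB free of {total / 2**30:.2f} GiB total"

print(report_gpu_memory())
```

If the free amount is below the engine's reported `device_memory_size` (plus weights and KV cache), the fix is to rebuild the engine with smaller limits (e.g. batch size or sequence lengths) or use a quantized build, rather than anything T4-specific.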

hello-11 commented 17 hours ago

@zyan-repository Do you still have the problem? If not, we will close it soon.