Hello,

I have a question regarding the compatibility and performance of the T4 GPU with LLaVA. I've been attempting to run the LLaVA example on a T4 but keep hitting an out-of-memory error. From my experience running the example on other machines, the T4 should have enough memory for this task, and I've already disabled ECC to maximize the memory available.
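For context, here is a minimal sketch of how I confirm the free device memory, using the same cuda-python bindings the TensorRT-LLM runtime calls into (nvidia-smi reports matching numbers; the exact figures are of course machine-dependent):

# Sketch: query free vs. total device memory via cuda-python,
# the same bindings TensorRT-LLM's runtime uses internally.
from cuda import cudart

err, free_bytes, total_bytes = cudart.cudaMemGetInfo()
assert err == cudart.cudaError_t.cudaSuccess, err
print(f"free: {free_bytes / 1024**3:.2f} GiB, total: {total_bytes / 1024**3:.2f} GiB")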
The specific error I encountered is as follows:
[TensorRT-LLM] TensorRT-LLM version: 0.8.0.dev2024012301
Traceback (most recent call last):
File "/TensorRT-LLM/examples/multimodal/run.py", line 365, in <module>
model = MultiModalModel(args)
File "/TensorRT-LLM/examples/multimodal/run.py", line 88, in __init__
self.init_llm()
File "/TensorRT-LLM/examples/multimodal/run.py", line 113, in init_llm
self.model = ModelRunner.from_dir(self.args.llm_engine_dir,
File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/model_runner.py", line 450, in from_dir
session = session_cls(model_config,
File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/generation.py", line 491, in __init__
self.runtime = _Runtime(engine_buffer, mapping)
File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/generation.py", line 155, in __init__
self.__prepare(mapping, engine_buffer)
File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/generation.py", line 178, in __prepare
address = CUASSERT(cudart.cudaMalloc(self.engine.device_memory_size))[0]
File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/generation.py", line 104, in CUASSERT
raise RuntimeError(
RuntimeError: CUDA ERROR: 2, error code reference: https://nvidia.github.io/cuda-python/module/cudart.html#cuda.cudart.cudaError_t
Exception ignored in: <function _Runtime.__del__ at 0x7fbf2895a3b0>
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/generation.py", line 282, in __del__
cudart.cudaFree(self.address) # FIXME: cudaFree is None??
AttributeError: '_Runtime' object has no attribute 'address'
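For what it's worth, decoding error code 2 from the RuntimeError above with cuda-python comes back as cudaErrorMemoryAllocation, i.e. a plain out-of-memory failure during the cudaMalloc of the engine's device memory, rather than an architecture-compatibility error. A minimal sketch of that decoding step:

# Sketch: translate the numeric CUDA error code from the traceback
# into its enum name and message string.
from cuda import cudart

err = cudart.cudaError_t(2)             # cudaError_t.cudaErrorMemoryAllocation
status, message = cudart.cudaGetErrorString(err)
print(err, "-", message.decode())       # "... - out of memory"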
Could you please help me understand what might be causing this issue? Is there a specific memory requirement or a compatibility issue with the T4 GPU when running LLaVA?
Thank you for your assistance.