Hello,

I have a question regarding the compatibility and performance of the T4 GPU with LLaVA. I've been attempting to run the LLaVA example on a T4 but keep hitting an out-of-memory error. From my experience running the example on other machines, the T4 should have enough memory for this task, and I've already disabled ECC to maximize the memory available.
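For context, here is a minimal sketch of how I confirm the free device memory, using the same cuda-python bindings the TensorRT-LLM runtime calls into (nvidia-smi reports matching numbers; the exact figures are of course machine-dependent):

# Sketch: query free vs. total device memory via cuda-python,
# the same bindings TensorRT-LLM's runtime uses internally.
from cuda import cudart

err, free_bytes, total_bytes = cudart.cudaMemGetInfo()
assert err == cudart.cudaError_t.cudaSuccess, err
print(f"free: {free_bytes / 1024**3:.2f} GiB, total: {total_bytes / 1024**3:.2f} GiB")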
The specific error I encountered is as follows:
[TensorRT-LLM] TensorRT-LLM version: 0.8.0.dev2024012301
Traceback (most recent call last):
File "/TensorRT-LLM/examples/multimodal/run.py", line 365, in <module>
model = MultiModalModel(args)
File "/TensorRT-LLM/examples/multimodal/run.py", line 88, in __init__
self.init_llm()
File "/TensorRT-LLM/examples/multimodal/run.py", line 113, in init_llm
self.model = ModelRunner.from_dir(self.args.llm_engine_dir,
File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/model_runner.py", line 450, in from_dir
session = session_cls(model_config,
File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/generation.py", line 491, in __init__
self.runtime = _Runtime(engine_buffer, mapping)
File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/generation.py", line 155, in __init__
self.__prepare(mapping, engine_buffer)
File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/generation.py", line 178, in __prepare
address = CUASSERT(cudart.cudaMalloc(self.engine.device_memory_size))[0]
File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/generation.py", line 104, in CUASSERT
raise RuntimeError(
RuntimeError: CUDA ERROR: 2, error code reference: https://nvidia.github.io/cuda-python/module/cudart.html#cuda.cudart.cudaError_t
Exception ignored in: <function _Runtime.__del__ at 0x7fbf2895a3b0>
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/generation.py", line 282, in __del__
cudart.cudaFree(self.address) # FIXME: cudaFree is None??
AttributeError: '_Runtime' object has no attribute 'address'
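For what it's worth, decoding error code 2 from the RuntimeError above with cuda-python comes back as cudaErrorMemoryAllocation, i.e. a plain out-of-memory failure during the cudaMalloc of the engine's device memory, rather than an architecture-compatibility error. A minimal sketch of that decoding step:

# Sketch: translate the numeric CUDA error code from the traceback
# into its enum name and message string.
from cuda import cudart

err = cudart.cudaError_t(2)             # cudaError_t.cudaErrorMemoryAllocation
status, message = cudart.cudaGetErrorString(err)
print(err, "-", message.decode())       # "... - out of memory"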
Could you please help me understand what might be causing this issue? Is there a specific memory requirement or a compatibility issue with the T4 GPU when running LLaVA?
Thank you for your assistance.