NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

Getting error while running run.py from whisper examples #788

Open sasikr2 opened 8 months ago

sasikr2 commented 8 months ago

Traceback (most recent call last):
  File "/code/tensorrt_llm/examples/whisper/run.py", line 336, in <module>
    model = WhisperTRTLLM(args.engine_dir, args.debug, args.assets_dir)
  File "/code/tensorrt_llm/examples/whisper/run.py", line 218, in __init__
    self.decoder = WhisperDecoding(engine_dir,
  File "/code/tensorrt_llm/examples/whisper/run.py", line 121, in __init__
    self.decoder_generation_session = self.get_session(
  File "/code/tensorrt_llm/examples/whisper/run.py", line 153, in get_session
    decoder_generation_session = tensorrt_llm.runtime.GenerationSession(
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/generation.py", line 457, in __init__
    self.runtime = _Runtime(engine_buffer, mapping)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/generation.py", line 150, in __init__
    self.prepare(mapping, engine_buffer)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/generation.py", line 170, in prepare
    address = CUASSERT(cudart.cudaMalloc(self.engine.device_memory_size))[0]
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/generation.py", line 99, in CUASSERT
    raise RuntimeError(
RuntimeError: CUDA ERROR: 2, error code reference: https://nvidia.github.io/cuda-python/module/cudart.html#cuda.cudart.cudaError_t

Exception ignored in: <function _Runtime.__del__ at 0x7fb94d845360>
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/generation.py", line 266, in __del__
    cudart.cudaFree(self.address) # FIXME: cudaFree is None??
AttributeError: '_Runtime' object has no attribute 'address'
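For readers of this trace: in the CUDA runtime's cudaError_t enum, error code 2 is cudaErrorMemoryAllocation, so the cudaMalloc call for the engine's device memory could not be satisfied. The second exception (the AttributeError in the destructor) looks like a side effect rather than a separate bug: Python still runs the destructor on an object whose constructor raised, and the address attribute was never assigned. The sketch below reproduces that pattern in plain Python. `FakeRuntime`, `close`, and the `fail_alloc` flag are hypothetical names for illustration only and are not TensorRT-LLM code.

```python
# Minimal sketch (no GPU needed) of a destructor that tolerates a
# constructor failing before the resource handle was assigned.
# All names here are hypothetical, not part of TensorRT-LLM.

# Two entries from the CUDA runtime's cudaError_t enum.
CUDA_ERROR_NAMES = {0: "cudaSuccess", 2: "cudaErrorMemoryAllocation"}


class FakeRuntime:
    def __init__(self, fail_alloc=False):
        if fail_alloc:
            # Simulates cudaMalloc failing before self.address exists.
            code = 2
            raise RuntimeError(f"CUDA ERROR: {code} ({CUDA_ERROR_NAMES[code]})")
        self.address = 1234  # stand-in for a device pointer

    def close(self):
        """Release the handle once; return True only if something was freed."""
        address = getattr(self, "address", None)
        if address is None:
            return False  # never allocated, or already released
        self.address = None
        return True

    def __del__(self):
        # Safe even when __init__ raised, because close() uses getattr.
        self.close()
```

With a guard like this, only the original allocation error would be reported, without the secondary "has no attribute 'address'" message.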

jdemouth-nvidia commented 8 months ago

Hi. Could you share the command line to reproduce the issue, and tell us more about your environment? Thanks a lot.

sasikr2 commented 8 months ago

First I created model.engine by running:

python3 build.py --output_dir whisper_large_v2_bm5 --model_name large-v2 --max_batch_size 16 --max_beam_width 5 --use_gpt_attention_plugin --use_gemm_plugin --use_layernorm_plugin --use_bert_attention_plugin

After this, I simply ran:

python3 run.py

Env: Running on A100 card (40GB)
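Since the failure is an allocation error on a 40GB card, one thing worth checking is how much device memory is actually free when run.py starts (other processes may already hold some of it). The following is a rough sketch, assuming the cuda-python package that the traceback already references is installed. `describe_headroom` and `query_gpu_memory` are hypothetical helper names, and the value to pass as `required_bytes` would be the engine's `device_memory_size` seen in the traceback.

```python
# Hedged sketch: compare free GPU memory against the size an engine needs.
# Helper names are hypothetical; only cudart.cudaMemGetInfo is a real call.

GIB = 1024 ** 3


def describe_headroom(free_bytes, total_bytes, required_bytes):
    """Return (fits, message) for a planned allocation of required_bytes."""
    fits = required_bytes <= free_bytes
    message = (
        f"free {free_bytes / GIB:.1f} GiB of {total_bytes / GIB:.1f} GiB, "
        f"need {required_bytes / GIB:.1f} GiB"
    )
    return fits, message


def query_gpu_memory():
    """Query the current device; requires cuda-python and a CUDA GPU."""
    from cuda import cudart  # imported lazily so the sketch loads without it

    err, free_bytes, total_bytes = cudart.cudaMemGetInfo()
    if err != cudart.cudaError_t.cudaSuccess:
        raise RuntimeError(f"cudaMemGetInfo failed: {err}")
    return free_bytes, total_bytes
```

On a machine with a GPU, calling `describe_headroom(*query_gpu_memory(), required_bytes)` before creating the session would show whether the request can fit at all.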