System Info
Driver Version: 535.161.08 (NVIDIA-SMI 535.161.08), CUDA Version: 12.3
Who can help?
@Pzzzzz5142 @fjosw @ami
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction
Start the service from the sherpa/triton/whisper directory.
Expected behavior
The service should report the error without generating core.xxxx files.
Actual behavior
The service generates many core.xxxx files, each about 2.4 GB. If the number of abnormal requests grows, these core dumps can quickly fill the disk.
Additional notes
Is there any setting that can prevent the generation of core.XXXX files?
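One workaround is to suppress core dump generation at the process level before launching the service. Below is a minimal sketch, assuming the service is started from a Python wrapper; the tritonserver binary name and model repository path are placeholders and should be adapted to the actual sherpa/triton/whisper deployment.

```python
import resource
import subprocess

# Set the core dump size limit (soft and hard) to 0 for this process.
# Child processes inherit the limit, so the server launched below will
# not write core.xxxx files when it crashes.
resource.setrlimit(resource.RLIMIT_CORE, (0, 0))

# Hypothetical launch command -- adjust the binary and model repository
# path to match the actual sherpa/triton/whisper setup.
subprocess.run([
    "tritonserver",
    "--model-repository=/workspace/sherpa/triton/whisper/model_repo",
])
```

Equivalently, running `ulimit -c 0` in the shell (or setting the container's core size limit) before starting the service has the same effect.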