System Info
Driver Version: 535.161.08 (NVIDIA-SMI 535.161.08), CUDA Version: 12.3
Who can help?
@Pzzzzz5142 @fjosw @ami
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction
Start the service from the sherpa/triton/whisper directory.
Expected behavior
The service should report the error without generating core.xxxx files.
Actual behavior
The service generates many core.xxxx files, each about 2.4 GB. If the number of abnormal requests grows, these core dumps can quickly fill the disk.
Additional notes
Is there any setting that can prevent the generation of core.XXXX files?
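One workaround is to suppress core dump generation at the process level before launching the service. Below is a minimal sketch, assuming the service is started from a Python wrapper; the tritonserver binary name and model repository path are placeholders and should be adapted to the actual sherpa/triton/whisper deployment.

```python
import resource
import subprocess

# Set the core dump size limit (soft and hard) to 0 for this process.
# Child processes inherit the limit, so the server launched below will
# not write core.xxxx files when it crashes.
resource.setrlimit(resource.RLIMIT_CORE, (0, 0))

# Hypothetical launch command -- adjust the binary and model repository
# path to match the actual sherpa/triton/whisper setup.
subprocess.run([
    "tritonserver",
    "--model-repository=/workspace/sherpa/triton/whisper/model_repo",
])
```

Equivalently, running `ulimit -c 0` in the shell (or setting the container's core size limit) before starting the service has the same effect.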