NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

Device CUDA update caused model to stop running #2455

Open · chrisreese-if opened this issue 3 days ago

chrisreese-if commented 3 days ago

System Info

Who can help?

No response

Information

Tasks

Reproduction

We built our model using the tritonserver 24.10 container; the base server's CUDA was at 12.4. We are using a cloud provider for our GPU infrastructure, and they updated their CUDA version to 12.7, after which our TRT-built model stopped working (the error reported a CUDA mismatch).

But we are still using tritonserver 24.10, so it shouldn't matter, right? If we run Triton 24.10 in compatibility mode with CUDA 12.4, 12.5, and 12.7, do we need three different TRT builds? And a fourth for 12.6? Is Triton really that sensitive to the CUDA version?
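
For context, this is roughly the kind of check we can run inside the container to compare the versions involved. It is only a sketch: the `CUDA Version` parsing from `nvidia-smi` output and the `CUDA_VERSION` environment variable are assumptions about what the NGC image exposes.

```python
# Rough version check inside the Triton container (sketch only; the regex and
# the CUDA_VERSION env var are assumptions about the NGC image).
import os
import re
import subprocess

import tensorrt as trt

# Driver-side CUDA version, i.e. what the host exposes after the provider's update.
smi_out = subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout
match = re.search(r"CUDA Version:\s*([\d.]+)", smi_out)
print("driver CUDA   :", match.group(1) if match else "unknown")

# CUDA toolkit baked into the tritonserver 24.10 image (if the env var is set).
print("container CUDA:", os.environ.get("CUDA_VERSION", "unknown"))

# TensorRT shipped in the container; the engine must match this TensorRT version.
print("TensorRT      :", trt.__version__)
```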

Expected behavior

If the Triton container version is the same and the GPU configuration is the same, it should just work.

Actual behavior

launch_triton_server fails.

Additional notes

Triton server container version: 24.10
GPU: H100 SXM 80GB
Base server CUDA during build: 12.4
Server CUDA after update: 12.7
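
One way to isolate Triton from the engine/driver question is to deserialize the prebuilt engine directly with the TensorRT Python API under the updated driver. A minimal sketch follows; the engine path is hypothetical and should be adjusted to the actual model repository layout.

```python
# Try to deserialize the prebuilt engine under the current driver stack,
# outside of Triton, to see whether the engine itself still loads.
import tensorrt as trt

ENGINE_PATH = "/models/tensorrt_llm/1/rank0.engine"  # hypothetical path, adjust as needed

logger = trt.Logger(trt.Logger.VERBOSE)
runtime = trt.Runtime(logger)

with open(ENGINE_PATH, "rb") as f:
    engine = runtime.deserialize_cuda_engine(f.read())

print("engine deserialized OK" if engine is not None else "engine failed to deserialize")
```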