NVIDIA / TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
https://developer.nvidia.com/tensorrt
Apache License 2.0

Memory leak on TX1 with dynamic input shape #1662

Closed xonobo closed 2 years ago

xonobo commented 2 years ago

My code runs TensorRT inference on an ONNX model with a dynamic input shape. The input shape varies on almost every inference call. I can see memory usage increasing in jtop, and when I leave the inference call in an infinite loop, it crashes after a few hours with an out-of-memory error.
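To put numbers on the growth seen in jtop, the loop can be instrumented. Below is a minimal Python sketch, not the code from this report: `log_memory_drift`, `read_mem_available_kb`, and the `run_once` callback are hypothetical names, and `run_once` stands in for whatever performs one inference call. It assumes a Linux system exposing `MemAvailable` in `/proc/meminfo`. On Jetson the CPU and GPU share the same physical memory, so a system-wide figure should reflect both host and device allocations.

```python
from typing import Callable, List


def read_mem_available_kb(path: str = "/proc/meminfo") -> int:
    """Return the MemAvailable value (in kB) from a Linux meminfo file."""
    with open(path) as f:
        for line in f:
            if line.startswith("MemAvailable:"):
                return int(line.split()[1])
    raise RuntimeError("MemAvailable not found in " + path)


def log_memory_drift(
    run_once: Callable[[int], None],
    iterations: int,
    every: int = 100,
    sample: Callable[[], int] = read_mem_available_kb,
) -> List[int]:
    """Call run_once(i) repeatedly and record how far available memory
    has dropped below the starting baseline, sampled every `every` calls.

    run_once is a placeholder for one inference call (hypothetical).
    """
    baseline = sample()
    drops = []
    for i in range(1, iterations + 1):
        run_once(i)
        if i % every == 0:
            drops.append(baseline - sample())
    return drops
```

A drop that keeps increasing across samples points to a per-call leak, while a drop that levels off after warm-up is more consistent with one-time allocations.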

I tried cuda-memcheck, but it reports 0 leaks and 0 errors. I also tried valgrind; the definite leaks related to TRT are:

==4904== 56 bytes in 1 blocks are definitely lost in loss record 744 of 2,378
==4904==    at 0x48461D0: malloc (vg_replace_malloc.c:381)
==4904==    by 0x434E6827: ??? (in /usr/lib/aarch64-linux-gnu/tegra/libnvidia-ptxjitcompiler.so.32.4.3)
==4904==    by 0x434DE1A3: ??? (in /usr/lib/aarch64-linux-gnu/tegra/libnvidia-ptxjitcompiler.so.32.4.3)
==4904==    by 0x430BD637: ??? (in /usr/lib/aarch64-linux-gnu/tegra/libnvidia-ptxjitcompiler.so.32.4.3)
==4904==    by 0x434DA8CF: ??? (in /usr/lib/aarch64-linux-gnu/tegra/libnvidia-ptxjitcompiler.so.32.4.3)
==4904==    by 0x434D8C07: ??? (in /usr/lib/aarch64-linux-gnu/tegra/libnvidia-ptxjitcompiler.so.32.4.3)
==4904==    by 0x434D8D3F: ??? (in /usr/lib/aarch64-linux-gnu/tegra/libnvidia-ptxjitcompiler.so.32.4.3)
==4904==    by 0x434DFA1B: ??? (in /usr/lib/aarch64-linux-gnu/tegra/libnvidia-ptxjitcompiler.so.32.4.3)
==4904==    by 0x434E015B: ??? (in /usr/lib/aarch64-linux-gnu/tegra/libnvidia-ptxjitcompiler.so.32.4.3)
==4904==    by 0x434D4F77: ??? (in /usr/lib/aarch64-linux-gnu/tegra/libnvidia-ptxjitcompiler.so.32.4.3)
==4904==    by 0x43058A63: ??? (in /usr/lib/aarch64-linux-gnu/tegra/libnvidia-ptxjitcompiler.so.32.4.3)
==4904==    by 0x4304FBEF: __cuda_CallJitEntryPoint (in /usr/lib/aarch64-linux-gnu/tegra/libnvidia-ptxjitcompiler.so.32.4.3)

==4904== 5,909 (72 direct, 5,837 indirect) bytes in 1 blocks are definitely lost in loss record 2,089 of 2,378
==4904==    at 0x48461D0: malloc (vg_replace_malloc.c:381)
==4904==    by 0x434E6827: ??? (in /usr/lib/aarch64-linux-gnu/tegra/libnvidia-ptxjitcompiler.so.32.4.3)
==4904==    by 0x434DE1A3: ??? (in /usr/lib/aarch64-linux-gnu/tegra/libnvidia-ptxjitcompiler.so.32.4.3)
==4904==    by 0x430BD6EB: nvPTXCompilerCreate (in /usr/lib/aarch64-linux-gnu/tegra/libnvidia-ptxjitcompiler.so.32.4.3)
==4904==    by 0x2C123817: fatBinaryCtl_Compile_WithJITDir (in /usr/lib/aarch64-linux-gnu/tegra/libnvidia-fatbinaryloader.so.32.4.3)
==4904==    by 0x2B30B61F: ??? (in /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1.1)
==4904==    by 0x2B30C91F: ??? (in /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1.1)
==4904==    by 0x2B2B446B: ??? (in /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1.1)
==4904==    by 0x63CCF27: ??? (in /usr/local/cuda-10.2/targets/aarch64-linux/lib/libcudart.so.10.2.89)
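With 2,378 loss records in the full log, it can help to total the "definitely lost" bytes per shared object and compare the totals between a short run and a long run. The following is a rough Python sketch under stated assumptions: `definitely_lost_by_object` is a hypothetical helper, and it attributes each record to the first stack frame that names a library via `(in ...)`, which is a heuristic rather than proof of which component owns the allocation.

```python
import re
from collections import Counter

# Matches the first line of any Memcheck loss record, e.g.
# "==4904== 5,909 (72 direct, 5,837 indirect) bytes in 1 blocks are definitely lost in loss record ..."
HEADER = re.compile(
    r"==\d+== ([\d,]+) (?:\([^)]*\) )?bytes in [\d,]+ blocks are "
    r"(definitely lost|indirectly lost|possibly lost|still reachable) in loss record"
)
# Matches the "(in /path/to/lib.so)" suffix of a stack frame.
OBJECT = re.compile(r"\(in ([^)]+)\)")


def definitely_lost_by_object(log: str) -> Counter:
    """Sum 'definitely lost' bytes per shared object.

    Each record is attributed to the first frame that names an object
    file; records without such a frame are counted under '<unknown>'.
    """
    totals = Counter()
    headers = list(HEADER.finditer(log))
    for idx, h in enumerate(headers):
        if h.group(2) != "definitely lost":
            continue
        # Only look at frames belonging to this record.
        end = headers[idx + 1].start() if idx + 1 < len(headers) else len(log)
        frame = OBJECT.search(log, h.end(), end)
        owner = frame.group(1) if frame else "<unknown>"
        totals[owner] += int(h.group(1).replace(",", ""))
    return totals
```

Applied to the two records above, both are attributed to libnvidia-ptxjitcompiler.so.32.4.3. If the per-library total stays the same regardless of how long the loop runs, those records are likely one-time allocations rather than the source of the steady growth.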

Here is my TX1 info:

NVIDIA Jetson TX1 - Jetpack 4.4 [L4T 32.4.3]
Jetpack: 4.4 [L4T 32.4.3]
Type: TX1
SOC Family: tegra210
ID: 33
Module: P2597-2180
Board: UNKNOWN
Code Name: jetson
Cuda ARCH: 5.3
CUDA: 10.2.89
TensorRT: 7.1.3.0
cuDNN: 8.0.0.180

ttyio commented 2 years ago

@xonobo, according to the call stack, the leak comes from the compiler. Have you tried JetPack 4.6? Thanks.

ttyio commented 2 years ago

Closing since there has been no activity for more than 3 weeks. Please reopen if you still have questions. Thanks!