NVIDIA / TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
https://developer.nvidia.com/tensorrt
Apache License 2.0

Segmentation fault TensorRT 10.3 (and older versions) with GCC13 #4173

Open jokla opened 1 month ago

jokla commented 1 month ago

Description

TensorRT segfaults when parsing an ONNX model (YOLOv8 QAT) with GCC 13 installed on Ubuntu 22.04.

Environment

TensorRT Version: 10.3 and older

NVIDIA GPU: NVIDIA RTX A6000

NVIDIA Driver Version: 560.28.03

CUDA Version: 12.6

CUDNN Version: 8.9.6.50-1+cuda12.2

Operating System:

Container: ubuntu-22.04.Dockerfile with GCC 13 installed on top:

# Install GCC 13
ARG GCC_VERSION=13
RUN add-apt-repository ppa:ubuntu-toolchain-r/test
RUN apt update && apt install g++-"$GCC_VERSION" gcc-"$GCC_VERSION" -y && apt clean
RUN update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-"$GCC_VERSION" "$GCC_VERSION" \
    --slave /usr/bin/g++ g++ /usr/bin/g++-"$GCC_VERSION" \
    --slave /usr/bin/gcov gcov /usr/bin/gcov-"$GCC_VERSION"

Same issue with nvcr.io/nvidia/tensorrt:24.08-py3 with gcc13 installed on top of it.

Relevant Files

Steps To Reproduce

Commands or scripts:

(gdb) run --onnx=../../data/yolov8_qat.onnx --best
Starting program: /workspace/TensorRT/build/out/trtexec_debug --onnx=../../data/yolov8_qat.onnx --best
warning: Error disabling address space randomization: Operation not permitted
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/x86_64-linux-gnu/libthread_db.so.1".
&&&& RUNNING TensorRT.trtexec [TensorRT v100300] # /workspace/TensorRT/build/out/trtexec_debug --onnx=../../data/yolov8_qat.onnx --best
[New Thread 0x75caf7600000 (LWP 12728)]
[09/30/2024-15:15:57] [I] === Model Options ===
[09/30/2024-15:15:57] [I] Format: ONNX
[09/30/2024-15:15:57] [I] Model: ../../data/yolov8_qat.onnx
[09/30/2024-15:15:57] [I] Output:
[09/30/2024-15:15:57] [I] === Build Options ===
[09/30/2024-15:15:57] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default, tacticSharedMem: default
[09/30/2024-15:15:57] [I] avgTiming: 8
[09/30/2024-15:15:57] [I] Precision: FP32+FP16+BF16+INT8
[09/30/2024-15:15:57] [I] LayerPrecisions: 
[09/30/2024-15:15:57] [I] Layer Device Types: 
[09/30/2024-15:15:57] [I] Calibration: Dynamic
[09/30/2024-15:15:57] [I] Refit: Disabled
[09/30/2024-15:15:57] [I] Strip weights: Disabled
[09/30/2024-15:15:57] [I] Version Compatible: Disabled
[09/30/2024-15:15:57] [I] ONNX Plugin InstanceNorm: Disabled
[09/30/2024-15:15:57] [I] TensorRT runtime: full
[09/30/2024-15:15:57] [I] Lean DLL Path: 
[09/30/2024-15:15:57] [I] Tempfile Controls: { in_memory: allow, temporary: allow }
[09/30/2024-15:15:57] [I] Exclude Lean Runtime: Disabled
[09/30/2024-15:15:57] [I] Sparsity: Disabled
[09/30/2024-15:15:57] [I] Safe mode: Disabled
[09/30/2024-15:15:57] [I] Build DLA standalone loadable: Disabled
[09/30/2024-15:15:57] [I] Allow GPU fallback for DLA: Disabled
[09/30/2024-15:15:57] [I] DirectIO mode: Disabled
[09/30/2024-15:15:57] [I] Restricted mode: Disabled
[09/30/2024-15:15:57] [I] Skip inference: Disabled
[09/30/2024-15:15:57] [I] Save engine: 
[09/30/2024-15:15:57] [I] Load engine: 
[09/30/2024-15:15:57] [I] Profiling verbosity: 0
[09/30/2024-15:15:57] [I] Tactic sources: Using default tactic sources
[09/30/2024-15:15:57] [I] timingCacheMode: local
[09/30/2024-15:15:57] [I] timingCacheFile: 
[09/30/2024-15:15:57] [I] Enable Compilation Cache: Enabled
[09/30/2024-15:15:57] [I] errorOnTimingCacheMiss: Disabled
[09/30/2024-15:15:57] [I] Preview Features: Use default preview flags.
[09/30/2024-15:15:57] [I] MaxAuxStreams: -1
[09/30/2024-15:15:57] [I] BuilderOptimizationLevel: -1
[09/30/2024-15:15:57] [I] Calibration Profile Index: 0
[09/30/2024-15:15:57] [I] Weight Streaming: Disabled
[09/30/2024-15:15:57] [I] Runtime Platform: Same As Build
[09/30/2024-15:15:57] [I] Debug Tensors: 
[09/30/2024-15:15:57] [I] Input(s)s format: fp32:CHW
[09/30/2024-15:15:57] [I] Output(s)s format: fp32:CHW
[09/30/2024-15:15:57] [I] Input build shapes: model
[09/30/2024-15:15:57] [I] Input calibration shapes: model
[09/30/2024-15:15:57] [I] === System Options ===
[09/30/2024-15:15:57] [I] Device: 0
[09/30/2024-15:15:57] [I] DLACore: 
[09/30/2024-15:15:57] [I] Plugins:
[09/30/2024-15:15:57] [I] setPluginsToSerialize:
[09/30/2024-15:15:57] [I] dynamicPlugins:
[09/30/2024-15:15:57] [I] ignoreParsedPluginLibs: 0
[09/30/2024-15:15:57] [I] 
[09/30/2024-15:15:57] [I] === Inference Options ===
[09/30/2024-15:15:57] [I] Batch: Explicit
[09/30/2024-15:15:57] [I] Input inference shapes: model
[09/30/2024-15:15:57] [I] Iterations: 10
[09/30/2024-15:15:57] [I] Duration: 3s (+ 200ms warm up)
[09/30/2024-15:15:57] [I] Sleep time: 0ms
[09/30/2024-15:15:57] [I] Idle time: 0ms
[09/30/2024-15:15:57] [I] Inference Streams: 1
[09/30/2024-15:15:57] [I] ExposeDMA: Disabled
[09/30/2024-15:15:57] [I] Data transfers: Enabled
[09/30/2024-15:15:57] [I] Spin-wait: Disabled
[09/30/2024-15:15:57] [I] Multithreading: Disabled
[09/30/2024-15:15:57] [I] CUDA Graph: Disabled
[09/30/2024-15:15:57] [I] Separate profiling: Disabled
[09/30/2024-15:15:57] [I] Time Deserialize: Disabled
[09/30/2024-15:15:57] [I] Time Refit: Disabled
[09/30/2024-15:15:57] [I] NVTX verbosity: 0
[09/30/2024-15:15:57] [I] Persistent Cache Ratio: 0
[09/30/2024-15:15:57] [I] Optimization Profile Index: 0
[09/30/2024-15:15:57] [I] Weight Streaming Budget: 100.000000%
[09/30/2024-15:15:57] [I] Inputs:
[09/30/2024-15:15:57] [I] Debug Tensor Save Destinations:
[09/30/2024-15:15:57] [I] === Reporting Options ===
[09/30/2024-15:15:57] [I] Verbose: Disabled
[09/30/2024-15:15:57] [I] Averages: 10 inferences
[09/30/2024-15:15:57] [I] Percentiles: 90,95,99
[09/30/2024-15:15:57] [I] Dump refittable layers:Disabled
[09/30/2024-15:15:57] [I] Dump output: Disabled
[09/30/2024-15:15:57] [I] Profile: Disabled
[09/30/2024-15:15:57] [I] Export timing to JSON file: 
[09/30/2024-15:15:57] [I] Export output to JSON file: 
[09/30/2024-15:15:57] [I] Export profile to JSON file: 
[09/30/2024-15:15:57] [I] 
[09/30/2024-15:15:57] [I] === Device Information ===
[09/30/2024-15:15:57] [I] Available Devices: 
[09/30/2024-15:15:57] [I]   Device 0: "NVIDIA RTX A6000" UUID: GPU-f046bca2-ca31-632c-bd28-bfde07884c2d
[New Thread 0x75caf5a00000 (LWP 12729)]
[New Thread 0x75caf5000000 (LWP 12730)]
[09/30/2024-15:15:57] [I] Selected Device: NVIDIA RTX A6000
[09/30/2024-15:15:57] [I] Selected Device ID: 0
[09/30/2024-15:15:57] [I] Selected Device UUID: GPU-f046bca2-ca31-632c-bd28-bfde07884c2d
[09/30/2024-15:15:57] [I] Compute Capability: 8.6
[09/30/2024-15:15:57] [I] SMs: 84
[09/30/2024-15:15:57] [I] Device Global Memory: 48567 MiB
[09/30/2024-15:15:57] [I] Shared Memory per SM: 100 KiB
[09/30/2024-15:15:57] [I] Memory Bus Width: 384 bits (ECC disabled)
[09/30/2024-15:15:57] [I] Application Compute Clock Rate: 1.8 GHz
[09/30/2024-15:15:57] [I] Application Memory Clock Rate: 8.001 GHz
[09/30/2024-15:15:57] [I] 
[09/30/2024-15:15:57] [I] Note: The application clock rates do not reflect the actual clock rates that the GPU is currently running at.
[09/30/2024-15:15:57] [I] 
[09/30/2024-15:15:57] [I] TensorRT version: 10.3.0
[09/30/2024-15:15:57] [I] Loading standard plugins
[09/30/2024-15:15:57] [I] [TRT] [MemUsageChange] Init CUDA: CPU +2, GPU +0, now: CPU 18, GPU 2913 (MiB)
[09/30/2024-15:15:59] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +2088, GPU +386, now: CPU 2261, GPU 3299 (MiB)
[09/30/2024-15:15:59] [I] Start parsing network model.
[09/30/2024-15:15:59] [I] [TRT] ----------------------------------------------------------------
[09/30/2024-15:15:59] [I] [TRT] Input filename:   ../../data/yolov8_qat.onnx
[09/30/2024-15:15:59] [I] [TRT] ONNX IR version:  0.0.8
[09/30/2024-15:15:59] [I] [TRT] Opset version:    17
[09/30/2024-15:15:59] [I] [TRT] Producer name:    pytorch
[09/30/2024-15:15:59] [I] [TRT] Producer version: 2.1.1
[09/30/2024-15:15:59] [I] [TRT] Domain:           
[09/30/2024-15:15:59] [I] [TRT] Model version:    0
[09/30/2024-15:15:59] [I] [TRT] Doc string:       
[09/30/2024-15:15:59] [I] [TRT] ----------------------------------------------------------------
[09/30/2024-15:16:00] [I] Finished parsing network model. Parse time: 0.66842
[09/30/2024-15:16:00] [W] [TRT] Calibrator won't be used in explicit quantization mode. Please insert Quantize/Dequantize layers to indicate which tensors to quantize/dequantize.
[09/30/2024-15:16:00] [W] [TRT] /Reshape_19: IShuffleLayer with zeroIsPlaceHolder=true has reshape dimension at position 0 that might or might not be zero. TensorRT resolves it at runtime, but this may cause excessive memory consumption and is usually a sign of a bug in the network.
[09/30/2024-15:16:00] [W] [TRT] /Reshape_22: IShuffleLayer with zeroIsPlaceHolder=true has reshape dimension at position 0 that might or might not be zero. TensorRT resolves it at runtime, but this may cause excessive memory consumption and is usually a sign of a bug in the network.
[09/30/2024-15:16:00] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.

Thread 1 "trtexec_debug" received signal SIGSEGV, Segmentation fault.
0x000075caf9d0116e in ?? () from /usr/lib/x86_64-linux-gnu/libgcc_s.so.1
(gdb) bt
#0  0x000075caf9d0116e in ?? () from /usr/lib/x86_64-linux-gnu/libgcc_s.so.1
#1  0x000075caf9d01d5a in _Unwind_Find_FDE () from /usr/lib/x86_64-linux-gnu/libgcc_s.so.1
#2  0x000075caf9cfd60a in ?? () from /usr/lib/x86_64-linux-gnu/libgcc_s.so.1
#3  0x000075caf9cff07d in _Unwind_RaiseException () from /usr/lib/x86_64-linux-gnu/libgcc_s.so.1
#4  0x000075caf9ea705b in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5  0x000075cab94349e6 in ?? () from /usr/lib/x86_64-linux-gnu/libnvinfer.so.10
#6  0x000075cab9f8cfcf in ?? () from /usr/lib/x86_64-linux-gnu/libnvinfer.so.10
#7  0x000075cab9ac5937 in ?? () from /usr/lib/x86_64-linux-gnu/libnvinfer.so.10
#8  0x000075cab9aa5be4 in ?? () from /usr/lib/x86_64-linux-gnu/libnvinfer.so.10
#9  0x000075cab9aad8ac in ?? () from /usr/lib/x86_64-linux-gnu/libnvinfer.so.10
#10 0x000075cab9aafab5 in ?? () from /usr/lib/x86_64-linux-gnu/libnvinfer.so.10
#11 0x000075cab99c5c8c in ?? () from /usr/lib/x86_64-linux-gnu/libnvinfer.so.10
#12 0x000075cab99cb06a in ?? () from /usr/lib/x86_64-linux-gnu/libnvinfer.so.10
#13 0x000075cab99cbab5 in ?? () from /usr/lib/x86_64-linux-gnu/libnvinfer.so.10
#14 0x00005ec2858921f9 in nvinfer1::IBuilder::buildSerializedNetwork (this=0x5ec291b9f3f0, network=..., config=...) at /workspace/TensorRT/include/NvInfer.h:9812
#15 0x00005ec28588bb84 in sample::networkToSerializedEngine (build=..., sys=..., builder=..., env=..., err=...) at /workspace/TensorRT/samples/common/sampleEngines.cpp:1209
#16 0x00005ec28588c545 in sample::modelToBuildEnv (model=..., build=..., sys=..., env=..., err=...) at /workspace/TensorRT/samples/common/sampleEngines.cpp:1293
#17 0x00005ec28588dd37 in sample::getEngineBuildEnv (model=..., build=..., sys=..., env=..., err=...) at /workspace/TensorRT/samples/common/sampleEngines.cpp:1477
#18 0x00005ec28594bcec in main (argc=3, argv=0x7fff310075f8) at /workspace/TensorRT/samples/trtexec/trtexec.cpp:327
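
For reference, the same code path can be reached without trtexec through the standard TensorRT builder and ONNX parser C++ API. The sketch below is only illustrative (the logger class and model path are placeholders, not code from this run); the backtrace above ends inside the buildSerializedNetwork call.

#include <NvInfer.h>
#include <NvOnnxParser.h>
#include <iostream>
#include <memory>

// Minimal logger so the builder and parser can report messages.
class StderrLogger : public nvinfer1::ILogger
{
    void log(Severity severity, const char* msg) noexcept override
    {
        if (severity <= Severity::kWARNING) { std::cerr << msg << std::endl; }
    }
};

int main()
{
    StderrLogger logger;
    auto builder = std::unique_ptr<nvinfer1::IBuilder>(nvinfer1::createInferBuilder(logger));
    auto network = std::unique_ptr<nvinfer1::INetworkDefinition>(builder->createNetworkV2(0));
    auto parser = std::unique_ptr<nvonnxparser::IParser>(nvonnxparser::createParser(*network, logger));
    if (!parser->parseFromFile("yolov8_qat.onnx", static_cast<int32_t>(nvinfer1::ILogger::Severity::kWARNING)))
    {
        return 1;
    }

    auto config = std::unique_ptr<nvinfer1::IBuilderConfig>(builder->createBuilderConfig());
    config->setFlag(nvinfer1::BuilderFlag::kFP16);  // trtexec --best also enables INT8 and BF16
    config->setFlag(nvinfer1::BuilderFlag::kINT8);

    // The SIGSEGV in the backtrace above is raised from inside this call.
    auto serialized = std::unique_ptr<nvinfer1::IHostMemory>(builder->buildSerializedNetwork(*network, *config));
    return serialized ? 0 : 1;
}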

It seems that the issue is coming from libnvinfer.so.10 and GCC 13. The TRT open source build uses a prebuilt nvinfer (from https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.3.0/tars/TensorRT-10.3.0.26.Linux.x86_64-gnu.cuda-12.5.tar.gz), possibly compiled with an older GCC (GCC 8, looking at this table). The conversion works on an Orin with JetPack 6, probably because TRT there is built with a newer GCC version.

How can I make TRT (and libnvinfer) compatible with gcc13? Also, is there a specific reason why it's only built with an old version of gcc?

Many thanks!

Have you tried the latest release?: Yes, same issue

Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt): Yes

jackcaron commented 1 month ago

I have the same error when trying to build a TRT file from an ONNX file, on the same OS version, with driver version 560.35.03 and TensorRT 10.4.

Also, when I try to load a TRT file that was built last week, before upgrading to GCC 13, I get this stack trace:

#0  0x00007ffff7ea516e in ?? () from /lib/x86_64-linux-gnu/libgcc_s.so.1
#1  0x00007ffff7ea5d5a in _Unwind_Find_FDE () from /lib/x86_64-linux-gnu/libgcc_s.so.1
#2  0x00007ffff7ea160a in ?? () from /lib/x86_64-linux-gnu/libgcc_s.so.1
#3  0x00007ffff7ea307d in _Unwind_RaiseException () from /lib/x86_64-linux-gnu/libgcc_s.so.1
#4  0x00007ffff7cb705b in __cxa_throw () from /lib/x86_64-linux-gnu/libstdc++.so.6
#5  0x00007fffbec3c0e5 in ?? () from /usr/lib/x86_64-linux-gnu//libnvinfer.so.10
#6  0x00007fffbecf24c6 in ?? () from /usr/lib/x86_64-linux-gnu//libnvinfer.so.10
#7  0x00007fffbecf41ac in ?? () from /usr/lib/x86_64-linux-gnu//libnvinfer.so.10
#8  0x00007fffbecf53ca in ?? () from /usr/lib/x86_64-linux-gnu//libnvinfer.so.10
#9  0x00007fffd32d6e7a in std::default_delete<nvinfer1::IRuntime>::operator() (this=0x7fffad7fdc58, __ptr=0x7fff740040b0) at /usr/include/c++/13/bits/unique_ptr.h:99
#10 0x00007fffd32d5cae in std::unique_ptr<nvinfer1::IRuntime, std::default_delete<nvinfer1::IRuntime> >::~unique_ptr (this=0x7fffad7fdc58, __in_chrg=<optimized out>)
    at /usr/include/c++/13/bits/unique_ptr.h:404
#11 0x00007fffd32d2919 in loadCudaEngine (trtPath="my_file.trt", logger=warning: RTTI symbol not found for class 'TensorRTLogger'
...)
    at loading_trt.cpp:170

This happens when deallocating an IRuntime after deserializeCudaEngine was called; otherwise it can be deleted without issue. I don't know if anything changed in IRuntime, but before upgrading to gcc-13 this was working fine. I also tried keeping the IRuntime alive a bit longer, but it just crashes further along in the process.

When running under memcheck, matching the stack trace above, there is this message:

Use of uninitialised value of size 8

Edit 1

I was able to fix my bug above by keeping the IRuntime alive longer than the engine. I then had a secondary logic bug (which is why it didn't work the first time I tried that). But this wasn't needed before, or the process just kept going despite it.
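
A minimal sketch of that ordering (assuming the standard TensorRT C++ runtime API; the helper signature is illustrative, not the actual project code): the caller owns the runtime holder, so the IRuntime is guaranteed to be destroyed after the engine it deserialized.

#include <NvInferRuntime.h>
#include <fstream>
#include <iterator>
#include <memory>
#include <string>
#include <vector>

// Hypothetical loader: the caller passes in the runtime holder, so the IRuntime
// outlives the returned ICudaEngine instead of being destroyed right after
// deserializeCudaEngine.
std::unique_ptr<nvinfer1::ICudaEngine> loadCudaEngine(const std::string& trtPath,
                                                      nvinfer1::ILogger& logger,
                                                      std::unique_ptr<nvinfer1::IRuntime>& runtime)
{
    std::ifstream file(trtPath, std::ios::binary);
    std::vector<char> blob((std::istreambuf_iterator<char>(file)), std::istreambuf_iterator<char>());
    runtime.reset(nvinfer1::createInferRuntime(logger));
    return std::unique_ptr<nvinfer1::ICudaEngine>(runtime->deserializeCudaEngine(blob.data(), blob.size()));
}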

As for creating the TRT file from an .onnx file, the crash above only happens for me if config->setBuilderOptimizationLevel(5); is called. Leaving it unset, or using any optimization level below 5, prevents the crash.
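
For completeness, a short sketch of that workaround with the standard builder API (the value 3 below is an assumption of what "unset" corresponds to, i.e. the default level):

#include <NvInfer.h>

// Sketch: keep the builder optimization level below 5 as a workaround.
void configureBuild(nvinfer1::IBuilderConfig& config)
{
    // Level 5 was the only setting that crashed in this setup; the default
    // level (3), or anything below 5, built the engine without a segfault.
    config.setBuilderOptimizationLevel(3);
}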

Hope this helps with diagnosing the problem.

yuanyao-nv commented 1 month ago

Please see here for the supported GCC versions on each platform https://docs.nvidia.com/deeplearning/tensorrt/support-matrix/index.html#software-version-platform

zeroepoch commented 1 month ago

Does setting -D_GLIBCXX_USE_CXX11_ABI=1 help at all to resolve this issue?

jokla commented 1 week ago

@zeroepoch I built TRT with GLIBCXX_USE_CXX11_ABI=1, but I'm still experiencing segmentation faults with nvinfer, both when converting a model from ONNX using trtexec and during inference in holoinfer (the Holoscan operator using TensorRT). I'm a bit puzzled about how setting GLIBCXX_USE_CXX11_ABI=1 could help with this. I thought that, by default, GCC 5 and later versions use CXX11_ABI=1, and since TRT is built with GCC 8, it shouldn't be an issue?
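
For reference, a quick way to check which ABI a translation unit is actually compiled with (a generic sketch, assuming GCC/libstdc++; not TensorRT-specific):

#include <cstdio>
#include <string>  // any libstdc++ header pulls in the config header that defines the macro

int main()
{
    // 1 = new C++11 ABI (std::__cxx11 std::string etc.), 0 = old pre-C++11 ABI.
    std::printf("_GLIBCXX_USE_CXX11_ABI = %d\n", _GLIBCXX_USE_CXX11_ABI);
    return 0;
}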

zeroepoch commented 1 week ago

@zeroepoch I thought that, by default, GCC 5 and later versions use CXX11_ABI=1, and since TRT is built with GCC 8, it shouldn't be an issue?

We force the older C++ ABI to increase compatibility with RHEL 7. That will be changing in a future release.

zeroepoch commented 1 week ago

Could you try the latest release, TRT 10.6? We've officially supported Ubuntu 24.04 with GCC 13 for the last few TRT releases.

jokla commented 1 week ago

Hi @zeroepoch! Many thanks for your support!

I created a repo with instructions on how to replicate the issue: https://github.com/jokla/trt_gcc13.

I added a vanilla YOLOv8n ONNX model from Ultralytics. It was generated with:

!pip install ultralytics

from ultralytics import YOLO
model = YOLO("yolov8n.pt")
success = model.export(format="onnx")

I get a segmentation fault when I try to convert the model with FP16; the crash happens when it reaches CaskConvolution[0x80000009].

Log before segmentation fault

[11/10/2024-17:27:44] [V] [TRT] /models.0/backbone/backbone/dark2/dark2.1/m/m.0/conv2/conv/Conv + PWN(PWN(PWN(/models.0/backbone/backbone/dark2/dark2.1/m/m.0/conv2/act/Sigmoid), PWN(/models.0/backbone/backbone/dark2/dark2.1/m/m.0/conv2/act/Mul)), PWN(/models.0/backbone/backbone/dark2/dark2.1/m/m.0/Add)) (CaskConvolution[0x80000009]) profiling completed in 0.370127 seconds. Fastest Tactic: 0x0866ddee325d07a6 Time: 0.0348142

However, I discovered that trtexec also crashes when I run it with an incorrect parameter like trtexec --test.

Tested the following:

I don't think we can easily move to Ubuntu 24.04 since we are using NVIDIA Holoscan, so I have tried to avoid installing GCC 13 from apt (the Ubuntu 22.04 package is GCC 13.1). Instead, I built GCC 13.2 from source and used it in the tensorrt:24.08 image with TensorRT updated to 10.6. It looks like it is working for now, but having to build GCC from scratch just for TRT is not ideal.

I am not sure why TRT is unhappy with the GCC 13.1 installed from apt; I haven't found a reason yet. Maybe something got fixed in GCC 13.2? This is the list: https://gcc.gnu.org/bugzilla/buglist.cgi?bug_status=RESOLVED&resolution=FIXED&target_milestone=13.2

zeroepoch commented 6 days ago

Hi @jokla,

I was able to reproduce your problem. Thank you for the very detailed repro steps! I'm not exactly sure where the problem is being introduced, but I can speculate that it's due to libgcc or libstdc++ being upgraded as part of the GCC 13.1 install. As best I can tell, there is some breakage between the previously compiled trtexec and these newer libraries. I didn't try installing the libstdc++ binary from GCC 13.2 to confirm, so it's still speculation.

I was able to find a workaround by rebuilding trtexec. Both the invalid-argument case and the original model you're trying to convert work without crashing. I built on top of your existing Docker image with the following Dockerfile.

FROM trt_10_6_24_10_gcc13

ENV DEBIAN_FRONTEND=noninteractive

RUN make -C /workspace/tensorrt/samples clean
RUN make -C /workspace/tensorrt/samples samples=trtexec
RUN cp -f /workspace/tensorrt/bin/trtexec /opt/tensorrt/bin/trtexec

Within this new container the following commands work:

docker run --gpus all -it --rm -v ./data:/data trt_10_6_24_10_gcc13_rebuild /usr/bin/bash -c "trtexec --onnx=/data/yolov8n.onnx --fp16 --verbose"
docker run --gpus all -it --rm  trt_10_6_24_10_gcc13_rebuild /usr/bin/bash -c "trtexec --test && pwd"

I also want to mention that 24.11, which will be released in a week or so, will be based on Ubuntu 24.04, so it will ship with GCC 13.2 as you mentioned. Maybe this will help with your Holoscan situation?

jokla commented 4 days ago

Hi @zeroepoch ! Thanks for the update.

I tried rebuilding trtexec as you suggested; trtexec --test now works, but the conversion is still crashing for me:

#2  __GI___pthread_kill (threadid=135588370841600, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x00007b5120619476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00007b51205ff7f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x00007b5120af96fd in ?? () from /lib/x86_64-linux-gnu/libgcc_s.so.1
#6  0x00007b5120b0e857 in ?? () from /lib/x86_64-linux-gnu/libgcc_s.so.1
#7  0x00007b5120b1007d in _Unwind_RaiseException () from /lib/x86_64-linux-gnu/libgcc_s.so.1
#8  0x00007b5120cb805b in __cxa_throw () from /lib/x86_64-linux-gnu/libstdc++.so.6
#9  0x00007b50e878f0be in ?? () from /lib/x86_64-linux-gnu/libnvinfer.so.10

Could you confirm that it is actually working for you?

docker run --gpus all -it --rm -v ./data:/data trt_10_6_24_10_gcc13_rebuild /usr/bin/bash -c "trtexec --onnx=/data/yolov8n.onnx --fp16 --verbose"

With this command you don't see a segmentation fault message because the container exits before anything more is printed to the terminal. If you append another command, the segmentation fault does show up (at least for me):

docker run --gpus all -it --rm -v ./data:/data trt_10_6_24_10_gcc13_rebuild /usr/bin/bash -c "trtexec --onnx=/data/yolov8n.onnx --fp16 --verbose && ls"

Many thanks for your support!

zeroepoch commented 2 days ago

Hi @jokla,

When running this command:

docker run --gpus all -it --rm -v ./data:/data trt_10_6_24_10_gcc13_rebuild /usr/bin/bash -c "trtexec --onnx=/data/yolov8n.onnx --fp16 --verbose"

It ends with:

[11/23/2024-07:29:13] [V] [TRT] /model.2/m.0/cv2/conv/Conv + PWN(PWN(PWN(/model.2/m.0/cv2/act/Sigmoid), PWN(/model.2/m.0/cv2/act/Mul)), PWN(/model.2/m.0/Add)) (CaskConvolution[0x80000009]) profiling completed in 0.6333 seconds. Fastest Tactic: 0xa5a46bfbd719d757 Time: 0.00910743

When running this command:

docker run --gpus all -it --rm -v ./data:/data trt_10_6_24_10_gcc13_rebuild /usr/bin/bash -c "trtexec --onnx=/data/yolov8n.onnx --fp16 --verbose && ls"

It ends with:

[11/23/2024-07:31:07] [V] [TRT] /model.2/m.0/cv2/conv/Conv + PWN(PWN(PWN(/model.2/m.0/cv2/act/Sigmoid), PWN(/model.2/m.0/cv2/act/Mul)), PWN(/model.2/m.0/Add)) (CaskConvolution[0x80000009]) profiling completed in 0.659246 seconds. Fastest Tactic: 0x69501656100171de Time: 0.00914057
/usr/bin/bash: line 1:   120 Aborted                 (core dumped) trtexec --onnx=/data/yolov8n.onnx --fp16 --verbose

As you mentioned, it segfaults at the end. I wasn't seeing it before, probably because the container exits before the error gets printed. I'll need to investigate further. Based on the backtrace it looks like a similar issue to the earlier one with the precompiled trtexec, which means recompiling trtexec eventually runs into the same problem.

zeroepoch commented 2 days ago

Since trtexec works with both the default compiler from the 24.10 release and in an Ubuntu 24.04 container with its default compiler, I would have to agree with your observation that there is some compiler issue. I don't think TensorRT can resolve a compatibility issue with a particular version of GCC. Let's say we compiled TensorRT with GCC 13.1, it might not work with the default compiler in Ubuntu 22.04 or 24.04 anymore. I haven't tried this, but if this hypothesis is correct, then the solution here is to update the compiler from GCC 13.1 to one without a cross-version compatibility issue.