cyrusbehr / tensorrt-cpp-api

TensorRT C++ API Tutorial
MIT License

Segmentation Fault Error after initial successful run #1

Closed vanguard478 closed 1 year ago

vanguard478 commented 2 years ago

I am able to build this project, but running inference again after the engine has been generated throws a segmentation fault error.

vanguard@vanguard-jetson:~/dev/tensorrt-cpp-api/build$ make -j$(nproc)
[ 25%] Building CXX object CMakeFiles/tensorrt_cpp_api.dir/src/engine.cpp.o
/home/vanguard/dev/tensorrt-cpp-api/src/engine.cpp: In member function ‘bool Engine::build(std::__cxx11::string)’:
/home/vanguard/dev/tensorrt-cpp-api/src/engine.cpp:81:16: warning: unused variable ‘output’ [-Wunused-variable]
     const auto output = network->getOutput(0);
                ^~~~~~
[ 50%] Linking CXX shared library libtensorrt_cpp_api.so
[ 50%] Built target tensorrt_cpp_api
[ 75%] Building CXX object CMakeFiles/driver.dir/src/main.cpp.o
[100%] Linking CXX executable driver
[100%] Built target driver
vanguard@vanguard-jetson:~/dev/tensorrt-cpp-api/build$ ls
CMakeCache.txt  CMakeFiles  cmake_install.cmake  driver  libtensorrt_cpp_api.so  Makefile
vanguard@vanguard-jetson:~/dev/tensorrt-cpp-api/build$ ./driver 
Searching for engine file with name: trt.engine.a220528a4ef634d2ac5172ebc267ecf9.fp32.16.2_4_8.4000000000
Engine not found, generating...
onnx2trt_utils.cpp:364: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
DLA requests all profiles have same min, max, and opt value. All dla layers are falling back to GPU
Detected invalid timing cache, setup a local cache instead
Tactic Device request: 2193MB Available: 492MB. Device memory is insufficient to use tactic.
Skipping tactic 3 due to oom error on requested size of 2193 detected for tactic 5.
Tactic Device request: 2193MB Available: 500MB. Device memory is insufficient to use tactic.
Skipping tactic 3 due to oom error on requested size of 2193 detected for tactic 5.
Tactic Device request: 457MB Available: 453MB. Device memory is insufficient to use tactic.
Skipping tactic 4 due to oom error on requested size of 457 detected for tactic 5.
Tactic Device request: 2193MB Available: 482MB. Device memory is insufficient to use tactic.
Skipping tactic 3 due to oom error on requested size of 2193 detected for tactic 5.
Tactic Device request: 2193MB Available: 484MB. Device memory is insufficient to use tactic.
Skipping tactic 3 due to oom error on requested size of 2193 detected for tactic 5.
Success, saved engine to trt.engine.a220528a4ef634d2ac5172ebc267ecf9.fp32.16.2_4_8.4000000000
Success! Average time per inference: 15.2175 ms, for batch size of: 4

After the engine has been generated successfully, running the driver again gives the following error:

Searching for engine file with name: trt.engine.a220528a4ef634d2ac5172ebc267ecf9.fp32.16.2_4_8.4000000000
Engine found, not regenerating...
Segmentation fault (core dumped)

Environment
TensorRT Version: 8.0.1-1
CUDA Version: 10.2
Operating System + Version: Ubuntu 18.04.6 LTS
Inference Network: AlexNet (using the conversion tutorial here: AlexNet from PyTorch to ONNX). I used the dynamic_axes flag while exporting the model.

torch.onnx.export(model, dummy_input, "alexnet_dynamic.onnx", verbose=True, input_names = ['input'],   # the model's input names
                  output_names = ['output'], # the model's output names
                  dynamic_axes={'input' : {0 : 'batch_size'},    # variable length axes
                                'output' : {0 : 'batch_size'}})

Device for TRT engine builder: Jetson Nano 4GB
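
For reference, a quick way to sanity-check the exported model before handing it to the engine builder (just a rough sketch; it assumes the onnx Python package is installed and uses the file name from the export call above):

import onnx

# Load and structurally validate the exported model
onnx_model = onnx.load("alexnet_dynamic.onnx")
onnx.checker.check_model(onnx_model)

# Print each input's shape; the batch dimension should appear as the
# symbolic name "batch_size" because of the dynamic_axes argument
for inp in onnx_model.graph.input:
    dims = [d.dim_param or d.dim_value for d in inp.type.tensor_type.shape.dim]
    print(inp.name, dims)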

cyrusbehr commented 2 years ago

Hi @vanguard478, sorry for the late response. Can you please upload the exported ONNX model so I can debug on my end and determine what is causing the segfault?

lucienne999 commented 1 year ago

Same problem here. I tried the official ResNet50 ONNX export below, and my environment is Ubuntu 18.04, cuDNN 8.2.2, TensorRT 8.4.1:

import torchvision.models as models
import torch
import torch.onnx
import numpy as np
import onnx

# load the pretrained model
resnet50 = models.resnet50(pretrained=True, progress=False).eval()

# FP16 flag from the original tutorial; target_dtype is not used by the export call below
USE_FP16 = True
target_dtype = np.float16 if USE_FP16 else np.float32

dummy_input = torch.randn(1, 3, 224, 224)
torch_out = resnet50(dummy_input)
torch.onnx.export(resnet50, dummy_input, "resnet50_pytorch_d.onnx", 
    verbose=False, 
    input_names=['input'], 
    output_names=['output'], 
    dynamic_axes={
        'input': {0: 'batch_size'},
        'output': {0: 'batch_size'},
    }
)

cyrusbehr commented 1 year ago

Hi @LicharYuan, the model you have provided has an input of (1, 3, 224, 224), whereas the input image in the repo is of size (3, 112, 112), so the dimensions don't match. I can't be certain whether you used the input image from the repo or your own. I am therefore closing this ticket.

That being said, I have made the code more robust such that it will give an error message if you try to provide an input which is not the correct size.
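
Independently of that, a mismatch like this can also be caught before running the driver by comparing the test image against the dims declared in the ONNX model, along these lines (a rough sketch only, assuming the onnx and opencv-python packages; the image path is a placeholder):

import cv2
import onnx

model = onnx.load("resnet50_pytorch_d.onnx")
# Dims of the first input; dim_value is 0 for the dynamic batch axis
dims = model.graph.input[0].type.tensor_type.shape.dim
_, channels, height, width = [d.dim_value for d in dims]

img = cv2.imread("test_image.png")  # placeholder path for the image fed to the driver
if img is None or img.shape[:2] != (height, width):
    raise SystemExit(f"Model expects {channels}x{height}x{width} input, "
                     f"got image {None if img is None else img.shape}")
print("Image size matches the model input")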

Please test again with the patches I have added, and if you can still reproduce the bug, you can re-open this ticket and I will look into it.