ceccocats / tkDNN

Deep neural network library and toolkit to do high-performance inference on NVIDIA Jetson platforms
GNU General Public License v2.0
717 stars, 209 forks

CUDA error in execute: 9 when using this repo as an external library #125

Closed (mgrova closed this issue 3 years ago)

mgrova commented 4 years ago

Hi! First of all, thanks for developing this repo!

My problem is the following: I'm using this repo as an external library on a Jetson Xavier NX. I converted my yolov4-tiny weights to an .rt file with FP32 precision, but when I use this abstraction in my code, it runs for 1-2 seconds and then crashes with a segmentation fault. The debug log printed to the console is:

TENSORRT LOG: ../rtSafe/cuda/caskConvolutionRunner.cpp (490) - Cuda Error in execute: 9 (invalid configuration argument)
TENSORRT LOG: FAILED_EXECUTION: std::exception

And the backtrace of segmentation fault is the following:

#0  0x0000007fb6e8e2ec in tcache_get (tc_idx=0) at malloc.c:2943
#1  0x0000007fb6e8e2ec in __GI___libc_malloc (bytes=16) at malloc.c:3050
#2  0x0000007fb702652c in operator new(unsigned long) () at /usr/lib/aarch64-linux-gnu/libstdc++.so.6
#3  0x0000007f9c90f9e0 in nvinfer1::NVTXAnnotator::annotateBlock() const () at /usr/lib/aarch64-linux-gnu/libnvinfer.so.7
#4  0x0000007f9c8db504 in nvinfer1::rt::ExecutionContext::generateAnnotation(nvtxStringRegistration_st*, nvinfer1::Optional<nvtxStringRegistration_st*>, nvinfer1::NVTX::Color) ()
    at /usr/lib/aarch64-linux-gnu/libnvinfer.so.7
#5  0x0000007f9c8db810 in nvinfer1::rt::ExecutionContext::generateLayerAnnotation(int) () at /usr/lib/aarch64-linux-gnu/libnvinfer.so.7
#6  0x0000007f9c8df6ec in nvinfer1::rt::ExecutionContext::enqueueInternal(CUevent_st**) () at /usr/lib/aarch64-linux-gnu/libnvinfer.so.7
#7  0x0000007f9c8e1690 in nvinfer1::rt::ExecutionContext::enqueue(int, void**, CUstream_st*, CUevent_st**) () at /usr/lib/aarch64-linux-gnu/libnvinfer.so.7
#8  0x0000007fb5817864 in tk::dnn::NetworkRT::infer(tk::dnn::dataDim_t&, float*) () at /usr/local/lib/libtkDNN.so
#9  0x0000007fb7e0ebcc in tk::dnn::DetectionNN::update(std::vector<cv::Mat, std::allocator<cv::Mat> >&, int, bool, std::basic_ofstream<char, std::char_traits<char> >*, bool) (this=0x5555e831c0, frames=std::vector of length 1, capacity 1 = {...}, cur_batches=1, save_times=false, times=0x0, mAP=false) at /usr/local/include/tkDNN/DetectionNN.h:126

Any idea what is happening?

Thanks in advance, Marco.

mive93 commented 3 years ago

Hi @mgrova, it seems to be a problem related to the NVIDIA libraries. Which JetPack version are you using? Have you changed anything? Also, try rebooting and check whether the problem still occurs.

mive93 commented 3 years ago

Closing for now, feel free to reopen.