Yolov8 model inference problem with GPU ( net.forward() )

quentinblechet commented 1 year ago

Hello,

I'm currently using your code inside a ROS2 node to run inference with YOLOv8. I seem to have a problem with the forward function: with the CUDA backend, the output of forward() has the right size but the data are always 0, whereas I get normal values when I run it on the CPU.

std::vector<cv::Mat> outputs;
net.forward(outputs, net.getUnconnectedOutLayersNames());

int rows = outputs[0].size[1];
int dimensions = outputs[0].size[2];

std::cout << rows << " / " << dimensions << " / " << *(float *)outputs[0].data << std::endl;

[Screenshots: console output of the CPU run and of the GPU run]

Those are screenshots of my code running on CPU and on GPU.
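Since the snippet above prints only the first element of the output, it may help to count the non-zero values across the whole buffer before concluding that everything is 0. A minimal sketch, assuming the output is a contiguous blob of 32-bit floats; `countNonZeroFloats` is a hypothetical helper name, not part of OpenCV or of this repository:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: counts how many floats in a contiguous buffer are
// not exactly zero. Assumes `data` points to at least `count` floats.
std::size_t countNonZeroFloats(const float* data, std::size_t count)
{
    assert(data != nullptr || count == 0);
    std::size_t nonZero = 0;
    for (std::size_t i = 0; i < count; ++i)
    {
        if (data[i] != 0.0f)
            ++nonZero;
    }
    return nonZero;
}
```

With the code above it could be called as `countNonZeroFloats(outputs[0].ptr<float>(), outputs[0].total())`, once with the CPU backend and once with the CUDA backend, to compare the two runs.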

JustasBart commented 1 year ago

Hi @quentinblechet, I haven't seen that issue before and I'm not really sure how you would go about fixing it...

I mean something that stands out as obvious to me would be to check:

Your Nvidia GPU drivers as well as your OpenCV (CUDA, cuDNN) build.
Make sure that ROS has access to those drivers etc...
Make sure that you've exported your ONNX model correctly (Although since it's working on CPU it appears to be just fine).
Try it out with different models such as a small/medium yolov5 model versus a yolov8 model. (Try and isolate that the issue is purely to do with the GPU).
Try asking ChatGPT to see if it can come up with anything in regards to your problem...

In case you've missed it there's the OpenCV build outline:

OpenCV build

All the best and good luck 🚀

quentinblechet commented 1 year ago

Thx for this quick answer.

quentinblechet commented 1 year ago

I have OpenCV 4.8.0. Could that be the problem?

JustasBart commented 1 year ago

@quentinblechet In theory it shouldn't matter at all but only assuming that everything is building properly and whatnot. Can you confirm your build output?
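Besides the CMake summary, the build configuration can also be read at runtime, which reflects the OpenCV library the node actually linked against. Below is a rough sketch of pulling single entries out of that text; `findBuildInfoValue` is a hypothetical helper, and it assumes the text uses the usual `key: value` lines of the CMake summary:

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper: returns the trimmed text that follows "<key>:" on the
// first line containing it, or an empty string if no such line exists.
std::string findBuildInfoValue(const std::string& buildInfo, const std::string& key)
{
    assert(!key.empty());
    const std::string needle = key + ":";
    const char* whitespace = " \t\r";
    std::istringstream lines(buildInfo);
    std::string line;
    while (std::getline(lines, line))
    {
        const std::size_t pos = line.find(needle);
        if (pos == std::string::npos)
            continue;
        const std::string rest = line.substr(pos + needle.size());
        const std::size_t first = rest.find_first_not_of(whitespace);
        if (first == std::string::npos)
            return "";  // key present but no value on this line
        const std::size_t last = rest.find_last_not_of(whitespace);
        return rest.substr(first, last - first + 1);
    }
    return "";
}
```

For example, `findBuildInfoValue(cv::getBuildInformation(), "NVIDIA CUDA")` and the same call with `"cuDNN"` could be logged at node startup; an empty result would suggest the node is picking up a build without that component.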

quentinblechet commented 1 year ago

General configuration for OpenCV 4.8.0 =====================================
  Version control: unknown

Extra modules:
  Location (extra): /home/quentin.blechet/opencv_contrib/modules
  Version control (extra): unknown

Platform:
  Timestamp: 2023-08-10T08:41:16Z
  Host: Linux 5.15.0-78-generic x86_64
  CMake: 3.26.4
  CMake generator: Unix Makefiles
  CMake build tool: /usr/bin/make
  Configuration: RELEASE

CPU/HW features:
  Baseline: SSE SSE2 SSE3
    requested: SSE3
  Dispatched code generation: SSE4_1 SSE4_2 FP16 AVX AVX2 AVX512_SKX
    requested: SSE4_1 SSE4_2 AVX FP16 AVX2 AVX512_SKX
    SSE4_1 (18 files): + SSSE3 SSE4_1
    SSE4_2 (2 files): + SSSE3 SSE4_1 POPCNT SSE4_2
    FP16 (1 files): + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 AVX
    AVX (8 files): + SSSE3 SSE4_1 POPCNT SSE4_2 AVX
    AVX2 (37 files): + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 FMA3 AVX AVX2
    AVX512_SKX (8 files): + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 FMA3 AVX AVX2 AVX_512F AVX512_COMMON AVX512_SKX

C/C++:
  Built as dynamic libs?: YES
  C++ standard: 11
  C++ Compiler: /usr/bin/c++ (ver 9.4.0)
  C++ flags (Release): -fsigned-char -ffast-math -W -Wall -Wreturn-type -Wnon-virtual-dtor -Waddress -Wsequence-point -Wformat -Wformat-security -Wmissing-declarations -Wundef -Winit-self -Wpointer-arith -Wshadow -Wsign-promo -Wuninitialized -Wsuggest-override -Wno-delete-non-virtual-dtor -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -Wno-long-long -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -msse -msse2 -msse3 -fvisibility=hidden -fvisibility-inlines-hidden -O3 -DNDEBUG -DNDEBUG
  C++ flags (Debug): -fsigned-char -ffast-math -W -Wall -Wreturn-type -Wnon-virtual-dtor -Waddress -Wsequence-point -Wformat -Wformat-security -Wmissing-declarations -Wundef -Winit-self -Wpointer-arith -Wshadow -Wsign-promo -Wuninitialized -Wsuggest-override -Wno-delete-non-virtual-dtor -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -Wno-long-long -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -msse -msse2 -msse3 -fvisibility=hidden -fvisibility-inlines-hidden -g -O0 -DDEBUG -D_DEBUG
  C Compiler: /usr/bin/cc
  C flags (Release): -fsigned-char -ffast-math -W -Wall -Wreturn-type -Waddress -Wsequence-point -Wformat -Wformat-security -Wmissing-declarations -Wmissing-prototypes -Wstrict-prototypes -Wundef -Winit-self -Wpointer-arith -Wshadow -Wuninitialized -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -Wno-long-long -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -msse -msse2 -msse3 -fvisibility=hidden -O3 -DNDEBUG -DNDEBUG
  C flags (Debug): -fsigned-char -ffast-math -W -Wall -Wreturn-type -Waddress -Wsequence-point -Wformat -Wformat-security -Wmissing-declarations -Wmissing-prototypes -Wstrict-prototypes -Wundef -Winit-self -Wpointer-arith -Wshadow -Wuninitialized -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -Wno-long-long -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -msse -msse2 -msse3 -fvisibility=hidden -g -O0 -DDEBUG -D_DEBUG
  Linker flags (Release): -Wl,--exclude-libs,libippicv.a -Wl,--exclude-libs,libippiw.a -Wl,--gc-sections -Wl,--as-needed -Wl,--no-undefined
  Linker flags (Debug): -Wl,--exclude-libs,libippicv.a -Wl,--exclude-libs,libippiw.a -Wl,--gc-sections -Wl,--as-needed -Wl,--no-undefined
  ccache: NO
  Precompiled headers: NO
  Extra dependencies: m pthread cudart_static dl rt nppc nppial nppicc nppidei nppif nppig nppim nppist nppisu nppitc npps cublas cudnn cufft -L/usr/local/cuda/lib64 -L/usr/lib/x86_64-linux-gnu
  3rdparty dependencies:

OpenCV modules:
  To be built: alphamat aruco bgsegm bioinspired calib3d ccalib core cudaarithm cudabgsegm cudacodec cudafeatures2d cudafilters cudaimgproc cudalegacy cudaobjdetect cudaoptflow cudastereo cudawarping cudev datasets dnn dnn_objdetect dnn_superres dpm face features2d flann freetype fuzzy gapi hdf hfs highgui img_hash imgcodecs imgproc intensity_transform line_descriptor mcc ml objdetect optflow phase_unwrapping photo plot python3 quality rapid reg rgbd saliency shape stereo stitching structured_light superres surface_matching text tracking ts video videoio videostab wechat_qrcode xfeatures2d ximgproc xobjdetect xphoto
  Disabled: world
  Disabled by dependency: -
  Unavailable: cvv java julia matlab ovis python2 sfm viz
  Applications: tests perf_tests examples apps
  Documentation: NO
  Non-free algorithms: YES

GUI: GTK3
  GTK+: YES (ver 3.24.20)
    GThread : YES (ver 2.64.6)
    GtkGlExt: NO
  VTK support: NO

Media I/O:
  ZLib: /usr/lib/x86_64-linux-gnu/libz.so (ver 1.2.11)
  JPEG: /usr/lib/x86_64-linux-gnu/libjpeg.so (ver 80)
  WEBP: /usr/lib/x86_64-linux-gnu/libwebp.so (ver encoder: 0x020e)
  PNG: /usr/lib/x86_64-linux-gnu/libpng.so (ver 1.6.37)
  TIFF: /usr/lib/x86_64-linux-gnu/libtiff.so (ver 42 / 4.1.0)
  JPEG 2000: OpenJPEG (ver 2.3.1)
  OpenEXR: /usr/lib/x86_64-linux-gnu/libImath.so /usr/lib/x86_64-linux-gnu/libIlmImf.so /usr/lib/x86_64-linux-gnu/libIex.so /usr/lib/x86_64-linux-gnu/libHalf.so /usr/lib/x86_64-linux-gnu/libIlmThread.so (ver 2_3)
  HDR: YES
  SUNRASTER: YES
  PXM: YES
  PFM: YES

Video I/O:
  DC1394: YES (2.2.5)
  FFMPEG: YES
    avcodec: YES (58.54.100)
    avformat: YES (58.29.100)
    avutil: YES (56.31.100)
    swscale: YES (5.5.100)
    avresample: YES (4.0.0)
  GStreamer: YES (1.16.3)
  v4l/v4l2: YES (linux/videodev2.h)

Parallel framework: pthreads

Trace: YES (with Intel ITT)

Other third-party libraries:
  Intel IPP: 2021.8 [2021.8.0]
    at: /home/quentin.blechet/opencv/build/3rdparty/ippicv/ippicv_lnx/icv
  Intel IPP IW: sources (2021.8.0)
    at: /home/quentin.blechet/opencv/build/3rdparty/ippicv/ippicv_lnx/iw
  VA: NO
  Lapack: NO
  Eigen: YES (ver 3.3.7)
  Custom HAL: NO
  Protobuf: build (3.19.1)
  Flatbuffers: builtin/3rdparty (23.5.9)

NVIDIA CUDA: YES (ver 11.6, CUFFT CUBLAS FAST_MATH)
  NVIDIA GPU arch: 75
  NVIDIA PTX archs:

cuDNN: YES (ver 8.9.3)

OpenCL: YES (no extra features)
  Include path: /home/quentin.blechet/opencv/3rdparty/include/opencl/1.2
  Link libraries: Dynamic load

Python 3:
  Interpreter: /usr/bin/python3 (ver 3.8.10)
  Libraries: /usr/lib/x86_64-linux-gnu/libpython3.8.so (ver 3.8.10)
  numpy: /home/quentin.blechet/.local/lib/python3.8/site-packages/numpy/core/include (ver 1.24.3)
  install path: lib/python3.8/site-packages/cv2/python-3.8

Python (for build): /usr/bin/python2.7

Java:
  ant: NO
  Java: NO
  JNI: NO
  Java wrappers: NO
  Java tests: NO

Install to: /usr/local

JustasBart commented 1 year ago

@quentinblechet I'm not seeing anything obvious here... Other than it's strange that it's saying 'Version control: unknown' and 'Version control (extra): unknown'.

Can you confirm that you've used 'git clone ...' as opposed to downloading the .zip file?

It's important to use 'git clone ...' and then 'git checkout 4.8.0' because when you run your cmake (I use cmake-gui for convenience) you'll see that it'll automatically start downloading certain packages as it goes.

I'm not saying that this is the case with your build by the way I'm just saying that it's weird that it says unknown.

Other than that yeah I'm not too sure... It's odd that nothing is crashing/failing alright... Just to confirm as well, you do have an Nvidia based GPU, right?

quentinblechet commented 1 year ago

Yes, I used git clone. And yes, I have an Nvidia-based GPU. I really don't see where the problem could come from.

JustasBart commented 1 year ago

@quentinblechet Yeah, my apologies for being unable to provide more for you but I don't know, I haven't seen/dealt with this particular issue and I'm not really sure how to fix it. I mean the only thing that I could say really would be to perhaps re-build but that's a bit of a long-winded solution that may not even solve anything...

Do let me know if you figure this out at some point, good luck! 🚀

quentinblechet commented 1 year ago

Just as extra information: it works perfectly with yolov5s.

JustasBart commented 1 year ago

@quentinblechet That's rather strange... Have you tried exporting the yolov8 model to ONNX yourself? If I remember correctly you should set 'opset=12'.

JustasBart commented 1 year ago

@quentinblechet Otherwise it might somehow be the transpose function, in that yolov8 has the order of its output dimensions swapped compared to v5.
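To make the layout difference concrete: for a 640x640 model with 80 classes, a YOLOv8 ONNX export typically produces an output of shape [1, 84, 8400] (attributes first), while YOLOv5 produces [1, 25200, 85] (candidates first), so the v8 output is usually transposed before parsing. The sketch below shows that reordering on a plain row-major buffer; `transposeRowMajor` is a hypothetical stand-in written for illustration and is not the code used in this repository:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns the transpose of a row-major
// srcRows x srcCols matrix as a new row-major srcCols x srcRows buffer.
std::vector<float> transposeRowMajor(const std::vector<float>& src,
                                     std::size_t srcRows,
                                     std::size_t srcCols)
{
    assert(src.size() == srcRows * srcCols);
    std::vector<float> dst(src.size());
    for (std::size_t r = 0; r < srcRows; ++r)
    {
        for (std::size_t c = 0; c < srcCols; ++c)
        {
            // Element (r, c) of the source becomes element (c, r) of the result.
            dst[c * srcRows + r] = src[r * srcCols + c];
        }
    }
    return dst;
}
```

In OpenCV the same reordering is available as `cv::transpose` on a 2-D `cv::Mat`. Checking the buffer for zeros both before and after this step, on CPU and on GPU, may help tell whether the zeros already come out of `net.forward()` or only appear afterwards.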

matheusbg8 commented 1 year ago

Hello @quentinblechet and @JustasBart, just wanted to let you know that I am using OpenCV 4.8 and I am facing the same issue. It works on CPU but doesn't work on GPU. I am using your shell script to export to ONNX format, and it has the opset=12 argument.

I am running on Ubuntu 20.04 with an RTX 3070. YOLOv5 works well on both CPU and GPU, but YOLOv8 works on CPU only.

JustasBart commented 1 year ago

Thanks for your input @matheusbg8, it's beginning to sound like maybe something has changed with OpenCV 4.8.0. I'm actually having trouble building it and setting it up fully the way I was used to before as well... I'll check in if I have any updates on it... In the meantime though, OpenCV 4.7.0 should in theory work better as that's what I was using before.

diet-teacher commented 1 year ago

I had the same issue with OpenCV 4.8.0, so I changed to 4.7.0 and it works perfectly with YOLOv8.
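For anyone who wants their node to flag this combination at startup, here is a small sketch of a version check. It encodes only what was reported in this thread (YOLOv8 returning zeros on the CUDA backend with 4.8.0, working with 4.7.0); `isReportedProblemVersion` is a hypothetical name, and treating every 4.8.x release as affected is an assumption:

```cpp
#include <cassert>

// Hypothetical helper: true for the OpenCV release line that commenters in
// this thread reported as problematic for YOLOv8 on the CUDA backend.
// Assumption: any 4.8.x is flagged; other versions are not known to be affected.
bool isReportedProblemVersion(int major, int minor)
{
    assert(major >= 0 && minor >= 0);
    return major == 4 && minor == 8;
}
```

It could be called as `isReportedProblemVersion(CV_VERSION_MAJOR, CV_VERSION_MINOR)`, using the version macros from `<opencv2/core/version.hpp>`, to print a warning before selecting the CUDA backend.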

JustasBart / yolov8_CPP_Inference_OpenCV_ONNX

Yolov8 model inference problem with GPU ( net.forward() ) #15