High execution time on Jetson Nano B01 4GB

programmeddeath1 commented 1 year ago

I have a Jetson Nano B01 with 16 GB of internal eMMC. Due to a shortage of space, I moved the OS root file system to an external USB drive using these steps (https://www.forecr.io/blogs/bsp-development/change-root-file-system-to-sd-card-directly).
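To confirm that the root file system really lives on the USB drive and not on the eMMC, the device mounted at `/` can be classified by its kernel name. This is a minimal sketch that assumes the usual naming on an eMMC-based Nano (eMMC as `mmcblk0`, SD card as `mmcblk1`, USB drives as `sd*`). The helper `root_device_kind` is hypothetical. `findmnt` comes from util-linux.

```shell
# Hypothetical helper: classify a block-device path by its kernel name.
# Assumes typical naming on an eMMC Jetson Nano: eMMC = mmcblk0,
# SD card = mmcblk1, USB/SATA = sd*, NVMe = nvme*.
root_device_kind() {
  case "$1" in
    /dev/mmcblk0*) echo "internal eMMC" ;;
    /dev/mmcblk1*) echo "SD card" ;;
    /dev/sd*)      echo "USB/SATA drive" ;;
    /dev/nvme*)    echo "NVMe drive" ;;
    *)             echo "unknown" ;;
  esac
}

# Usage: pass the source device of the root mount.
#   root_device_kind "$(findmnt -n -o SOURCE /)"
```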

I then installed ncnn and Code::Blocks as described in this repo and ran YOLO on the sample images.

Detection takes more than 2 seconds per image:

```
[0 NVIDIA Tegra X1 (nvgpu)]  queueC=0[16]  queueG=0[16]  queueT=0[16]
[0 NVIDIA Tegra X1 (nvgpu)]  bugsbn1=0  bugbilz=0  bugcopc=0  bugihfa=0
[0 NVIDIA Tegra X1 (nvgpu)]  fp16-p/s/a=1/1/1  int8-p/s/a=1/1/1
[0 NVIDIA Tegra X1 (nvgpu)]  subgroup=32  basic=1  vote=1  ballot=1  shuffle=1
Time of execution is 2479.000000Segmentation fault (core dumped)
```

It then fails with a segmentation fault. Could this be an architecture issue with the latest Nanos and the libraries used?
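To see whether the execution time is consistent across runs, the binary can be run several times and only the timing line it prints kept. A sketch, assuming the app prints a line beginning with `Time of execution is` or `Time for execution is`, as in the logs in this thread. The number is reported in whatever unit the app uses. `extract_time` and `run_times` are hypothetical helper names.

```shell
# Hypothetical helper: read the app's output on stdin and print only the
# number from its timing line. Accepts both "Time of execution is ..." and
# "Time for execution is ...", and ignores anything fused onto the number.
extract_time() {
  sed -nE 's/^Time (of|for) execution is ([0-9.]+).*/\2/p'
}

# Hypothetical helper: run a command N times and print one timing per run.
# stderr (e.g. the Vulkan device lines) is discarded.
run_times() {
  local n="$1"
  local i=0
  shift
  while [ "$i" -lt "$n" ]; do
    "$@" 2>/dev/null | extract_time
    i=$((i + 1))
  done
}

# Usage:
#   run_times 5 ./bin/Debug/YoloV7 parking.jpg
```

If the program crashes, its buffered output may be lost when piped, so a crashing run can produce no timing line at all.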

Any help would be appreciated!

[EDIT] I also upgraded OpenCV from 4.1.1 to 4.5.1. The result is the same.

After upgrading to OpenCV 4.5.1, I tried to reinstall ncnn, but the CMake configuration step failed with:

```
  The compiler does not support armv8.2 fp16.  NCNN_ARM82 will be OFF.
-- Target arch: arm 64bit
CMake Error at /usr/local/share/cmake-3.20/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
  Could NOT find CUDA (missing: CUDA_TOOLKIT_ROOT_DIR CUDA_INCLUDE_DIRS)
  (found suitable exact version "10.2")
Call Stack (most recent call first):
  /usr/local/share/cmake-3.20/Modules/FindPackageHandleStandardArgs.cmake:594 (_FPHSA_FAILURE_MESSAGE)
```
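A common workaround for this particular FindCUDA message (version 10.2 found, but `CUDA_TOOLKIT_ROOT_DIR` missing) is to pass the toolkit location to CMake explicitly. This was not confirmed in this thread to fix the build. The sketch assumes JetPack installed CUDA under `/usr/local/cuda` or `/usr/local/cuda-10.2`. `find_cuda_root` is a hypothetical helper.

```shell
# Hypothetical helper: print the first candidate directory containing an
# executable bin/nvcc; fail if none of them does.
find_cuda_root() {
  local dir
  for dir in "$@"; do
    if [ -x "$dir/bin/nvcc" ]; then
      echo "$dir"
      return 0
    fi
  done
  return 1
}

# Usage from the ncnn build directory (paths are assumptions for JetPack 4.x):
#   CUDA_ROOT="$(find_cuda_root /usr/local/cuda /usr/local/cuda-10.2)" &&
#     cmake -D CUDA_TOOLKIT_ROOT_DIR="$CUDA_ROOT" ..
```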

Qengineering commented 1 year ago

@programmeddeath1 ,

It's very hard to diagnose the issue due to the lack of debug information. Changing the root partition should not have any effect: once loaded, it will not interfere with the app (I suppose). Changing the OpenCV version doesn't help either, since OpenCV is only used when showing the output image.
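Because the segmentation fault produces no debug output, one way to get more information is to run the Debug build under `gdb` in batch mode and print a backtrace when it crashes. A minimal sketch, assuming `gdb` is installed. `crash_backtrace` is a hypothetical helper name.

```shell
# Hypothetical helper: run a program under gdb non-interactively and print
# the call stack ("bt") after it stops, e.g. on SIGSEGV.
crash_backtrace() {
  if ! command -v gdb >/dev/null 2>&1; then
    echo "gdb not found" >&2
    return 1
  fi
  gdb -batch -ex run -ex bt --args "$@"
}

# Usage, with the binary and image used in this thread:
#   crash_backtrace ./bin/Debug/YoloV7 parking.jpg
```

The Debug target keeps symbols, so the backtrace should show function names and source lines rather than bare addresses.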

What I have noticed is that the very latest version of ncnn requires CMake 3.18 (used for the Vulkan libs). The Nano ships with an older version, and I could not compile ncnn until I had upgraded CMake. Upgrading CMake is not easy: you have to build it from scratch, and once installed it conflicts with the previous version (the cause is a link in OpenCV, which I still have to sort out). Given all of the above, I didn't mention it on the website. For now, I would install a previous version of ncnn, preferably one with the same release date as this repo. Then you can be sure we used the same framework. If the issue still remains, something else is the matter. In that case, I would rebuild the whole OS step by step to see where it goes wrong. Sorry.

programmeddeath1 commented 1 year ago

Hi, thank you for your reply!

I realized it could be a CMake issue, so I upgraded CMake to 3.25, 3.26-rc1, and 3.20. I tried all of these versions from the official builds (https://cmake.org/files/) using the following steps:

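```shell
# Unpack the prebuilt aarch64 release (3.19.5 shown here)
tar -zxvf cmake-3.19.5-Linux-aarch64.tar.gz
cd cmake-3.19.5-Linux-aarch64/
# Copy binaries, docs, modules and man pages into /usr/local
sudo cp -rf bin/ doc/ share/ /usr/local/
sudo cp -rf man/* /usr/local/man
sync
# Confirm that the new version is the one on PATH
cmake --version
```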

CMake 3.18 and below have no prebuilt aarch64 releases. Should this version be built from source?
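Building an older CMake from source follows the standard bootstrap procedure. A sketch for 3.18.4, assuming the source tarball sits in the versioned directory under https://cmake.org/files/ and that `wget`, a C++ compiler, `make`, and the OpenSSL headers (`libssl-dev`, which the bootstrap looks for by default) are installed. Both function names are hypothetical.

```shell
# Hypothetical helper: URL of a CMake source tarball, e.g. 3.18.4 ->
# https://cmake.org/files/v3.18/cmake-3.18.4.tar.gz
cmake_src_url() {
  local version="$1"
  echo "https://cmake.org/files/v${version%.*}/cmake-${version}.tar.gz"
}

# Hypothetical helper: download, bootstrap, build and install a CMake release
# into the default prefix (/usr/local). The install step needs sudo.
build_cmake_from_source() {
  local version="$1"
  wget "$(cmake_src_url "$version")" &&
    tar -zxf "cmake-${version}.tar.gz" &&
    (
      # Build in a subshell so the caller's working directory is unchanged
      cd "cmake-${version}" &&
        ./bootstrap &&
        make -j"$(nproc)" &&
        sudo make install
    )
}

# Usage:
#   build_cmake_from_source 3.18.4
#   hash -r && cmake --version
```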

I will first reinstall OpenCV with the new CMake version, then test the ncnn build.

Can you tell me which versions of CMake and ncnn you built this with, so I can test the same framework?

programmeddeath1 commented 1 year ago

SDK Manager installs the default OpenCV 4.1.1 without CUDA. I installed 4.5.4 with CUDA, after which I started getting the CUDA-not-found error when building ncnn with CMake.

Could this be the reason? If I install OpenCV without CUDA, will ncnn still run at the benchmark speeds you have shown?
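To check whether the installed OpenCV was actually built with CUDA, the build summary returned by `cv2.getBuildInformation()` can be searched for its CUDA line. A minimal sketch, assuming the Python bindings are installed. `opencv_has_cuda` is a hypothetical helper that reads the summary on stdin.

```shell
# Hypothetical helper: succeed if an OpenCV build summary (read on stdin)
# reports "NVIDIA CUDA: YES"; fail if the line says NO or is absent.
opencv_has_cuda() {
  grep -Eq '^[[:space:]]*NVIDIA CUDA:[[:space:]]+YES'
}

# Usage:
#   if python3 -c 'import cv2; print(cv2.getBuildInformation())' | opencv_has_cuda; then
#     echo "OpenCV was built with CUDA"
#   fi
```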

Qengineering commented 1 year ago

@programmeddeath1 ,

You have three options:

1) Use a default JetPack 4.6, the default OpenCV (no CUDA), and an OLD version of ncnn. You get the benchmark speeds and YoloV7 without any hassle.
2) Use a default JetPack 4.6, the default OpenCV (no CUDA), and the latest version of ncnn. Before installing ncnn, you have to upgrade CMake to version 3.18.4. The only way is to build it from source. Once CMake works, install ncnn. Again, you get the benchmark speeds and a working YoloV7.
3) Upgrade OpenCV after option 2). You have CMake 3.18.4 and ncnn running, the benchmark is OK, and so is YoloV7. Now you can install OpenCV from source. If you swap the sequence, CMake will complain about a missing CUDA.
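Before building the latest ncnn along the lines of option 2), a quick check that the CMake on `PATH` is at least 3.18.4 can save a failed configure. A sketch that relies on GNU `sort -V` for version ordering. `version_ge` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if version $1 >= version $2.
# sort -V orders version strings; if the smaller one is $2, then $1 >= $2.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Usage: the first line of `cmake --version` is "cmake version X.Y.Z".
#   current="$(cmake --version | head -n 1 | awk '{print $3}')"
#   version_ge "$current" 3.18.4 || echo "CMake $current is older than 3.18.4"
```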

programmeddeath1 commented 1 year ago

1) I tried the first one, but the CMake version error (CMake < 3.15) prevents the ncnn build. I tried older revisions of ncnn going back to 2020; revisions older than that work with CMake 3.10, but they have other issues related to other functions.
2) I installed a fresh default image from SDK Manager and upgraded CMake to version 3.18.4 by building it from source and fixing the CMake ROOT path issues. I was able to build ncnn and run the model, but it still takes around 2.8 seconds. I am not getting the benchmark speed.

```
nvidia@ubuntu:~/Projects/ncnnapp$ ./bin/Debug/YoloV7 parking.jpg
[0 NVIDIA Tegra X1 (nvgpu)]  queueC=0[16]  queueG=0[16]  queueT=0[16]
[0 NVIDIA Tegra X1 (nvgpu)]  bugsbn1=0  bugbilz=0  bugcopc=0  bugihfa=0
[0 NVIDIA Tegra X1 (nvgpu)]  fp16-p/s/a=1/1/1  int8-p/s/a=1/1/1
[0 NVIDIA Tegra X1 (nvgpu)]  subgroup=32  basic=1  vote=1  ballot=1  shuffle=1

Time for execution is 2.471000
nvidia@ubuntu:~/Projects/ncnnapp$ cmake --version
cmake version 3.18.4
```

Is there anything else I can check that might give more information?

I can upgrade OpenCV and check, but I suspect that is not the issue.

Qengineering / YoloV7-ncnn-Jetson-Nano #3