Joshua-Riek / ubuntu-rockchip

Ubuntu for Rockchip RK35XX Devices
https://joshua-riek.github.io/ubuntu-rockchip-download/
GNU General Public License v3.0
2.49k stars, 266 forks

Hanging issue when using the RK3588 NPU #652

Closed: daoan1412 closed this issue 8 months ago

daoan1412 commented 8 months ago

I wrote a simple face-detection app that runs on the NPU with multiple threads. After it has been running for a while (5-10 minutes), the operating system hangs. You can reproduce the issue by building and running the following repo: https://github.com/daoan1412/rk3588_npu_freeze

Device: Orange Pi 5 Plus
OS: ubuntu-rockchip v1.33

stallion25 commented 8 months ago

Is your repo using HDMI IN?

daoan1412 commented 8 months ago

@stallion25 My project does not use HDMI In.

daoan1412 commented 8 months ago

I also tried with this project and the operating system also crashed after about 5 minutes.

artem-zinnatullin commented 8 months ago

Hm, I run YoloV8 on all 3 NPU cores (via Frigate) on 2 of my OrangePi 5s 24/7, and besides the Ethernet slowdown in #402, which should be unrelated, I see no system hangs.

You might want to check how rknn NPU access is configured in Frigate:

https://github.com/blakeblackshear/frigate/blob/55077a0bc9384a7d7d1e5903c767995f2a500f07/frigate/detectors/plugins/rknn.py

daoan1412 commented 8 months ago

@artem-zinnatullin I notice that Frigate only runs object detection on video at about 5 fps, which is quite low compared with the ~100-120 fps I'm testing at.

artem-zinnatullin commented 8 months ago

@daoan1412 100-120 FPS is definitely higher than what I'm running with Frigate: a YoloV8s model at 320x320 and 8 fps on 6 concurrent cameras on each OrangePi 5. But Frigate does clever things like motion detection, zoning, etc. between frames to avoid unnecessary model calls.
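To illustrate the idea of skipping model calls between frames, here is a minimal sketch of a frame-difference gate. It is not Frigate's actual implementation: the `MotionGate` class, the `mean_abs_diff` helper, and the threshold value are all hypothetical, and frames are assumed to be flat sequences of grayscale pixel values.

```python
def mean_abs_diff(prev, curr):
    """Average absolute per-pixel difference between two equally sized frames."""
    if len(prev) != len(curr):
        raise ValueError("frames must have the same number of pixels")
    return sum(abs(a - b) for a, b in zip(prev, curr)) / len(curr)


class MotionGate:
    """Decide whether a frame differs enough from the last inferred frame
    to be worth sending to the NPU. Hypothetical helper for illustration."""

    def __init__(self, threshold=8.0):
        self.threshold = threshold  # assumed value; tune per camera and scene
        self._reference = None      # last frame that was passed to the model

    def should_infer(self, frame):
        # Always infer on the first frame, then only when the scene changed.
        if (self._reference is None
                or mean_abs_diff(self._reference, frame) >= self.threshold):
            self._reference = list(frame)
            return True
        return False
```

A capture loop would call `gate.should_infer(frame)` and run the model only when it returns `True`, which lowers the number of NPU calls on mostly static scenes.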

My NPU load sits pretty low (I don't have histograms though):

cat /sys/kernel/debug/rknpu/load
NPU load:  Core0: 54%, Core1: 24%, Core2: 24%,
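For anyone who wants histograms rather than a single reading, a rough sketch of sampling that file over time follows. It assumes the output format shown above, that debugfs is mounted, and that the script runs with permission to read it (typically root). The function names and the 10% bucket width are illustrative choices, not part of any Rockchip tool.

```python
import re
import time
from collections import Counter

# Path taken from the command above; usually readable only by root.
LOAD_PATH = "/sys/kernel/debug/rknpu/load"


def parse_npu_load(text):
    """Return {core_index: percent} from a line such as
    'NPU load:  Core0: 54%, Core1: 24%, Core2: 24%,'."""
    return {int(core): int(pct)
            for core, pct in re.findall(r"Core(\d+):\s*(\d+)%", text)}


def bucket(percent, width=10):
    """Lower edge of the histogram bucket for a load percentage (100% joins the top bucket)."""
    return min(percent, 99) // width * width


def update_histogram(hist, loads, width=10):
    """Count one sample per core in its load bucket."""
    for core, pct in loads.items():
        hist[(core, bucket(pct, width))] += 1
    return hist


def log_histogram(samples=60, interval_s=1.0, width=10):
    """Sample the load file repeatedly and print a per-core histogram."""
    hist = Counter()
    for _ in range(samples):
        with open(LOAD_PATH) as f:
            update_histogram(hist, parse_npu_load(f.read()), width)
        time.sleep(interval_s)
    for (core, low), count in sorted(hist.items()):
        print(f"Core{core} {low:3d}-{low + width - 1:3d}%: {count}")
```

Calling `log_histogram()` on the board while the workload runs gives a per-core picture of how often each load range occurs.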

Few notes, please ignore if irrelevant in your case:

Hope this helps!

Joshua-Riek commented 8 months ago

I was able to reproduce the crash. I see some new NPU updates in the SDK that seem related. Primarily, there is a bugfix addressing a deadlock issue with spin_lock in the rknpu interrupt handler. Let me add the fix to the kernel and see if it helps.

Joshua-Riek commented 8 months ago

The commit https://github.com/Joshua-Riek/linux-rockchip/commit/494c0a303537c55971421b5552d98eb55e652cf3 fixes this issue. I will update the kernel with this fix on Launchpad tomorrow.

Joshua-Riek commented 8 months ago

@daoan1412 thank you for reporting this. I did further testing and found additional issues on the 6.1 kernel when using the NPU.

For my notes, the three commits below need to be cherry-picked onto the 6.1 kernel for proper NPU usage:
https://github.com/JeffyCN/mirrors/commit/9ced5e9ae99ca6ee7c59a030176de58dd54cd679
https://github.com/JeffyCN/mirrors/commit/4a35fccb3576ad7e0768f6aa00d692d1a0b124c4
https://github.com/JeffyCN/mirrors/commit/d7be109f40e88f91868a5635b239f2a2c1d6ba47

daoan1412 commented 8 months ago

@artem-zinnatullin Thank you very much for your detailed and helpful insights! I will apply your suggestions to my project to improve performance and stability.

@Joshua-Riek It seems to have fixed my problem. I will close this issue. Thank you very much.