artem-zinnatullin closed this issue 1 year ago.
Interesting, thanks for the detailed info. Would you happen to have the dmesg logs from the system when this issue happens?
Also, I do have some mainline Linux images for the Orange Pi 5 series that might work better for server use; the only things not working are the GPU, HDMI, and NPU.
https://github.com/Joshua-Riek/ubuntu-rockchip/actions/workflows/build-mainline.yml
I have an Orange Pi 5 Plus, but I am not seeing the issue after 5 days of uptime (I had to cut the power to install a heat pump and get off gas). I use the Orange Pi 5+ as a NAS, and I did notice some slowdown after a few weeks, but more like a 50% slowdown (Samba read 100MB/s -> 50MB/s). I ran "apt update/upgrade" today and will test and report how the speed evolves on my system.
PS. There are some differences between the RTL8125 driver published by Realtek and the one in the Orange Pi kernel repo, but it has been 2 months since I looked at this.
After 3 weeks of uptime: no slowdown measured using speedtest-cli --secure. Also no deterioration in Samba read or write performance (1GB file write: 119MB/s, 1GB file read: 120MB/s, switch: TP-Link 8x1Gbit).
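For context on those numbers: a 1 Gbit/s link carries at most 125 MB/s of raw bits (1000 Mbit/s divided by 8), so 119-120 MB/s means the link is essentially saturated. The short sketch below does that arithmetic. It assumes decimal megabytes (1 MB = 10^6 bytes) and ignores Ethernet/IP/TCP framing overhead, which puts the practical ceiling somewhat below 125 MB/s.

```python
def line_rate_mb_per_s(link_gbit_per_s: float) -> float:
    """Raw line rate in decimal MB/s (1 Gbit/s = 1000 Mbit/s = 125 MB/s)."""
    return link_gbit_per_s * 1000 / 8


def utilization(measured_mb_per_s: float, link_gbit_per_s: float = 1.0) -> float:
    """Fraction of the raw line rate used by a measured transfer speed."""
    return measured_mb_per_s / line_rate_mb_per_s(link_gbit_per_s)


# Samba results quoted above (1GB write and read over a 1 Gbit/s switch).
for label, rate in (("write", 119.0), ("read", 120.0)):
    print(f"{label}: {rate:.0f} MB/s is {utilization(rate):.1%} of the raw 1 Gbit/s line rate")
```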
I did notice, though, that dmesg is flooded with countless lines like these (hundreds of MB in total):
[414169.170382] r8125 0004:41:00.0 enP4p65s0: rss get rxnfc
[414292.005136] r8125 0004:41:00.0 enP4p65s0: rss get rxnfc
[414414.765718] r8125 0004:41:00.0 enP4p65s0: rss get rxnfc
[414540.182620] r8125 0004:41:00.0 enP4p65s0: rss get rxnfc
[414660.271381] r8125 0004:41:00.0 enP4p65s0: rss get rxnfc
[414780.875479] r8125 0004:41:00.0 enP4p65s0: rss get rxnfc
[414903.999129] r8125 0004:41:00.0 enP4p65s0: rss get rxnfc
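To see how often the driver emits this line, the timestamps can be pulled out of the log and differenced. The sketch below assumes only the default dmesg format shown above (`[seconds since boot]` followed by the message; `dmesg -T` output would not match). In the seven lines quoted, the gaps come out at roughly 120-125 seconds. A steady cadence like that hints at something querying the driver on a timer, but that is an inference from the timestamps, not something the log itself states.

```python
import re
import sys
from statistics import mean

# Matches lines like "[414169.170382] r8125 0004:41:00.0 enP4p65s0: rss get rxnfc"
RSS_LINE = re.compile(r"^\[\s*(\d+\.\d+)\]\s.*\brss get rxnfc$")


def rss_timestamps(lines):
    """Seconds-since-boot timestamps of every 'rss get rxnfc' line."""
    stamps = []
    for line in lines:
        match = RSS_LINE.match(line.strip())
        if match:
            stamps.append(float(match.group(1)))
    return stamps


def gaps(stamps):
    """Time between consecutive messages, in seconds."""
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]


# The excerpt quoted above.
SAMPLE = """\
[414169.170382] r8125 0004:41:00.0 enP4p65s0: rss get rxnfc
[414292.005136] r8125 0004:41:00.0 enP4p65s0: rss get rxnfc
[414414.765718] r8125 0004:41:00.0 enP4p65s0: rss get rxnfc
[414540.182620] r8125 0004:41:00.0 enP4p65s0: rss get rxnfc
[414660.271381] r8125 0004:41:00.0 enP4p65s0: rss get rxnfc
[414780.875479] r8125 0004:41:00.0 enP4p65s0: rss get rxnfc
[414903.999129] r8125 0004:41:00.0 enP4p65s0: rss get rxnfc
"""

if __name__ == "__main__":
    # Uses the excerpt by default; pass "-" to read live `dmesg` output from stdin.
    source = sys.stdin if sys.argv[1:] == ["-"] else SAMPLE.splitlines()
    stamps = rss_timestamps(source)
    deltas = gaps(stamps)
    print(f"{len(stamps)} messages")
    if deltas:
        print(f"gap min/mean/max: {min(deltas):.1f} / {mean(deltas):.1f} / {max(deltas):.1f} s")
```

On a live system, something like `dmesg | python3 rss_gaps.py -` would run it against the current kernel log (the script name is only an example).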
Looking at the source code, it seems related to RSS (receive-side scaling).
The output of cat /proc/interrupts shows 32 IRQ handlers registered per LAN port. Possibly the RSS function is trying to balance or relocate the IRQ handlers. The message itself is created by:
netif_info(tp, drv, tp->dev, "rss get rxnfc\n");
in r8125_rss.c. This line should be deleted or changed to netif_dbg.
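To check the per-port IRQ count and how interrupts are actually spread across CPUs, the rows of /proc/interrupts can be filtered by interface name and their per-CPU counters summed. This is a sketch under one assumption: that the r8125 vectors carry the interface name (enP4p65s0 here) in their label column; if your kernel labels them differently, change the match string. The sample table in the code is made up for illustration.

```python
from pathlib import Path


def irq_summary(interface: str, text: str):
    """Return (irq_numbers, per_cpu_totals) for rows mentioning `interface`.

    The first line of /proc/interrupts is the CPU header ("CPU0 CPU1 ..."),
    and each following row starts with "<irq>:" and one counter per CPU.
    """
    lines = text.splitlines()
    ncpus = len(lines[0].split())
    irqs, per_cpu = [], [0] * ncpus
    for line in lines[1:]:
        if interface not in line:
            continue
        irq, rest = line.split(":", 1)
        irqs.append(irq.strip())
        for cpu, count in enumerate(rest.split()[:ncpus]):
            per_cpu[cpu] += int(count)
    return irqs, per_cpu


# Hypothetical two-CPU excerpt, for illustration only.
SAMPLE = """\
           CPU0       CPU1
 120:       1500          0   ITS-MSI 1 Edge      enP4p65s0-0
 121:          0       2300   ITS-MSI 2 Edge      enP4p65s0-1
 130:         42         17   GICv3  50 Level     other-device
"""

if __name__ == "__main__":
    proc = Path("/proc/interrupts")
    text = proc.read_text() if proc.exists() else SAMPLE
    irqs, per_cpu = irq_summary("enP4p65s0", text)
    print(f"{len(irqs)} IRQ rows for enP4p65s0")
    for cpu, total in enumerate(per_cpu):
        print(f"CPU{cpu}: {total}")
```

If nearly all of the counts land on one CPU despite the many registered vectors, the distribution is worth a closer look; an even spread suggests RSS is doing its job.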
...
Based on your configuration, I would guess your OS is on the NVMe drive(s), but if it happened to be on an SD card or another medium with slower write speeds, then the massive logging to a slower medium (or a full file system) could explain the slowdown.
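Whether logging is filling the disk is easy to check. systemd's `journalctl --disk-usage` reports the journal's own size; the sketch below additionally reports how full the filesystem holding a directory is and how many bytes the directory tree occupies. Pointing it at /var/log is an assumption about where logs live on this image. Files or directories it cannot read are skipped, so run it as root for a complete total.

```python
import os
import shutil


def fs_used_percent(path: str) -> float:
    """Percentage of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def tree_size(path: str) -> int:
    """Sum of st_size for every file entry under `path` (symlinks are not followed)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.lstat(os.path.join(root, name)).st_size
            except OSError:
                pass  # file removed or unreadable while walking
    return total


if __name__ == "__main__":
    log_dir = "/var/log"  # assumption: default log location
    if os.path.isdir(log_dir):
        print(f"{log_dir}: {tree_size(log_dir) / 1e6:.1f} MB of files")
        print(f"filesystem {fs_used_percent(log_dir):.1f}% full")
```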
Thanks for the detailed observation, I can make a quick kernel patch so the kernel log will not be spammed with this message.
That would be great! As a regular kernel programmer, I would be happy to help and contribute PRs, but I am not familiar enough with where all the bits and pieces of your Linux build come from (e.g. where you pull the kernel from) to contribute in the form of PRs. A few months ago, I looked into the latest published Realtek driver code (9.011.01) and found a few issues, as well as things that could be improved for better performance (e.g. page reuse), but I have not had the time lately to properly test my changes.
PS. If you need help with a (code) issue or want something tested (on the Orange Pi 5+), feel free to reach out.
@ewaldc
where you pull the kernel from
I guess from here: https://github.com/Joshua-Riek/linux-rockchip
Yeah, the thing about the kernel is that it's a hacked Android kernel, so there are so many bugs.
Hi folks, I'm dedicating my second Orange Pi 5 to debugging this issue.
I've been running fine for about a month with a network stack restart in cron:
0 9 * * * /etc/init.d/networking restart
The problem is that I see reports of the GPU and NPU not working in recent releases, and the software I use relies on both (video processing and object recognition), so it will be hard for me to upgrade to test this log patch: https://github.com/armbian/linux-rockchip/pull/114
Otherwise, I'm quite confident I'll reproduce the issue within a few days, given the amount of traffic my Orange Pis handle from 4K and 2K cameras 24/7.
You don't need to worry about this: the GPU and NPU not working relates to mainline Linux (6.6.x), not Rockchip Linux 5.10.160.
I will be closing this as it can not be reproduced.
Hi @Joshua-Riek, apologies for the late addition to the report. The issue still occurs regularly, as originally reported, on both my Orange Pi 5 and Orange Pi 5 Plus boards.
I've configured the Speedtest.net integration in Home Assistant (running on the Orange Pi 5), and it now collects regular data. Here is a graph where the speed drops from my upstream internet limit of ~400Mbit/s (download) and ~100Mbit/s (upload) down to 14Mbit/s download and 14Mbit/s upload within 3 days of uptime; only a reboot brings it back.
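Put as ratios, this is a much steeper drop than the ~50% seen on the NAS earlier in the thread. A quick calculation with the approximate figures quoted above:

```python
def slowdown(before: float, after: float) -> tuple[float, float]:
    """Return (how many times slower, percentage drop) for two speeds."""
    return before / after, 100.0 * (before - after) / before


# Approximate Speedtest.net figures quoted above, in Mbit/s.
for direction, before, after in (("download", 400, 14), ("upload", 100, 14)):
    factor, pct = slowdown(before, after)
    print(f"{direction}: {before} -> {after} Mbit/s = {factor:.1f}x slower, {pct:.1f}% drop")
```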
I'm on your latest available kernel (I run apt-get update && apt-get upgrade regularly):
uname -a
Linux orangepi5n1 5.10.160-rockchip #31 SMP Mon Feb 12 15:49:56 UTC 2024 aarch64 aarch64 aarch64 GNU/Linux
What logs should I try to collect when the Ethernet speed drops?
@artem-zinnatullin, I have set up a similar monitor (a timed daily 1GB file transfer over Samba to/from Windows), and I am now also seeing the issue. In addition, on my systems a reboot no longer solves the issue! Does a reboot restore the performance for you? Last fall, I could only reproduce a 50% drop in performance over ~3 weeks, and a reboot brought things back to 100MB/s (limited by my 1Gb switch). It seems things have gotten worse since last fall... For tools, I use 'ethtool -S enP4p65s0' (Ethernet card and driver stats) and 'journalctl -b 1' (kernel, syslog, etc.), but I could not see anything obvious like dropped packets, CRC errors, or collisions. The kernel/system logs contain plenty of errors/warnings/failures and, as Joshua mentioned, indicate a buggy kernel, but nothing stands out. Apart from regular updates, the only change I made was adding Wi-Fi cards. I will take one out to see if that makes a difference.
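Since most `ethtool -S` counters are cumulative, the useful signal is how they change between a healthy moment and a degraded one. A minimal sketch for snapshotting and diffing them follows. It assumes the usual `name: value` layout under a `NIC statistics:` header, and the counter names in the sample are placeholders, not the exact set the r8125 driver exports.

```python
import subprocess


def parse_ethtool_stats(text: str) -> dict[str, int]:
    """Parse `ethtool -S` output ("NIC statistics:" then "name: value" lines)."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.strip().rpartition(":")
        value = value.strip()
        if sep and value.isdigit():
            stats[name.strip()] = int(value)
    return stats


def snapshot(interface: str) -> dict[str, int]:
    """Run `ethtool -S <interface>` and parse its counters (needs ethtool installed)."""
    out = subprocess.run(["ethtool", "-S", interface],
                         capture_output=True, text=True, check=True).stdout
    return parse_ethtool_stats(out)


def changed(before: dict[str, int], after: dict[str, int]) -> dict[str, int]:
    """Counters whose value changed between two snapshots, with the delta."""
    return {name: after[name] - before.get(name, 0)
            for name in after if after[name] != before.get(name, 0)}


# Placeholder snapshots for illustration; not real measurements.
HEALTHY = """NIC statistics:
     tx_packets: 1000
     rx_packets: 2000
     rx_missed: 0
"""
DEGRADED = """NIC statistics:
     tx_packets: 1500
     rx_packets: 2600
     rx_missed: 37
"""

if __name__ == "__main__":
    print(changed(parse_ethtool_stats(HEALTHY), parse_ethtool_stats(DEGRADED)))
```

In practice, one would call `snapshot("enP4p65s0")` shortly after boot and again once throughput has dropped, then pass the pair to `changed`.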
I'm working on a new 6.1 kernel and I recently backported the r8125 driver to it. Maybe I can also update the driver in 5.10.
Please note the 6.1 kernel is intended to be used with the upcoming Ubuntu 24.04 release in April.
@Joshua-Riek, wonderful. Let me know if I can do anything to help. One thing to add: while I am seeing a ~10x drop in network throughput compared with last fall, I am still getting 10 to 11MB/s compared to ~100MB/s before. That is still ~10x better than what @artem-zinnatullin reports, which makes me think there is more involved than just the r8125 driver.
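The two reports use different units: Samba transfer rates in megabytes per second and Speedtest.net results in megabits per second. With 8 bits per byte (and ignoring the MB vs MiB distinction, which tools do not always make explicit), the figures can be put on one scale:

```python
BITS_PER_BYTE = 8


def mbyte_to_mbit(mb_per_s: float) -> float:
    """Convert MB/s to Mbit/s."""
    return mb_per_s * BITS_PER_BYTE


def mbit_to_mbyte(mbit_per_s: float) -> float:
    """Convert Mbit/s to MB/s."""
    return mbit_per_s / BITS_PER_BYTE


# Degraded Samba throughput reported above, converted to Mbit/s.
for rate in (10, 11):
    print(f"{rate} MB/s = {mbyte_to_mbit(rate):.0f} Mbit/s")
# The 14 Mbit/s Speedtest.net figure reported earlier, converted to MB/s.
print(f"14 Mbit/s = {mbit_to_mbyte(14):.2f} MB/s")
```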
@Joshua-Riek, happy to upgrade to the 6.x kernel as soon as it runs with GPU and NPU support, and/or to provide more logs and data from 5.x 👍 Can we please get this issue reopened, since @ewaldc is now consistently observing it too?
@ewaldc, a reboot every 2-3 days is the only thing that restores the performance in my case. It's also interesting that my graph shows performance gradually going down over time; perhaps some buffer starts accumulating dead objects and network packets stop being buffered, hence the throughput drop?
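If the decline really is gradual, as the graph suggests, a least-squares line through periodic speed samples gives a rough degradation rate and an estimate of when a threshold will be crossed. A linear model is an assumption (the real curve may well be non-linear), and the sample values below are invented purely to exercise the code.

```python
def linear_trend(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("samples need at least two distinct times")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def hours_until(threshold, slope, intercept):
    """Time at which the fitted line reaches `threshold`; None if not declining."""
    if slope >= 0:
        return None
    return (threshold - intercept) / slope


# Invented samples: (hours since boot, download Mbit/s). Not measurements.
hours = [0, 24, 48, 72]
speeds = [400, 300, 200, 100]
slope, intercept = linear_trend(hours, speeds)
print(f"{slope:.2f} Mbit/s per hour; below 50 Mbit/s after ~{hours_until(50, slope, intercept):.0f} h")
```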
In my case, I'm running 24/7 full-resolution video feed analysis from 12 4K and 2K cameras on two Orange Pis, with Frigate doing object recognition on the NPU and ffmpeg decoding the H.265 and H.264 streams on the GPU. Both Orange Pis have NVMe drives, no microSD cards. I also use VLANs (hopefully that's not relevant).
The 6.1 kernel is ready to be released in a beta state, but it requires more recent versions of mpp and ffmpeg, which I do have working properly. My original intention was to release the 6.1 kernel with Ubuntu 24.04, so any issues introduced by the new kernel would be specific to the new Ubuntu version. But I may release a 6.1 kernel package so users on Ubuntu 22.04 can upgrade the kernel at their own risk. However, my attention is currently focused on a few regression issues that are unrelated to the 6.1 kernel.
@Joshua-Riek, IMHO, one benefit of releasing the 6.1 kernel (without any support, of course) on 22.04 would be testing it ahead of Ubuntu 24.04. It could also provide a baseline for comparing issues and behaviors between 24.04 and 22.04, since both systems would be running the same kernel (making it easier to determine whether an issue is kernel- or OS-related).
Yeah, I've been doing just that. Systemd was updated recently and it broke the bootstrapping process of creating new Ubuntu 24.04 images, so I'm stuck on Ubuntu 22.04 for the moment.
I run Ubuntu-Rockchip v1.27 (and all previous versions since March), and on both my Orange Pi 5 and Orange Pi 5 Plus I observe Ethernet speeds dropping to ~10Mbit/s from the full 1 Gbit/s after a few days of uptime.
A reboot fixes it for a few days, and then it happens again.
[Screenshots not preserved: speed tests before reboot and a few minutes after reboot; panels: Orange Pi 5, Orange Pi 5 Plus, Other.]
Happy to provide more data and run more tests. (I previously checked scp between local machines and it was equally bad, so it's not an internet connection issue; I forgot to do that before this reboot, but I can include it in a few days.)
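For the local-to-local comparison, scp also involves SSH encryption and disk I/O, so a raw TCP transfer can isolate the network path more cleanly; `iperf3` is the standard tool for this. As a dependency-free alternative, here is a minimal sketch: one side accepts a connection and discards what it receives, and the other sends a fixed amount and times it until the receiver has drained everything. The loopback demo only checks that the code works. For a real test, run `serve_once` on one board and `measure` against it from the other (the port is whatever you choose to listen on).

```python
import socket
import threading
import time

CHUNK = 64 * 1024  # bytes per send/recv call


def serve_once(listener: socket.socket) -> None:
    """Accept one connection and discard everything received until EOF."""
    conn, _addr = listener.accept()
    with conn:
        while conn.recv(CHUNK):
            pass


def measure(host: str, port: int, total_bytes: int) -> float:
    """Send `total_bytes` (rounded up to whole chunks) and return Mbit/s."""
    payload = b"\0" * CHUNK
    sent = 0
    start = time.perf_counter()
    with socket.create_connection((host, port)) as sock:
        while sent < total_bytes:
            sock.sendall(payload)
            sent += len(payload)
        sock.shutdown(socket.SHUT_WR)  # signal EOF to the receiver
        sock.recv(1)  # returns b"" once the receiver has read everything and closed
    elapsed = time.perf_counter() - start
    return sent * 8 / elapsed / 1e6


def loopback_demo(total_bytes: int = 50 * 1024 * 1024) -> float:
    """Run both ends on 127.0.0.1; this proves the code works, not the LAN speed."""
    with socket.create_server(("127.0.0.1", 0)) as listener:
        port = listener.getsockname()[1]
        receiver = threading.Thread(target=serve_once, args=(listener,))
        receiver.start()
        rate = measure("127.0.0.1", port, total_bytes)
        receiver.join()
    return rate


if __name__ == "__main__":
    print(f"loopback: {loopback_demo():.0f} Mbit/s")
```

Timing until the receiver closes, rather than until the last `sendall` returns, keeps kernel send buffering from inflating the result.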