Joshua-Riek / ubuntu-rockchip

Ubuntu for Rockchip RK35XX Devices
https://joshua-riek.github.io/ubuntu-rockchip-download/
GNU General Public License v3.0

Orange Pi 5+; high power consumption and thermals; load average >= 1 #606

Open bbklopfer opened 9 months ago

bbklopfer commented 9 months ago

Hi,

I booted ubuntu-22.04.3-preinstalled-server-arm64-orangepi-5-plus on my 5 Plus and noticed a few things (in comparison to the Orange Pi-issued Debian image):

I am booting from the SD card; the eMMC is connected to the system and holds the Orange Pi Debian image. I tried running the ubuntu-rockchip kernel with the Orange Pi Debian userspace and got the same power/thermal/load average results, so it seems (?) like a kernel issue.
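One way to make this kind of kernel-vs-kernel comparison repeatable is to record the same readings after each boot. A minimal Python sketch, assuming the standard Linux `/proc/loadavg` file and the thermal sysfs interface (`/sys/class/thermal/thermal_zone*/type` and `temp`, the latter in millidegrees Celsius); zone names differ per board and kernel, so none are hard-coded:

```python
from __future__ import annotations

import glob
import os


def parse_loadavg(text: str) -> tuple[float, float, float]:
    """Return the 1-, 5- and 15-minute load averages from /proc/loadavg text."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)


def read_thermal_zones(root: str = "/sys/class/thermal") -> dict[str, float]:
    """Map 'thermal_zoneN:<type>' to the zone temperature in degrees Celsius."""
    temps: dict[str, float] = {}
    for zone in sorted(glob.glob(os.path.join(root, "thermal_zone*"))):
        try:
            with open(os.path.join(zone, "type")) as f:
                kind = f.read().strip()
            with open(os.path.join(zone, "temp")) as f:
                millidegrees = int(f.read().strip())
        except (OSError, ValueError):
            continue  # zone missing or unreadable on this kernel; skip it
        temps[f"{os.path.basename(zone)}:{kind}"] = millidegrees / 1000
    return temps


if __name__ == "__main__":
    try:
        with open("/proc/loadavg") as f:
            print("load average (1/5/15 min):", parse_loadavg(f.read()))
    except OSError:
        print("/proc/loadavg not available on this system")
    for zone, celsius in read_thermal_zones().items():
        print(f"{zone}: {celsius:.1f} C")
```

Running it after the same idle period on each kernel gives directly comparable numbers.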

The system has an NVMe SSD installed and no Wi-Fi card.

Happy to provide any additional info. Thanks!

JFLim1 commented 9 months ago

I have a similar experience with v1.29 and the current v1.33 Desktop on the OPi5-Plus. The load average stays above 1 even at idle for a significant time (>20 min); I'm not sure whether it would drop below 1 if the system were left idle longer.
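A plausible reason an idle board can sit at a load average of about 1 indefinitely: Linux counts tasks in uninterruptible sleep (D state) as well as runnable ones, and smooths that count with an exponential moving average updated roughly every 5 seconds. The Python sketch below re-implements the fixed-point update from the kernel's `include/linux/sched/loadavg.h` (`calc_load`, `FSHIFT = 11`, `EXP_1`/`EXP_5`/`EXP_15`) to show that a single permanently blocked task drives the 1-minute value to 1.00 and keeps it there however long the system idles. Whether a blocked task is actually the cause on the OPi5+ is not established in this thread.

```python
from __future__ import annotations

FSHIFT = 11                 # bits of fixed-point precision
FIXED_1 = 1 << FSHIFT       # 1.0 in fixed point
EXP_1, EXP_5, EXP_15 = 1884, 2014, 2037  # 1/exp(5s/1min), 1/exp(5s/5min), 1/exp(5s/15min)
TICKS_PER_MINUTE = 12       # the averages are refreshed about every 5 seconds


def calc_load(load: int, exp: int, active: int) -> int:
    """One update step: decay `load` exponentially toward `active`."""
    newload = load * exp + active * (FIXED_1 - exp)
    if active >= load:
        newload += FIXED_1 - 1  # round up while rising, as the kernel does
    return newload // FIXED_1


def simulate(nr_active: int, minutes: int) -> tuple[float, float, float]:
    """1/5/15-minute load after `minutes` with a constant count of counted tasks.

    The kernel counts runnable (R) plus uninterruptible (D) tasks, so one task
    parked in D state contributes 1 even though it consumes no CPU time.
    """
    loads = [0, 0, 0]
    active = nr_active * FIXED_1
    for _ in range(minutes * TICKS_PER_MINUTE):
        loads = [calc_load(l, e, active) for l, e in zip(loads, (EXP_1, EXP_5, EXP_15))]
    return tuple(l / FIXED_1 for l in loads)


if __name__ == "__main__":
    print("no blocked task, 20 min:", simulate(0, 20))
    print("one blocked task, 20 min:", simulate(1, 20))
```

With one blocked task, the 1-minute average is pinned at 1.0 after 20 minutes while the 15-minute value is still climbing toward it, which matches the ">= 1 at idle" pattern without requiring any CPU load.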

After installing Joshua's kernel 5.10.160-28 or 5.10.160-30 on Arch Linux, the same high idle load average (>1) appears. With the vendor's kernel 5.10.110-2 or 5.10.160, the idle load average is much lower.

EvilOlaf commented 9 months ago

Kind of funny to see the very same bug over the years across various SoCs (not only Rockchip). And yes, I experience this as well. However, I did not test against vendor images but against mainline 6.8-rc, which seems fine.

Anyway, just a guess: does the load disappear when the NVMe is removed and the system is booted from SD or eMMC?

JFLim1 commented 9 months ago

Booting from either NVMe or the SD card gives the same high load average with Joshua's kernel. In both cases the NVMe and eMMC (256 GB, but empty) were installed; they were not removed when booting from the SD card.

Joshua's image is stable, and so far it has been a very good experience on the OPi5-Plus.

EvilOlaf commented 9 months ago

I did some tests myself. It doesn't seem to be related to the type of storage used. I tried NVMe, eMMC and SD card in all combinations, either as rootfs or just installed. The load always rises to >= 1. Bummer...

bbklopfer commented 9 months ago

@EvilOlaf how was your experience with the 6.8-rc kernels, and where did you grab them from (looks like Armbian offers 6.8-rc1, or did you just roll your own)?

My whole reason for wanting to try a new kernel was due to this issue I was experiencing.

EvilOlaf commented 9 months ago

It was from Armbian.

Joshua-Riek commented 9 months ago

Mainline Linux 6.8 just gained HDMI support, and not all of the hardware is working yet, specifically the GPU and VPU. You will not be able to run Jellyfin with hardware acceleration if you plan on going down this road. Even when GPU or VPU support lands in mainline Linux, I would expect many issues, as this is bleeding-edge software.

This forces most users to use the crappy 5.10 Android kernel. I likely will not look into the load average issue, as the kernel is a mess and it's way too much work on a kernel that will likely be dead a year from now.

bbklopfer commented 9 months ago

Got it --- thanks for chiming in!

Joshua-Riek commented 9 months ago

I will keep this open but add a "won't fix" tag, as it's a valid issue.

EvilOlaf commented 9 months ago

@Joshua-Riek just curious: what's your opinion of rkr7.1 (5.10.198, I think?) or the 6.1 BSP? I noticed you played with the former just a bit and abandoned it.

Joshua-Riek commented 9 months ago

I think rkr 7.1 is fine and see no breaking changes; I may bump to this kernel in the future for legacy reasons. As for 6.1, I still do not have the release tag for it. I've started some work on the 6.1 kernel from an old snapshot I got back in late October, but I really want a release tag before spending a lot of time on it.

EvilOlaf commented 9 months ago

Gotcha.

nyanmisaka commented 9 months ago

As far as I know, JeffyCN's kernel-6.1-2024_01_02 tag is the first release of the 6.1 BSP. OrangePi also updated their kernel tree not long ago, which confirms this.

kernel-6.1-2024_01_02

https://github.com/orangepi-xunlong/linux-orangepi/tree/orange-pi-6.1-rk35xx

Joshua-Riek commented 9 months ago

I would still like to see a release tag, but this looks good. I will likely create a fork from this point and start to rebase stuff.

Joshua-Riek commented 9 months ago

I dropped the WiFi patches, LCD panel patches, and some changes for the Khadas Edge. Because I went through about 200 patches with a ton of merge conflicts, I could have made a few mistakes. But here is the current progress; it should be an OK starting point.

https://github.com/Joshua-Riek/linux-rockchip/commits/rockchip-6.1/

nyanmisaka commented 9 months ago

Some non-essential peripherals should have lower priority if they cannot be easily ported to 6.1.

Btw I dropped the r8125 out-of-tree driver. The original one is a bit outdated.

Joshua-Riek commented 8 months ago

Hey @nyanmisaka, do you have gnome wayland working with the 6.1 kernel? I just finished some testing and only X11 would start :thinking:

nyanmisaka commented 8 months ago

I haven't tried panfork on the 6.1 kernel, but I know that libmali can provide Wayland support for GNOME on Ubuntu 23.10 (Mantic).

EvilOlaf commented 8 months ago

So it might be worth going the Noble route directly?

nyanmisaka commented 8 months ago

The problem may be whether panfork itself is compatible with the updated panfrost kernel-mode driver in 6.1 and the new Mali CSF firmware, rather than the distro version.

Joshua-Riek commented 8 months ago

I just tested Noble and it seems to use llvmpipe, sadly. I'll need to try your 6.1 fork directly with Armbian Mantic. Does glmark2 use hardware acceleration in your OS?

nyanmisaka commented 8 months ago

glmark2-wayland requires full OpenGL, but libmali only provides GLES; glmark2-es2-wayland works. The desktop is still accelerated by kworker/u17:1-mali_kbase_csf_sync_upd. Applications requiring full OpenGL will not be accelerated.

https://github.com/tsukumijima/libmali-rockchip/releases/tag/v1.9-1-b5d7972
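To tell quickly whether a run like the Noble test above fell back to software rendering, the renderer string can be checked for Mesa's CPU rasterizers (`llvmpipe`, `softpipe`). A small Python sketch, assuming the saved glmark2 output contains a `GL_RENDERER:` line as recent glmark2 versions print in their header; the file name passed on the command line is up to you:

```python
from __future__ import annotations

import re
import sys

RENDERER_RE = re.compile(r"^\s*GL_RENDERER:\s*(.+?)\s*$", re.MULTILINE)
SOFTWARE_RENDERERS = ("llvmpipe", "softpipe")  # Mesa's CPU-based rasterizers


def parse_renderer(text: str) -> str | None:
    """Return the GL_RENDERER value from glmark2 output, or None if absent."""
    match = RENDERER_RE.search(text)
    return match.group(1) if match else None


def is_software_renderer(renderer: str) -> bool:
    """True when the renderer name indicates CPU (software) rendering."""
    return any(name in renderer.lower() for name in SOFTWARE_RENDERERS)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: glmark2-es2-wayland > out.txt; python3 check_renderer.py out.txt
    with open(sys.argv[1]) as f:
        renderer = parse_renderer(f.read())
    if renderer is None:
        print("no GL_RENDERER line found")
    elif is_software_renderer(renderer):
        print(f"software rendering: {renderer}")
    else:
        print(f"hardware renderer: {renderer}")
```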

Joshua-Riek commented 8 months ago

I did test panfork: Wayland did not work, as mentioned before, and it then crashed a bit later with the logs below. I've not done much debugging yet:

Feb  7 21:27:53 ubuntu-desktop kernel: [   24.302826] mali fb000000.gpu: Loading Mali firmware 0x1010000
Feb  7 21:27:53 ubuntu-desktop kernel: [   24.305300] mali fb000000.gpu: Mali firmware git_sha: ee476db42870778306fa8d559a605a73f13e455c 
Feb  7 21:27:53 ubuntu-desktop kernel: [   24.737056] mali fb000000.gpu: Invalid CPU access to UMM memory for ctx 1227_0
Feb  7 21:31:20 ubuntu-desktop kernel: [  232.566709] mali fb000000.gpu: Invalid CPU access to UMM memory for ctx 1272_1
Feb  7 21:31:21 ubuntu-desktop kernel: [  234.137685] mali fb000000.gpu: Invalid CPU access to UMM memory for ctx 3244_19
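When comparing driver/firmware combinations (g18p0 vs g21p0, or a different mali_csffw.bin), it helps to confirm which firmware actually loaded and how often the UMM error repeats. A small Python sketch that scans kernel log lines in the format shown above; the two regexes match only these messages and are assumptions about the log wording, not a documented interface:

```python
from __future__ import annotations

import re
import sys
from collections import Counter

# Patterns for the two mali messages quoted above (wording assumed stable).
FIRMWARE_RE = re.compile(r"Mali firmware git_sha: ([0-9a-f]+)")
UMM_RE = re.compile(r"Invalid CPU access to UMM memory for ctx (\S+)")


def summarize_mali_log(lines) -> tuple[str | None, Counter]:
    """Return (last firmware git_sha seen, UMM-access error count per ctx)."""
    firmware_sha = None
    umm_errors: Counter = Counter()
    for line in lines:
        fw = FIRMWARE_RE.search(line)
        if fw:
            firmware_sha = fw.group(1)
        umm = UMM_RE.search(line)
        if umm:
            umm_errors[umm.group(1)] += 1
    return firmware_sha, umm_errors


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 mali_log.py /var/log/kern.log   (or a saved dmesg dump)
    with open(sys.argv[1], errors="replace") as f:
        sha, errors = summarize_mali_log(f)
    print("firmware git_sha:", sha or "not found")
    for ctx, count in errors.most_common():
        print(f"ctx {ctx}: {count} invalid UMM access(es)")
```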
nyanmisaka commented 8 months ago

Apparently this is the Mali bifrost driver in the kernel complaining, and panfork doesn't work well with it. You can try downgrading it from g21p0 to g18p0.

https://github.com/JeffyCN/mirrors/commits/kernel-6.1-2024_01_02/drivers/gpu/arm/bifrost

Joshua-Riek commented 8 months ago

Yeah, the 6.1 kernel does not like panfork very much. I think it may be better to work on backporting panthor.

nyanmisaka commented 8 months ago

We tried it last year on 6.1.25 (the snapshot from October), but panthor was still unstable at that time. I think there is no need to spend any more time on it until the panthor and Mesa PRs are merged into mainline.

JFLim1 commented 8 months ago

Just some feedback: the kernel https://repo.bredos.org/rkr6/linux-rockchip-rkr6-5.10.160-6-aarch64.pkg.tar.zst also shows a similarly high load average (greater than 1) even when idling, in both Plasma Wayland and Plasma X11 sessions. So it is not unique to Joshua's kernel-5.10.160.

wyf9661 commented 8 months ago

I'm not sure, but maybe we can upgrade the Mali firmware mali_csffw.bin instead of downgrading the kernel driver from g21p0 to g18p0.

wyf9661 commented 8 months ago

https://github.com/JeffyCN/mirrors/blob/libmali/firmware/g610/mali_csffw.bin

Joshua-Riek commented 8 months ago

g21p0 sadly breaks Wayland; I did try reverting the commit and using g18p0 on 6.1.

wyf9661 commented 8 months ago

Oh, since I use KDE, which defaults to X11, I didn't notice the poor Wayland performance.

Operating System: Arch Linux ARM 
KDE Plasma Version: 5.27.10
KDE Frameworks Version: 5.115.0
Qt Version: 5.15.12
Kernel Version: 6.1.43-2-rkbsp (64-bit)
Graphics Platform: X11
Processors: 4 × ARM Cortex-A55, 4 × ARM Cortex-A76
Memory: 15.6 GiB of RAM
Graphics Processor: Mali-G610
Product Name: Orange Pi 5 Plus

I tried to use g21p0 with kernel 5.10, but it seems they are not compatible with each other. Sorry it didn't help.

JFLim1 commented 8 months ago

Hi @wyf9661, Joshua has already got Panfork working in a Wayland session. I hope you can update your linux-rkbsp-6.1.43 package for Arch Linux too.

Joshua-Riek commented 8 months ago

I will be using the branch below for my 6.1 progress. I think there may be an MPP issue, as Chrome is not using the GPU properly.

https://github.com/Joshua-Riek/linux-rockchip/tree/rk-6.1-rkr1

wyf9661 commented 8 months ago

linux-rkbsp-6.1.43 follows the upstream Rockchip kernel with only the necessary patches. I don't think Wayland will work well on KDE until KDE 6 comes out. If needed, you can build it yourself using the git version, which tracks Joshua's commits.

JFLim1 commented 8 months ago

Hi @wyf9661, just checking whether you will restart building/releasing "linux-rkbsp-joshua-git-6.1", as Joshua is now actively developing bsp-kernel-6.1.

wyf9661 commented 8 months ago

I maintain and update this PKGBUILD, but I do not build or release it.

EvilOlaf commented 8 months ago

This is getting pretty off-topic. I suggest moving the conversation about 6.1 into a separate issue or discussion.

cu186 commented 8 months ago

Is the high power consumption due to the kernel's Wi-Fi driver?

EvilOlaf commented 8 months ago

I don't know. How would you recommend investigating further? top certainly doesn't cut it.
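On top not cutting it: Linux adds tasks in uninterruptible sleep (state `D`) to the load average even though they use no CPU, so top can show an idle CPU while the load stays at about 1. A minimal Python sketch that lists every thread currently in D state by reading `/proc/<pid>/task/<tid>/stat` (the state is the field right after the parenthesised command name). That a D-state task explains this board's load is an assumption to test, not something the thread confirms:

```python
from __future__ import annotations

import glob
import os


def parse_stat(text: str) -> tuple[int, str, str]:
    """Return (tid, comm, state) from a /proc/<pid>/task/<tid>/stat line.

    comm may itself contain spaces or ')', so split on the LAST ')'.
    """
    lpar = text.index("(")
    rpar = text.rindex(")")
    return int(text[:lpar]), text[lpar + 1 : rpar], text[rpar + 2]


def uninterruptible_tasks(proc: str = "/proc") -> list[tuple[int, str]]:
    """List (tid, comm) for every thread, kernel threads included, in D state."""
    found = []
    for stat_path in glob.glob(os.path.join(proc, "[0-9]*", "task", "[0-9]*", "stat")):
        try:
            with open(stat_path) as f:
                tid, comm, state = parse_stat(f.read())
        except (OSError, ValueError):
            continue  # task exited while we were scanning
        if state == "D":
            found.append((tid, comm))
    return sorted(found)


if __name__ == "__main__":
    for tid, comm in uninterruptible_tasks():
        print(f"{tid:>7}  {comm}")
```

Running it a few times on an idle OPi5+ should show whether the same kworker or driver thread appears every time; such a thread would be the first candidate to trace.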

EvilOlaf commented 8 months ago

Unfortunately the loadavg issue is still present in 6.1.y kernel.

nyanmisaka commented 8 months ago

Compared with the 5.10 BSP, the 6.1 BSP is already a relatively clean Android kernel. I suspect a specific device driver or hack is causing this problem. It may require some tracing.
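If a thread does turn out to be parked in D state, a lightweight step before full ftrace or perf tracing is to sample it repeatedly and record `/proc/<pid>/task/<tid>/wchan`, the kernel function it is sleeping in. A rough Python sketch of that sampling; it assumes `wchan` is readable (the kernel may report `0` when symbol lookup is not permitted, so run it as root), and the sampling rate of one second is an arbitrary choice:

```python
from __future__ import annotations

import glob
import os
import sys
import time
from collections import Counter


def blocked_task_wchans(proc: str = "/proc") -> list[tuple[str, str]]:
    """Return (comm, wchan) for each thread currently in D state."""
    result = []
    for task_dir in glob.glob(os.path.join(proc, "[0-9]*", "task", "[0-9]*")):
        try:
            with open(os.path.join(task_dir, "stat")) as f:
                stat = f.read()
            rpar = stat.rindex(")")  # comm can contain ')', so use the last one
            if stat[rpar + 2] != "D":
                continue
            comm = stat[stat.index("(") + 1 : rpar]
            with open(os.path.join(task_dir, "wchan")) as f:
                wchan = f.read().strip() or "?"
        except (OSError, ValueError, IndexError):
            continue  # task exited or file unreadable
        result.append((comm, wchan))
    return result


def sample(rounds: int, interval: float, proc: str = "/proc") -> Counter:
    """Count how often each (comm, wchan) pair is seen blocked across samples."""
    seen: Counter = Counter()
    for i in range(rounds):
        seen.update(blocked_task_wchans(proc))
        if i + 1 < rounds:
            time.sleep(interval)
    return seen


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 dstate_sample.py <rounds>   (one sample per second)
    for (comm, wchan), hits in sample(int(sys.argv[1]), 1.0).most_common():
        print(f"{hits:>5}  {comm:<32} {wchan}")
```

A pair that shows up in nearly every sample points at the driver code path worth reading or tracing.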

EvilOlaf commented 8 months ago

It doesn't seem to be Wi-Fi related. I built and installed a custom kernel with all Wi-Fi and BT support disabled, and also disabled BT and Wi-Fi in the device tree.

nyanmisaka commented 8 months ago

I suspect this may be related to the desktop environment. I've been using the rk3588 as a headless server (no monitor attached) and the load average is close to 0 when idle.

nyanmisaka@nanopct6:~$ uname -a
Linux nanopct6 6.1.43-legacy-rk35xx #45 SMP Sun Jan  7 06:29:26 UTC 2024 aarch64 aarch64 aarch64 GNU/Linux
nyanmisaka@nanopct6:~$ neofetch
nyanmisaka@nanopct6
-------------------
OS: Armbian (23.11.0-trunk) aarch64
Host: FriendlyElec NanoPC-T6
Kernel: 6.1.43-legacy-rk35xx
Uptime: 7 hours, 9 mins
Packages: 1615 (dpkg), 9 (snap)
Shell: bash 5.2.15
Terminal: /dev/pts/0
CPU: (8) @ 1.800GHz
Memory: 906MiB / 15716MiB
EvilOlaf commented 8 months ago

Negative. I tried with preinstalled-server image and the issue persists.

So far it seems only the OPi5+ is affected: the OPi5 is not, and neither is your NanoPC-T6.

nyanmisaka commented 8 months ago

Weird. I can't see correlations between the circuit board design and this issue.

EvilOlaf commented 8 months ago

Just an observation: mainline does not have this issue. 6.7.y and 6.8-rc are fine.

nyanmisaka commented 8 months ago

Can this also be reproduced in the OEM image provided by orangepi? It's best to report it directly to them.

JFLim1 commented 8 months ago

I tested Orange Pi's latest OPIOS-Arch (2024.01) on the OPi5-Plus with the kernel package linux-rk35xx-legacy-5.10.160-1 (https://mirror.orangepi.dev/archlinux/stable/aarch64/opios-core/linux-rk35xx-legacy-5.10.160-1-aarch64.pkg.tar.zst).

It does not seem to have the high loadavg issue.

nilo85 commented 6 months ago

I see the same on my fresh OPi5+ running just the preinstalled server image from the SD card: the CPU and the NVMe SSD are hot, and the load is high.
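To put numbers on a hot NVMe SSD: NVMe drives usually expose their temperature through the hwmon class (device name `nvme`) when the kernel is built with `CONFIG_NVME_HWMON`; whether these BSP kernels enable it is an assumption. A minimal Python sketch that reads every `/sys/class/hwmon/hwmon*/temp*_input` value (millidegrees Celsius), so NVMe readings can be logged alongside the SoC temperatures:

```python
from __future__ import annotations

import glob
import os


def read_hwmon_temps(root: str = "/sys/class/hwmon") -> dict[str, float]:
    """Map 'hwmonN:<name>/tempM_input' to degrees Celsius for every hwmon sensor."""
    temps: dict[str, float] = {}
    for dev in sorted(glob.glob(os.path.join(root, "hwmon*"))):
        try:
            with open(os.path.join(dev, "name")) as f:
                name = f.read().strip()
        except OSError:
            continue  # device without a readable name; skip it
        for sensor in sorted(glob.glob(os.path.join(dev, "temp*_input"))):
            try:
                with open(sensor) as f:
                    millidegrees = int(f.read().strip())
            except (OSError, ValueError):
                continue  # some sensors return errors while the device sleeps
            key = f"{os.path.basename(dev)}:{name}/{os.path.basename(sensor)}"
            temps[key] = millidegrees / 1000
    return temps


if __name__ == "__main__":
    for sensor, celsius in read_hwmon_temps().items():
        print(f"{sensor}: {celsius:.1f} C")
```

The hwmon index is kept in the key because two devices (for example two SSDs) can report the same name.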

I don't understand the current status of this. Is it won't-fix, but still open? What blocks us from trying to solve the issue?

Joshua-Riek commented 6 months ago

Because this is not causing a major issue such as a system crash, I really do not want to spend a few days digging into the kernel to fix it. I have many other ubuntu related tasks and improvements that are being worked on.