DVFS OPP not matching downstream RK settings

ThomasKaiser commented 1 year ago

Is there any specific reason why the cpufreq OPP differ? Am wondering both about the higher voltages for the lower A55 OPP and the lower voltages for the highest OPP on the A76 clusters.

EDIT: to illustrate the differences, downstream settings are on the left and @sre's on the right:

A55:

   408 MHz    675.0 mV             408 MHz    750.0 mV
   600 MHz    675.0 mV             600 MHz    750.0 mV
   816 MHz    675.0 mV             816 MHz    750.0 mV
  1008 MHz    675.0 mV            1008 MHz    750.0 mV
  1200 MHz    712.5 mV            1200 MHz    775.0 mV
  1416 MHz    762.5 mV            1416 MHz    825.0 mV
  1608 MHz    850.0 mV            1608 MHz    875.0 mV
  1800 MHz    950.0 mV            1800 MHz    950.0 mV

A76:

   408 MHz    675.0 mV             408 MHz    600.0 mV
   600 MHz    675.0 mV             600 MHz    600.0 mV
   816 MHz    675.0 mV             816 MHz    600.0 mV
  1008 MHz    675.0 mV            1008 MHz    625.0 mV
  1200 MHz    675.0 mV            1200 MHz    650.0 mV
  1416 MHz    725.0 mV            1416 MHz    675.0 mV
  1608 MHz    762.5 mV            1608 MHz    700.0 mV
  1800 MHz    850.0 mV            1800 MHz    775.0 mV
  2016 MHz    925.0 mV            2016 MHz    850.0 mV
  2208 MHz    987.5 mV            2208 MHz    925.0 mV

ThomasKaiser commented 1 year ago

Never mind, I found https://github.com/Googulator/linux-rk3588-midstream/commit/fca463a05e3b46a841b005396d10cda78919584b in the meantime, and that's something I may need to ask @sre about after some testing.

ThomasKaiser commented 1 year ago

Reopening the issue to move discussion away from Arch Linux thread...

I'm testing with an Armbian image combining this kernel with legacy RK bootloaders. The following behaviour occurs.

With the DVFS OPP tables as presented in this kernel's DT we see the A55 cores being 'overclocked' and the A76 limited to 2200 MHz (measured with @wtarreau's cool mhz utility):

cpu0-cpu3 (Cortex-A55): OPP: 1800, Measured: 1934      (+7.4%)
cpu4-cpu5 (Cortex-A76): OPP: 2208, Measured: 2200 
cpu6-cpu7 (Cortex-A76): OPP: 2208, Measured: 2200

(link to full sbc-bench output).

Just by quick-and-dirty tweaking of the highest OPP of each cluster with dtc we now get completely different real clockspeeds:

cpu0-cpu3 (Cortex-A55): OPP: 1800, Measured: 1800
cpu4-cpu5 (Cortex-A76): OPP: 2208, Measured: 2335     (+5.8%)
cpu6-cpu7 (Cortex-A76): OPP: 2208, Measured: 2335     (+5.8%)

(link to full sbc-bench output).

All I did was decrease the voltage of the A55 1800 MHz OPP from 950 mV to 875 mV and increase that of the A76's 2208 MHz OPP from 925 mV to 1000 mV (we could go even further, and with my RK3588 silicon variant I could maybe reach 2800 MHz). These different real clockspeeds are also reflected in benchmark scores: running 7-zip's internal benchmark single-threaded on A55/A76 gives 1680/2990 7-zip MIPS with @sre's DT and 1580/3100 with my settings, which keep the A55 at 1800 MHz and allow the A76 to be clocked at 2335 MHz by overvolting them.

But all other OPP are affected as well. For example, due to the higher voltages for the lower A55 OPP, they're all 'overclocked' except the two lowest ones and 1800, which I 'fixed' now by reducing the supply voltage:

Cpufreq OPP: 1800    Measured: 1800 (1800.174/1799.978/1799.861)
Cpufreq OPP: 1608    Measured: 1800 (1800.057/1800.017/1799.978)    (+11.9%)
Cpufreq OPP: 1416    Measured: 1698 (1698.155/1698.085/1698.085)    (+19.9%)
Cpufreq OPP: 1200    Measured: 1484 (1485.007/1484.874/1484.707)    (+23.7%)
Cpufreq OPP: 1008    Measured: 1231 (1231.632/1231.460/1231.288)    (+22.1%)
Cpufreq OPP:  816    Measured:  979    (979.309/979.264/979.264)    (+20.0%)
Cpufreq OPP:  600    Measured:  590    (590.296/590.244/590.206)     (-1.7%)
Cpufreq OPP:  408    Measured:  392    (392.473/392.402/392.198)     (-3.9%)
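The percentages in these listings look like plain relative deviations of the measured clock from the nominal OPP. A minimal sketch, assuming that definition (the function name `deviation_percent` is made up here, and sbc-bench may compute or round its figures differently):

```python
def deviation_percent(opp_mhz: float, measured_mhz: float) -> float:
    """Relative deviation of the measured clock from the nominal OPP, in percent."""
    return (measured_mhz - opp_mhz) / opp_mhz * 100.0


# (OPP, measured) pairs copied from the A55 listing above.
samples = [(1608, 1800), (1416, 1698), (1200, 1484), (1008, 1231),
           (816, 979), (600, 590), (408, 392)]

for opp, measured in samples:
    print(f"{opp:>5} MHz -> {measured:>5} MHz  ({deviation_percent(opp, measured):+.1f}%)")
```

With this definition the A55 values above come out as listed (for example 1608 -> 1800 MHz is roughly +11.9%).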

That's what I've done to the OPP tables (only adjusting highest OPP of each table). Before:

root@rock-5b:/boot/dtb/rockchip# . sbc-bench.sh ; ParseOPPTables 

##########################################################################

   vdd_cpu_big0_s0: 675 mV (1050 mV max)
   vdd_cpu_big1_s0: 675 mV (1050 mV max)

   opp-table-cluster0:
       408 MHz    750.0 mV
       600 MHz    750.0 mV
       816 MHz    750.0 mV
      1008 MHz    750.0 mV
      1200 MHz    775.0 mV
      1416 MHz    825.0 mV
      1608 MHz    875.0 mV
      1800 MHz    950.0 mV

   opp-table-cluster1:
       408 MHz    600.0 mV
       600 MHz    600.0 mV
       816 MHz    600.0 mV
      1008 MHz    625.0 mV
      1200 MHz    650.0 mV
      1416 MHz    675.0 mV
      1608 MHz    700.0 mV
      1800 MHz    775.0 mV
      2016 MHz    850.0 mV
      2208 MHz    925.0 mV

   opp-table-cluster2:
       408 MHz    600.0 mV
       600 MHz    600.0 mV
       816 MHz    600.0 mV
      1008 MHz    625.0 mV
      1200 MHz    650.0 mV
      1416 MHz    675.0 mV
      1608 MHz    700.0 mV
      1800 MHz    775.0 mV
      2016 MHz    850.0 mV
      2208 MHz    925.0 mV

And after:

root@rock-5b:/home/tk# . sbc-bench.sh ; ParseOPPTables 

##########################################################################

   vdd_cpu_big0_s0: 675 mV (1050 mV max)
   vdd_cpu_big1_s0: 675 mV (1050 mV max)

   opp-table-cluster0:
       408 MHz    750.0 mV
       600 MHz    750.0 mV
       816 MHz    750.0 mV
      1008 MHz    750.0 mV
      1200 MHz    775.0 mV
      1416 MHz    825.0 mV
      1608 MHz    875.0 mV
      1800 MHz    875.0 mV

   opp-table-cluster1:
       408 MHz    600.0 mV
       600 MHz    600.0 mV
       816 MHz    600.0 mV
      1008 MHz    625.0 mV
      1200 MHz    650.0 mV
      1416 MHz    675.0 mV
      1608 MHz    700.0 mV
      1800 MHz    775.0 mV
      2016 MHz    850.0 mV
      2208 MHz   1000.0 mV

   opp-table-cluster2:
       408 MHz    600.0 mV
       600 MHz    600.0 mV
       816 MHz    600.0 mV
      1008 MHz    625.0 mV
      1200 MHz    650.0 mV
      1416 MHz    675.0 mV
      1608 MHz    700.0 mV
      1800 MHz    775.0 mV
      2016 MHz    850.0 mV
      2208 MHz   1000.0 mV
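To make the before/after difference easier to spot, the two listings can be compared mechanically. The following is only a sketch: it assumes the text layout shown above (a `name:` header line followed by `<MHz> MHz <mV> mV` rows), and the helper names `parse_opp_tables` and `diff_opp_tables` are made up for illustration, they are not part of sbc-bench. The sample data is abbreviated to the top two OPP of each cluster.

```python
import re

# One OPP row, e.g. "      2208 MHz   1000.0 mV"
ROW = re.compile(r"^\s*(\d+) MHz\s+([\d.]+) mV\s*$")


def parse_opp_tables(text):
    """Return {table_name: {mhz: millivolts}} from a ParseOPPTables-style listing."""
    tables, current = {}, None
    for line in text.splitlines():
        stripped = line.strip()
        # Table headers look like "opp-table-cluster0:" (no spaces, trailing colon).
        if stripped.endswith(":") and " " not in stripped:
            current = tables.setdefault(stripped[:-1], {})
            continue
        match = ROW.match(line)
        if match and current is not None:
            current[int(match.group(1))] = float(match.group(2))
    return tables


def diff_opp_tables(before, after):
    """List (table, mhz, old_mv, new_mv) for every OPP whose voltage changed."""
    changes = []
    for name in sorted(before.keys() & after.keys()):
        for mhz in sorted(before[name].keys() & after[name].keys()):
            if before[name][mhz] != after[name][mhz]:
                changes.append((name, mhz, before[name][mhz], after[name][mhz]))
    return changes


BEFORE = """
   vdd_cpu_big0_s0: 675 mV (1050 mV max)

   opp-table-cluster0:
      1608 MHz    875.0 mV
      1800 MHz    950.0 mV

   opp-table-cluster1:
      2016 MHz    850.0 mV
      2208 MHz    925.0 mV

   opp-table-cluster2:
      2016 MHz    850.0 mV
      2208 MHz    925.0 mV
"""

AFTER = BEFORE.replace("1800 MHz    950.0 mV", "1800 MHz    875.0 mV") \
              .replace("2208 MHz    925.0 mV", "2208 MHz   1000.0 mV")

changes = diff_opp_tables(parse_opp_tables(BEFORE), parse_opp_tables(AFTER))
for name, mhz, old, new in changes:
    print(f"{name}: {mhz} MHz  {old} mV -> {new} mV")
```

On the abbreviated data this reports exactly the three edits described: the A55 1800 MHz OPP and the 2208 MHz OPP of both A76 clusters.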

To me this just looks like 'PVTM at work': an MCU inside the RK3588 decides at which clockspeed the cores really get clocked, based on silicon quality, temperature and the respective supply voltage at each OPP.

The thermal settings also need some love, since throttling kicks in at 75°C and jumps from 2208 MHz directly to 1008 MHz without any intermediate steps, and then we also have the 'problem' of massively higher idle consumption. But both are something for later :)

Is this reproducible on your side? Running sbc-bench is really all that's needed since it does all this measurement/monitoring stuff and prints the OPP tables automatically.

Googulator commented 1 year ago

Consulting the DTS, it seems CPU core clocks are controlled via SCMI, which is a service provided by the secure world (ATF). So it's probably ATF which is adjusting the final CPU clock based on the voltage provided by the RK806.

ThomasKaiser commented 1 year ago

Do we have ATF sources for RK3588 now? Still not, right?

Aside from that: can you reproduce this? As long as PVTM works as designed and works the same everywhere, since we (have to) rely on RK's boot BLOBs, the OPP tables need to be adapted to this.

As already written: the simplest way is to run sbc-bench on your hardware since the tool does all the data collection / measuring.

ThomasKaiser commented 1 year ago

The RK3588 on my 5B is a rather 'good' silicon variant according to PVTM (BSP kernel prints this – unfortunately all of this info gets lost with mainline kernel ATM):

cpu cpu0: pvtm=1529
cpu cpu0: pvtm-volt-sel=5
cpu cpu4: pvtm=1781
cpu cpu4: pvtm-volt-sel=7
cpu cpu6: pvtm=1780
cpu cpu6: pvtm-volt-sel=7

As such I defined DVFS OPP tables with supply voltages that fit the respective pvtm-volt-sel value, choosing the opp-microvolt-L5 settings for the A55 and opp-microvolt-L7 for the 1st A76 cluster (example). The 2nd A76 cluster got the 'default' opp-microvolt values from the BSP kernel in Radxa's flavour:

opp-table-cluster0:
    408 MHz    675.0 mV
    600 MHz    675.0 mV
    816 MHz    675.0 mV
   1008 MHz    675.0 mV
   1200 MHz    675.0 mV
   1416 MHz    712.5 mV
   1608 MHz    800.0 mV
   1800 MHz    887.5 mV

opp-table-cluster1:
    408 MHz    675.0 mV
    600 MHz    675.0 mV
    816 MHz    675.0 mV
   1008 MHz    675.0 mV
   1200 MHz    675.0 mV
   1416 MHz    675.0 mV
   1608 MHz    700.0 mV
   1800 MHz    762.5 mV
   2016 MHz    837.5 mV
   2208 MHz    912.5 mV
   2256 MHz   1000.0 mV
   2304 MHz   1000.0 mV
   2352 MHz   1000.0 mV
   2400 MHz   1000.0 mV

opp-table-cluster2:
    408 MHz    675.0 mV
    600 MHz    675.0 mV
    816 MHz    675.0 mV
   1008 MHz    675.0 mV
   1200 MHz    675.0 mV
   1416 MHz    725.0 mV
   1608 MHz    762.5 mV
   1800 MHz    850.0 mV
   2016 MHz    925.0 mV
   2208 MHz    987.5 mV
   2256 MHz   1000.0 mV
   2304 MHz   1000.0 mV
   2352 MHz   1000.0 mV
   2400 MHz   1000.0 mV
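The selection rule used for these tables (take the `opp-microvolt-L<n>` value matching the reported pvtm-volt-sel where one exists, otherwise the plain `opp-microvolt` value) can be sketched as follows. This is a toy model, not the BSP kernel's code: the dict layout and the function name `pick_microvolt` are assumptions made for illustration, and only the two voltage values for the A76 1800 MHz OPP are taken from the tables above.

```python
def pick_microvolt(opp_node: dict, volt_sel: int) -> int:
    """Pick the supply voltage (in microvolts) for one OPP.

    Prefers the level-specific entry "opp-microvolt-L<volt_sel>" and falls
    back to the default "opp-microvolt" when no such entry exists.
    """
    key = f"opp-microvolt-L{volt_sel}"
    return opp_node.get(key, opp_node["opp-microvolt"])


# A76 1800 MHz: 850.0 mV default vs. 762.5 mV with the L7 setting (see tables above).
a76_1800 = {"opp-microvolt": 850_000, "opp-microvolt-L7": 762_500}
# A76 408 MHz: same 675.0 mV in both tables, modelled here without an L7 entry.
a76_408 = {"opp-microvolt": 675_000}

print(pick_microvolt(a76_1800, 7))  # level-specific value
print(pick_microvolt(a76_408, 7))   # falls back to the default
```

Whether the BSP kernel falls back in exactly this way is not confirmed here; the sketch only mirrors how the tables above were put together by hand.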

With these settings, cluster0 and cluster1 give similar results with the BSP kernel and this kernel:

cpu0-cpu3 (Cortex-A55) with BSP:

Cpufreq OPP: 1800    Measured: 1823 (1823.209/1823.168/1823.128)     (+1.3%)
Cpufreq OPP: 1608    Measured: 1642 (1643.074/1642.992/1642.829)     (+2.1%)
Cpufreq OPP: 1416    Measured: 1424 (1424.042/1424.011/1424.011)
Cpufreq OPP: 1200    Measured: 1234 (1234.535/1234.507/1234.219)     (+2.8%)
Cpufreq OPP: 1008    Measured: 1065 (1065.818/1065.796/1065.560)     (+5.7%)
Cpufreq OPP:  816    Measured:  848    (848.245/848.143/848.058)     (+3.9%)
Cpufreq OPP:  600    Measured:  591    (591.378/591.353/591.327)     (-1.5%)
Cpufreq OPP:  408    Measured:  393    (393.560/393.542/393.337)     (-3.7%)

cpu0-cpu3 (Cortex-A55) with this kernel (and my voltage settings that fit PVTM classification):

Cpufreq OPP: 1800    Measured: 1817 (1817.875/1817.755/1817.435)
Cpufreq OPP: 1608    Measured: 1639 (1639.448/1639.448/1639.244)     (+1.9%)
Cpufreq OPP: 1416    Measured: 1422 (1422.418/1422.388/1422.296)
Cpufreq OPP: 1200    Measured: 1233 (1233.643/1233.643/1233.643)     (+2.8%)
Cpufreq OPP: 1008    Measured: 1064 (1065.024/1064.852/1064.745)     (+5.6%)
Cpufreq OPP:  816    Measured:  847    (847.362/847.294/847.175)     (+3.8%)
Cpufreq OPP:  600    Measured:  590    (590.270/590.257/590.244)     (-1.7%)
Cpufreq OPP:  408    Measured:  392    (392.446/392.446/392.384)     (-3.9%)

cpu4-cpu5 (Cortex-A76) with BSP:

Cpufreq OPP: 2400    Measured: 2333 (2333.266/2333.213/2333.213)     (-2.8%)
Cpufreq OPP: 2352    Measured: 2333 (2333.160/2333.107/2333.055)
Cpufreq OPP: 2304    Measured: 2332 (2333.055/2333.002/2332.739)     (+1.2%)
Cpufreq OPP: 2256    Measured: 2332 (2333.002/2333.002/2332.949)     (+3.4%)
Cpufreq OPP: 2208    Measured: 2173 (2173.107/2173.062/2172.924)     (-1.6%)
Cpufreq OPP: 2016    Measured: 2007 (2007.296/2007.296/2007.101)
Cpufreq OPP: 1800    Measured: 1810 (1810.824/1810.666/1810.666)
Cpufreq OPP: 1608    Measured: 1622 (1622.828/1622.788/1622.708)
Cpufreq OPP: 1416    Measured: 1435 (1435.981/1435.919/1435.887)     (+1.3%)
Cpufreq OPP: 1200    Measured: 1258 (1258.445/1258.445/1258.385)     (+4.8%)
Cpufreq OPP: 1008    Measured: 1056 (1056.188/1055.977/1055.924)     (+4.8%)
Cpufreq OPP:  816    Measured:  849    (849.727/849.727/849.642)     (+4.0%)
Cpufreq OPP:  600    Measured:  592    (592.906/592.893/592.880)     (-1.3%)
Cpufreq OPP:  408    Measured:  394    (394.932/394.914/394.914)     (-3.4%)

cpu4-cpu5 (Cortex-A76) with this kernel (and again my voltage settings that fit PVTM classification):

Cpufreq OPP: 2400    Measured: 2326 (2326.277/2326.120/2326.068)     (-3.1%)
Cpufreq OPP: 2352    Measured: 2325 (2326.120/2325.858/2325.858)     (-1.1%)
Cpufreq OPP: 2304    Measured: 2325 (2326.015/2325.754/2325.754)
Cpufreq OPP: 2256    Measured: 2325 (2325.701/2325.597/2325.597)     (+3.1%)
Cpufreq OPP: 2208    Measured: 2167 (2167.135/2167.135/2167.135)     (-1.9%)
Cpufreq OPP: 2016    Measured: 2002 (2002.675/2002.675/2002.675)
Cpufreq OPP: 1800    Measured: 1808 (1808.368/1808.288/1808.288)
Cpufreq OPP: 1608    Measured: 1622 (1622.071/1622.071/1621.992)
Cpufreq OPP: 1416    Measured: 1435 (1435.763/1435.544/1435.451)     (+1.3%)
Cpufreq OPP: 1200    Measured: 1258 (1258.326/1258.266/1258.176)     (+4.8%)
Cpufreq OPP: 1008    Measured: 1055 (1055.661/1055.635/1055.608)     (+4.7%)
Cpufreq OPP:  816    Measured:  849    (849.454/849.420/849.403)     (+4.0%)
Cpufreq OPP:  600    Measured:  592    (592.491/592.491/592.452)     (-1.3%)
Cpufreq OPP:  408    Measured:  394    (394.555/394.555/394.519)     (-3.4%)

On the 2nd A76 cluster, where I used the higher opp-microvolt values from the BSP kernel rather than opp-microvolt-L7 (where available), it of course looks different for those OPP where this matters (1416-2208):

Cpufreq OPP: 2400    Measured: 2326 (2326.330/2326.277/2326.120)     (-3.1%)
Cpufreq OPP: 2352    Measured: 2326 (2326.225/2326.225/2326.173)     (-1.1%)
Cpufreq OPP: 2304    Measured: 2326 (2326.173/2326.068/2325.963)
Cpufreq OPP: 2256    Measured: 2326 (2326.068/2326.068/2325.911)     (+3.1%)
Cpufreq OPP: 2208    Measured: 2305 (2305.305/2305.150/2305.099)     (+4.4%)
Cpufreq OPP: 2016    Measured: 2192 (2192.758/2192.618/2192.618)     (+8.7%)
Cpufreq OPP: 1800    Measured: 2031 (2031.674/2031.574/2031.524)    (+12.8%)
Cpufreq OPP: 1608    Measured: 1806 (1806.984/1806.826/1806.747)    (+12.3%)
Cpufreq OPP: 1416    Measured: 1578 (1578.729/1578.540/1578.540)    (+11.4%)
Cpufreq OPP: 1200    Measured: 1246 (1246.581/1246.464/1246.346)     (+3.8%)
Cpufreq OPP: 1008    Measured: 1048 (1049.117/1048.961/1048.909)     (+4.0%)
Cpufreq OPP:  816    Measured:  843    (843.350/843.308/843.266)     (+3.3%)
Cpufreq OPP:  600    Measured:  592    (592.530/592.478/592.478)     (-1.3%)
Cpufreq OPP:  408    Measured:  394    (394.555/394.546/394.537)     (-3.4%)

150balbes commented 1 year ago

Is this reproducible on your side? Running sbc-bench is really all that's needed since doing all this measurement/monitoring stuff and printing OPP tables automatically.

Did I understand correctly that I just need to run the test and publish the result? Are any additional steps, such as replacing the DTB, needed between tests?

ThomasKaiser commented 1 year ago

Did I understand correctly that just need to run the test and publish the result?

Yes, please simply run sbc-bench or sbc-bench -r and provide link to results (takes 20-25 minutes on RK3588). If time allows it would be great to exchange /boot/dtb/rockchip/rk3588-rock-5b.dtb with contents from https://transfer.sh/bt4FdB/rk3588-rock-5b-BSP-OPP-tables.dts

cp -p /boot/dtb/rockchip/rk3588-rock-5b.dtb /boot/dtb/rockchip/rk3588-rock-5b.dtb.bak
dtc -I dts -O dtb -o /boot/dtb/rockchip/rk3588-rock-5b.dtb /path/to/rk3588-rock-5b-BSP-OPP-tables.dts

Then rebooting and retesting. This will tell us even with the 2400 MHz OPP in place how other RK3588 silicon variants than mine behave based on the assumption that the PVTM code running in an MCU does its thing.

And then it would be interesting which silicon variant your RK3588 represents. But unfortunately this requires booting with BSP kernel since only then stuff like pvtm=1529 and cpu cpu0: pvtm-volt-sel=5 can be read out simply by dmesg.
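For collecting these values from several boards, the relevant dmesg lines can be pulled apart with a small parser. This is a sketch only: it assumes the BSP kernel's `<driver> <device>: pvtm=<n>` / `pvtm-volt-sel=<n>` line format quoted in this thread, and the function name `parse_pvtm` is made up.

```python
import re

# Matches e.g. "cpu cpu0: pvtm=1529" or "mali fb000000.gpu: pvtm-volt-sel=3",
# with or without a leading "[   12.345678]" timestamp.
PVTM_RE = re.compile(r"(\S+) (\S+): (pvtm(?:-volt-sel)?)=(\d+)")


def parse_pvtm(dmesg_text: str) -> dict:
    """Return {device: {"pvtm": int, "pvtm-volt-sel": int}} from dmesg output."""
    result = {}
    for match in PVTM_RE.finditer(dmesg_text):
        _driver, device, key, value = match.groups()
        result.setdefault(device, {})[key] = int(value)
    return result


sample = """\
rockchip-pvtm fda40000.pvtm: pvtm@0 probed
cpu cpu0: pvtm=1529
cpu cpu0: pvtm-volt-sel=5
cpu cpu4: pvtm=1781
cpu cpu4: pvtm-volt-sel=7
cpu cpu6: pvtm=1780
cpu cpu6: pvtm-volt-sel=7
"""

parsed = parse_pvtm(sample)
for device, values in parsed.items():
    print(device, values)
```

The "probed" lines are skipped because they carry no `pvtm=` or `pvtm-volt-sel=` assignment. On a real system the input would be the output of `dmesg | grep pvtm` under the BSP kernel.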

150balbes commented 1 year ago

And then it would be interesting which silicon variant your RK3588 represents. But unfortunately this requires booting with BSP kernel since only then stuff like pvtm=1529 and cpu cpu0: pvtm-volt-sel=5 can be read out simply by dmesg.

I currently have my u-boot version installed in MTD\SPI (with the correct USB\SD\NVMe startup order and support for direct startup from USB) and the system is used on NVMe with the kernel 5.10.110. Therefore, there is no problem to show the result.

root@rock-5b:~# dmesg | grep pvtm
[   10.834822] rockchip-pvtm fda40000.pvtm: pvtm@0 probed
[   10.834890] rockchip-pvtm fda50000.pvtm: pvtm@1 probed
[   10.834953] rockchip-pvtm fda60000.pvtm: pvtm@2 probed
[   10.835019] rockchip-pvtm fdaf0000.pvtm: pvtm@3 probed
[   10.835079] rockchip-pvtm fdb30000.pvtm: pvtm@4 probed
[   12.025735] cpu cpu0: pvtm=1508
[   12.027971] cpu cpu0: pvtm-volt-sel=5
[   12.049590] cpu cpu4: pvtm=1758
[   12.054772] cpu cpu4: pvtm-volt-sel=6
[   12.070304] cpu cpu6: pvtm=1751
[   12.076289] cpu cpu6: pvtm-volt-sel=6
[   12.401457] mali fb000000.gpu: pvtm=878
[   12.415199] mali fb000000.gpu: pvtm-volt-sel=3
[   12.946991] RKNPU fdab0000.npu: pvtm=876
[   12.954929] RKNPU fdab0000.npu: pvtm-volt-sel=3

Question. How critical is which version of DDR blobs is used in u-boot (SPI)? The version currently in SPI:

https://github.com/150balbes/build/blob/armbian-tv/config/sources/families/include/rockchip64_common.inc#L106

I can build a new version with other blobs.

ThomasKaiser commented 1 year ago

Question. How critical is which version of DDR blobs is used in u-boot (SPI)?

No idea but I would believe the more important part is the bl31 BLOB since most probably the MCU firmware is included there. So your RK3588 is also one of the 'better' silicon variants (5/6/6). As such most probably your results will not differ that much from mine but it's still interesting! :)

150balbes commented 1 year ago

As a starting point for comparison - the result of running on a system with BSP kernel 5.10.110.

http://ix.io/4qSS

official image armbian-minimal kernel midstream

http://ix.io/4qTk

(as I test different options, I will update this post)

150balbes commented 1 year ago

I probably made a mistake, but I tried to run the tests after replacing the DTB and all the results appear only on the monitor. There is no link about uploading the result to the Internet. Perhaps an additional key / option is needed to force the upload to the Internet?

ThomasKaiser commented 1 year ago

There is no link about uploading the result to the Internet

This happens sometimes with ix.io. The contents of your last run should still be at the end of /var/log/sbc-bench.log so please simply paste it from there with whatever service you like :)

150balbes commented 1 year ago

official image armbian-minimal kernel midstream + new dtb

http://ix.io/4qWF

I have built several test versions of armbiantv from different branches of the kernel (so that it would be easy to run the same system from USB on different models, including Station M3 and Khadas Edge2) and additionally changed the kernel config. These are the results with the default DTB from the midstream branch.

http://ix.io/4qWP

midstream + new dtb

http://ix.io/4qWY

ThomasKaiser commented 1 year ago

Yep, the results confirm that, as long as Rockchip's BL31 is active, real clockspeeds depend more on supply voltage and temperature than on the actual OPP definitions (see here with RK3566 and UEFI for example):

With the modified OPP table (the one from the BSP kernel) the A55 are 'back at normal', 1st A76 cluster as well while 2nd A76 cluster shows higher clockspeeds with the OPP that are affected by differing voltages (1416-2208 MHz OPP):

Checking cpufreq OPP for cpu0-cpu3 (Cortex-A55):

Cpufreq OPP: 1800    Measured: 1812 (1812.254/1812.174/1812.055)
Cpufreq OPP: 1608    Measured: 1626 (1627.101/1626.981/1626.861)     (+1.1%)
Cpufreq OPP: 1416    Measured: 1402 (1402.117/1402.057/1401.879)
Cpufreq OPP: 1200    Measured: 1210 (1210.824/1210.685/1210.602)
Cpufreq OPP: 1008    Measured: 1043 (1043.863/1043.786/1043.735)     (+3.5%)
Cpufreq OPP:  816    Measured:  833    (833.446/833.323/833.015)     (+2.1%)
Cpufreq OPP:  600    Measured:  589    (589.281/589.268/589.076)     (-1.8%)
Cpufreq OPP:  408    Measured:  391    (391.533/391.233/391.224)     (-4.2%)

Checking cpufreq OPP for cpu4-cpu5 (Cortex-A76):

Cpufreq OPP: 2400    Measured: 2323 (2323.243/2323.191/2323.139)     (-3.2%)
Cpufreq OPP: 2352    Measured: 2323 (2323.243/2323.139/2323.034)     (-1.2%)
Cpufreq OPP: 2304    Measured: 2323 (2323.295/2322.982/2322.877)
Cpufreq OPP: 2256    Measured: 2323 (2323.243/2323.139/2322.982)     (+3.0%)
Cpufreq OPP: 2208    Measured: 2159 (2160.021/2159.840/2159.795)     (-2.2%)
Cpufreq OPP: 2016    Measured: 1989 (1989.657/1989.609/1989.466)     (-1.3%)
Cpufreq OPP: 1800    Measured: 1788 (1788.990/1788.990/1788.913)
Cpufreq OPP: 1608    Measured: 1596 (1596.340/1596.262/1596.224)
Cpufreq OPP: 1416    Measured: 1407 (1407.220/1407.130/1407.130)
Cpufreq OPP: 1200    Measured: 1228 (1228.571/1228.571/1228.485)     (+2.3%)
Cpufreq OPP: 1008    Measured: 1032 (1032.755/1032.629/1032.553)     (+2.4%)
Cpufreq OPP:  816    Measured:  829    (829.118/829.037/828.935)     (+1.6%)
Cpufreq OPP:  600    Measured:  592    (592.478/592.478/592.439)     (-1.3%)
Cpufreq OPP:  408    Measured:  394    (394.591/394.582/394.528)     (-3.4%)

Checking cpufreq OPP for cpu6-cpu7 (Cortex-A76):

Cpufreq OPP: 2400    Measured: 2318 (2318.394/2318.238/2318.134)     (-3.4%)
Cpufreq OPP: 2352    Measured: 2318 (2318.394/2318.134/2317.978)     (-1.4%)
Cpufreq OPP: 2304    Measured: 2317 (2318.030/2317.978/2317.978)
Cpufreq OPP: 2256    Measured: 2317 (2317.822/2317.770/2317.614)     (+2.7%)
Cpufreq OPP: 2208    Measured: 2296 (2296.030/2296.030/2295.979)     (+4.0%)
Cpufreq OPP: 2016    Measured: 2179 (2179.940/2179.848/2179.848)     (+8.1%)
Cpufreq OPP: 1800    Measured: 2014 (2014.145/2014.047/2014.047)    (+11.9%)
Cpufreq OPP: 1608    Measured: 1782 (1783.008/1782.931/1782.777)    (+10.8%)
Cpufreq OPP: 1416    Measured: 1553 (1553.613/1553.613/1553.540)     (+9.7%)
Cpufreq OPP: 1200    Measured: 1226 (1226.264/1226.207/1226.178)     (+2.2%)
Cpufreq OPP: 1008    Measured: 1029 (1029.689/1029.639/1029.589)     (+2.1%)
Cpufreq OPP:  816    Measured:  825    (825.962/825.881/825.881)     (+1.1%)
Cpufreq OPP:  600    Measured:  592    (592.517/592.478/592.465)     (-1.3%)
Cpufreq OPP:  408    Measured:  394    (394.537/394.528/394.519)     (-3.4%)
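The pattern in the A76 listings, where several of the highest OPP all end up at the same real clockspeed, can be detected automatically. A rough sketch under stated assumptions: the helper name `top_plateau` and the 5 MHz tolerance are arbitrary choices for illustration, and the sample pairs are the rounded cpu4-cpu5 values from the listing above.

```python
def top_plateau(results, tolerance_mhz=5.0):
    """Return the highest OPP (in MHz) that all measure about the same real clock.

    results: iterable of (opp_mhz, measured_mhz) pairs.
    Walks down from the highest OPP and stops at the first one whose measured
    clock differs from the top OPP's measured clock by more than the tolerance.
    """
    ordered = sorted(results, key=lambda pair: pair[0], reverse=True)
    if not ordered:
        return []
    ceiling = ordered[0][1]
    plateau = []
    for opp, measured in ordered:
        if abs(measured - ceiling) > tolerance_mhz:
            break
        plateau.append(opp)
    return plateau


# cpu4-cpu5 (Cortex-A76), values from the listing above.
cpu4 = [(2400, 2323), (2352, 2323), (2304, 2323), (2256, 2323),
        (2208, 2159), (2016, 1989), (1800, 1788)]

print(top_plateau(cpu4))
```

A plateau longer than one entry means the additional OPP make no practical difference on that board: the cores are clocked at the same ceiling regardless of which of these OPP cpufreq selects.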

Since your RK3588 is also a 'strong' silicon variant and results differ not that much from mine it would be great if you could give both settings another try on a different device. So far from all RK3588/RK3588s devices Khadas Edge2 always 'scored' lowest with PVTM values so most probably your Edge2 is the ideal candidate.

Here is an Edge2 with these PVTM properties where BSP kernel + BL31 even result in the highest available OPP being downclocked to ~400 MHz with light to medium loads! :)

cpu cpu0: pvtm=1425
cpu cpu0: pvtm-volt-sel=1
cpu cpu4: pvtm=1649
cpu cpu4: pvtm-volt-sel=3
cpu cpu6: pvtm=1665
cpu cpu6: pvtm-volt-sel=3

A final test would be to really overvolt the CPU cores like @amazingfate did it already half a year ago to get the A76 up to 2700 MHz: https://forum.radxa.com/t/rock-5b-debug-party-invitation/10483/498?u=tkaiser

Focus of the test is only confirming whether 'disabling' the highest DVFS OPP as Sebastian did has any real effect as long as the MCU/firmware inside RK3588 clocks the cores based on supply voltages anyway ignoring the clockspeed values defined in DT.

But as soon as I edit the highest OPP's voltage to exceed 1000 mV I get 'set clk failed' messages spamming dmesg output, and I'm currently too busy with other stuff to find the correct regulator nodes to change this to 1500 mV.

n2qcn commented 1 year ago

[    0.000000] Linux version 6.2.0-rc1-station-m6 (root@vbox) (aarch64-linux-gnu-gcc (GNU Toolchain for the A-profile Architecture 8.3-2019.03 (arm-rel-8.36)) 8.3.0, GNU ld (GNU Toolchain for the A-profile Architecture 8.3-2019.03 (arm-rel-8.36)) 2.32.0.20190321) #trunk SMP PREEMPT_DYNAMIC Fri Mar 17 16:22:40 MSK 2023
[...]
[    0.148002] hw perfevents: enabled with armv8_cortex_a55 PMU driver, 7 counters available
[    0.148410] hw perfevents: enabled with armv8_cortex_a76 PMU driver, 7 counters available
[...]
[    1.808494] cpu cpu0: EM: OPP:816000 is inefficient
[    1.808498] cpu cpu0: EM: OPP:600000 is inefficient
[    1.808500] cpu cpu0: EM: OPP:408000 is inefficient
[    1.808580] cpu cpu0: EM: created perf domain
[    1.809181] cpu cpu4: EM: OPP:600000 is inefficient
[    1.809184] cpu cpu4: EM: OPP:408000 is inefficient
[    1.809278] cpu cpu4: EM: created perf domain
[    1.809880] cpu cpu6: EM: OPP:600000 is inefficient
[    1.809883] cpu cpu6: EM: OPP:408000 is inefficient
[    1.810175] cpu cpu6: EM: created perf domain

Full [sbc-bench] results uploaded to http://ix.io/4ro7 [and local cut below]

Checking cpufreq OPP for cpu0-cpu3 (Cortex-A55):

Cpufreq OPP: 1800    Measured: 1825 (1825.383/1825.303/1825.262)     (+1.4%)
Cpufreq OPP: 1608    Measured: 1695 (1695.437/1695.263/1695.055)     (+5.4%)
Cpufreq OPP: 1416    Measured: 1596 (1596.879/1596.648/1596.609)    (+12.7%)
Cpufreq OPP: 1200    Measured: 1390 (1390.790/1390.790/1390.673)    (+15.8%)
Cpufreq OPP: 1008    Measured: 1152 (1152.188/1152.188/1152.138)    (+14.3%)
Cpufreq OPP:  816    Measured:  921    (921.049/921.029/920.989)    (+12.9%)
Cpufreq OPP:  600    Measured:  591    (591.094/591.043/590.940)     (-1.5%)
Cpufreq OPP:  408    Measured:  393    (393.176/393.158/392.971)     (-3.7%)

Checking cpufreq OPP for cpu4-cpu5 (Cortex-A76):

Cpufreq OPP: 2208    Measured: 2068 (2068.459/2068.459/2068.294)     (-6.3%)
Cpufreq OPP: 2016    Measured: 1909 (1909.527/1909.482/1909.350)     (-5.3%)
Cpufreq OPP: 1800    Measured: 1721 (1721.394/1721.250/1721.143)     (-4.4%)
Cpufreq OPP: 1608    Measured: 1504 (1504.163/1504.061/1504.027)     (-6.5%)
Cpufreq OPP: 1416    Measured: 1324 (1324.399/1324.319/1324.293)     (-6.5%)
Cpufreq OPP: 1200    Measured: 1156 (1156.749/1156.673/1156.547)     (-3.7%)
Cpufreq OPP: 1008    Measured:  974    (974.213/974.101/973.698)     (-3.4%)
Cpufreq OPP:  816    Measured:  784    (784.846/784.791/784.773)     (-3.9%)
Cpufreq OPP:  600    Measured:  592    (592.880/592.867/592.854)     (-1.3%)
Cpufreq OPP:  408    Measured:  394    (394.932/394.896/394.887)     (-3.4%)

Checking cpufreq OPP for cpu6-cpu7 (Cortex-A76):

Cpufreq OPP: 2208    Measured: 2063 (2063.378/2063.172/2063.172)     (-6.6%)
Cpufreq OPP: 2016    Measured: 1903 (1903.983/1903.940/1903.896)     (-5.6%)
Cpufreq OPP: 1800    Measured: 1716 (1716.353/1716.353/1716.317)     (-4.7%)
Cpufreq OPP: 1608    Measured: 1499 (1499.354/1499.320/1499.218)     (-6.8%)
Cpufreq OPP: 1416    Measured: 1321 (1321.850/1321.817/1321.652)     (-6.7%)
Cpufreq OPP: 1200    Measured: 1155 (1155.460/1155.460/1155.334)     (-3.7%)
Cpufreq OPP: 1008    Measured:  972    (972.690/972.556/972.400)     (-3.6%)
Cpufreq OPP:  816    Measured:  781    (781.746/781.728/781.728)     (-4.3%)
Cpufreq OPP:  600    Measured:  592    (592.893/592.880/592.854)     (-1.3%)
Cpufreq OPP:  408    Measured:  394    (394.905/394.905/394.887)     (-3.4%)

[...]

   vdd_cpu_big0_s0: 925 mV (1050 mV max)
   vdd_cpu_big1_s0: 925 mV (1050 mV max)

   gpu-opp-table:
       300 MHz    675.0 mV
       400 MHz    675.0 mV
       500 MHz    675.0 mV
       600 MHz    675.0 mV
       700 MHz    700.0 mV
       800 MHz    750.0 mV
       900 MHz    800.0 mV
      1000 MHz    850.0 mV

   opp-table-cluster0:
       408 MHz    750.0 mV
       600 MHz    750.0 mV
       816 MHz    750.0 mV
      1008 MHz    750.0 mV
      1200 MHz    775.0 mV
      1416 MHz    825.0 mV
      1608 MHz    875.0 mV
      1800 MHz    950.0 mV

   opp-table-cluster1:
       408 MHz    600.0 mV
       600 MHz    600.0 mV
       816 MHz    600.0 mV
      1008 MHz    625.0 mV
      1200 MHz    650.0 mV
      1416 MHz    675.0 mV
      1608 MHz    700.0 mV
      1800 MHz    775.0 mV
      2016 MHz    850.0 mV
      2208 MHz    925.0 mV

   opp-table-cluster2:
       408 MHz    600.0 mV
       600 MHz    600.0 mV
       816 MHz    600.0 mV
      1008 MHz    625.0 mV
      1200 MHz    650.0 mV
      1416 MHz    675.0 mV
      1608 MHz    700.0 mV
      1800 MHz    775.0 mV
      2016 MHz    850.0 mV
      2208 MHz    925.0 mV

150balbes commented 1 year ago

So your RK3588 is also one of the 'better' silicon variants (5/6/6). As such most probably your results will not differ that much from mine but it's still interesting! :)

$ dmesg | grep pvtm
[    4.064598] rockchip-pvtm fda40000.pvtm: pvtm@0 probed
[    4.064657] rockchip-pvtm fda50000.pvtm: pvtm@1 probed
[    4.064717] rockchip-pvtm fda60000.pvtm: pvtm@2 probed
[    4.064770] rockchip-pvtm fdaf0000.pvtm: pvtm@3 probed
[    4.064821] rockchip-pvtm fdb30000.pvtm: pvtm@4 probed
[    4.360207] cpu cpu0: pvtm=1499
[    4.360292] cpu cpu0: pvtm-volt-sel=4
[    4.374085] cpu cpu4: pvtm=1734
[    4.382508] cpu cpu4: pvtm-volt-sel=5
[    4.397347] cpu cpu6: pvtm=1745
[    4.405565] cpu cpu6: pvtm-volt-sel=6
[    4.487909] mali fb000000.gpu: pvtm=885
[    4.487947] mali fb000000.gpu: pvtm-volt-sel=3
[    4.568549] RKNPU fdab0000.npu: pvtm=887
[    4.572941] RKNPU fdab0000.npu: pvtm-volt-sel=4

https://forum.armbian.com/topic/24931-armbian-efigrub-nvme/?do=findComment&comment=162303

150balbes commented 1 year ago

Full [sbc-bench] results uploaded to http://ix.io/4ro7 [and local cut below]

can you run any version of the system with kernel 5.10 and show the output ?

dmesg | grep pvtm

n2qcn commented 1 year ago

can you run any version of the system with kernel 5.10 and show the output ?

Full results uploaded to http://ix.io/4rrM

[   10.549502] rockchip-pvtm fda40000.pvtm: pvtm@0 probed
[   10.549576] rockchip-pvtm fda50000.pvtm: pvtm@1 probed
[   10.549642] rockchip-pvtm fda60000.pvtm: pvtm@2 probed
[   10.549710] rockchip-pvtm fdaf0000.pvtm: pvtm@3 probed
[   10.549771] rockchip-pvtm fdb30000.pvtm: pvtm@4 probed
[   11.763243] cpu cpu0: pvtm=1433
[   11.765599] cpu cpu0: pvtm-volt-sel=1
[   11.781119] cpu cpu4: pvtm=1658
[   11.786414] cpu cpu4: pvtm-volt-sel=3
[   11.802487] cpu cpu6: pvtm=1655
[   11.808631] cpu cpu6: pvtm-volt-sel=3
[   12.182675] mali fb000000.gpu: pvtm=842
[   12.211441] mali fb000000.gpu: pvtm-volt-sel=2
[   12.816621] RKNPU fdab0000.npu: pvtm=837
[   12.822504] RKNPU fdab0000.npu: pvtm-volt-sel=2
root@rock-5b:/home/rob# uname -a
Linux rock-5b 5.10.110-media #23.02.2 SMP PREEMPT Fri Feb 17 22:52:47 UTC 2023 aarch64 aarch64 aarch64 GNU/Linux

ThomasKaiser commented 1 year ago

Thank you guys. The results with Sebastian's OPP tables and our three boards as tables (Rob's RK3588 1st, then Oleg's, then mine):

cpu0 (A55):

OPP (MHz)	Measured (MHz)	Difference	pvtm-volt-sel	Temp
1800	1825	+1.4%	1	42.5°C
1800	1923	+6.8%	4-5	30.5°C
1800	1934	+7.4%	5	39.8°C

cpu4 (A76):

OPP (MHz)	Measured (MHz)	Difference	pvtm-volt-sel	Temp
2208	2068	-6.3%	3	42.5°C
2208	2185	-1.2%	5-6	30.5°C
2208	2200	-0.1%	7	39.8°C

cpu6 (A76):

OPP (MHz)	Measured (MHz)	Difference	pvtm-volt-sel	Temp
2208	2063	-6.6%	3	42.5°C
2208	2180	-1.3%	6	30.5°C
2208	2200	-0.1%	7	39.8°C

So we see: PVTM at work. The MCU inside the SoC and its firmware (most probably loaded to the MCU by BL31 at each boot?) do the job regardless of what we define in OPP tables. The only stuff that really matters is:

the silicon quality (pvtm and pvtm-volt-sel values)
ambient / SoC temperature
the supply voltages for each DVFS OPP

As long as RK3588 is running with a BSP toolchain and the MCU inside the SoC is in charge of PVTM it doesn't make a lot of sense to artificially limit highest cpufreq OPP to 2208 MHz since the MCU is responsible for the real clockspeeds and decides based on the three factors we found above.

ThomasKaiser commented 1 year ago

@n2qcn a final test whether the above assumptions are true (and the OPP tables needing adjustments) would be if you could activate the 2400 MHz OPP with 6.2 kernel as illustrated above by exchanging rk3588-rock-5b.dtb with the version provided by me.

I'm pretty confident we see the A76 on your RK3588 still maxing out at 2300 MHz just as they do with the BSP kernel.

n2qcn commented 1 year ago

https://transfer.sh isn't hosting rk3588-rock-5b-BSP-OPP-tables.dts currently.

ThomasKaiser commented 1 year ago

New link that should survive 7 days: https://pastebin.com/raw/WCpC1UUt

150balbes commented 1 year ago

Built a version for Station M3 (rk3588s). I tried to run the test, but it seems there is no control over the CPU frequency. Maybe I made a mistake in the DTS for M3, so the control does not work. Or maybe not everything needed for this on rk3588s is included in the kernel yet? I'll try again with Khadas Edge2 and Opi 5 (I'm finishing adding support for them now).

150balbes commented 1 year ago

firefly station m3

root@station-m3:~# dmesg | grep pvtm
[    9.141696] rockchip-pvtm fda40000.pvtm: pvtm@0 probed
[    9.141782] rockchip-pvtm fda50000.pvtm: pvtm@1 probed
[    9.141908] rockchip-pvtm fda60000.pvtm: pvtm@2 probed
[    9.141981] rockchip-pvtm fdaf0000.pvtm: pvtm@3 probed
[    9.142047] rockchip-pvtm fdb30000.pvtm: pvtm@4 probed
[   10.280300] cpu cpu0: pvtm=1473
[   10.282190] cpu cpu0: pvtm-volt-sel=3
[   10.296270] cpu cpu4: pvtm=1715
[   10.301968] cpu cpu4: pvtm-volt-sel=5
[   10.316473] cpu cpu6: pvtm=1707
[   10.321437] cpu cpu6: pvtm-volt-sel=4
[   10.696824] mali fb000000.gpu: pvtm=864
[   10.700133] mali fb000000.gpu: pvtm-volt-sel=3
[   11.231653] RKNPU fdab0000.npu: pvtm=869
[   11.235504] RKNPU fdab0000.npu: pvtm-volt-sel=3

khadas edge2

root@khadas-edge2:~# dmesg | grep pvtm
[    7.975455] rockchip-pvtm fda40000.pvtm: pvtm@0 probed
[    7.975535] rockchip-pvtm fda50000.pvtm: pvtm@1 probed
[    7.975604] rockchip-pvtm fda60000.pvtm: pvtm@2 probed
[    7.975669] rockchip-pvtm fdaf0000.pvtm: pvtm@3 probed
[    7.975729] rockchip-pvtm fdb30000.pvtm: pvtm@4 probed
[    8.994438] cpu cpu0: pvtm=1434
[    8.996641] cpu cpu0: pvtm-volt-sel=1
[    9.011744] cpu cpu4: pvtm=1671
[    9.018483] cpu cpu4: pvtm-volt-sel=3
[    9.034191] cpu cpu6: pvtm=1668
[    9.039389] cpu cpu6: pvtm-volt-sel=3
[    9.541975] mali fb000000.gpu: pvtm=868
[    9.553414] mali fb000000.gpu: pvtm-volt-sel=3
[   10.148969] RKNPU fdab0000.npu: pvtm=864
[   10.158169] RKNPU fdab0000.npu: pvtm-volt-sel=3
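For comparing boards, the pvtm-volt-sel values can be pulled out of such logs mechanically. The following is only a minimal sketch: the function name `pvtm_volt_sel` and the regular expression are illustrative assumptions, and it relies on the log lines having exactly the `<device>: pvtm-volt-sel=<n>` shape seen in the BSP kernel output above.

```python
import re

# Matches BSP kernel log entries such as "cpu cpu4: pvtm-volt-sel=5".
# The device name is the last whitespace-free token before the colon.
PVTM_SEL = re.compile(r"(\S+): pvtm-volt-sel=(\d+)")


def pvtm_volt_sel(dmesg_text: str) -> dict:
    """Map each device (cpu0, cpu4, fb000000.gpu, ...) to its pvtm-volt-sel."""
    return {dev: int(sel) for dev, sel in PVTM_SEL.findall(dmesg_text)}


# Excerpt of the Khadas Edge 2 log quoted above.
sample = (
    "[ 8.994438] cpu cpu0: pvtm=1434\n"
    "[ 8.996641] cpu cpu0: pvtm-volt-sel=1\n"
    "[ 9.018483] cpu cpu4: pvtm-volt-sel=3\n"
    "[ 9.039389] cpu cpu6: pvtm-volt-sel=3\n"
)
print(pvtm_volt_sel(sample))
```

Feeding it the output of `dmesg | grep pvtm` gives the per-cluster triple (A55, A76, A76) that is used as shorthand in this thread.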

150balbes commented 1 year ago

final test whether the above assumptions are true (and the OPP tables needing adjustments) would be if you could activate the 2400 MHz OPP with 6.2 kernel as illustrated above by exchanging rk3588-rock-5b.dtb with the version provided by me.

These are the DTS sources for M3 and edge2, maybe make a version based on them with your OPP changes (to immediately include them in the build)?

https://github.com/150balbes/rockchip-kernel/blob/test/arch/arm64/boot/dts/rockchip/rk3588s-roc-pc.dts

https://github.com/150balbes/rockchip-kernel/blob/test/arch/arm64/boot/dts/rockchip/rk3588s-khadas-edge2.dts

The frequency control works on EDG2. Here is the test result with the default DTB.

http://ix.io/4rwp

ThomasKaiser commented 1 year ago

Or maybe for rk3588s not everything for this is included in the kernel yet ?

Rockchip handles RK3588 as a superset of RK3588s (rk3588.dtsi contains #include "rk3588s.dtsi" and only adds nodes), so I would expect both SoCs to behave the same at the driver layer.

I submitted a pull request restoring 'original' (Rockchip's) DVFS OPP tables to your repo right now so by rebuilding images these should be restored on devices as well.

Since the PVTM stuff hasn't landed at mainline kernel right now the major difference is all RK3588/RK3588s will use supply voltages for 'bad' silicon variants which results in the following (at least I hope so):

the A55 'overclocking' should stop, with the 'original' settings the A55 should max out around 1800 MHz and not north of 1900 MHz as with Sebastian's settings
the A76 on 'good' silicon variants will clock higher, also produce more heat and are more prone to throttling with bad heat dissipation
in general all RK3588/RK3588s will clock higher compared to Sebastian's settings due to supply voltage of highest DVFS OPP being lifted from 925 mV to 1000 mV.

If the PR doesn't break anything and you're done rebuilding images a test with Khadas Edge 2 (low silicon quality in general) would be great! And then @n2qcn also doesn't need to fiddle around with dtc since Oleg's RK3588s is of same low quality than the RK3588 on Rob's board.

spikerguy commented 1 year ago

@150balbes I have been working on rk3588s-roc-pc dts for mainline kernel.
https://gitlab.manjaro.org/manjaro-arm/packages/core/linux-rk3588/-/commit/0425068d15f8bfe7da14caee79ccb2184c72b7bd

If you want to give it a try. You can use googalutors source with this patch.

I have also used neggles source as my upstream and rebased it to 6.1.20

@Googulator Maybe we can all work together in bringing up device support for the devices we have.

I see you have done an amazing job porting the downstream source to the latest kernel.

n2qcn commented 1 year ago

There are warnings compiling rk3588-rock-5b-BSP-OPP-tables.dts into rk3588-rock-5b.dtb so you're right, I'm missing something.

# ls -l /boot/dtb/rockchip/rk3588-rock-5b*
-rwxr-xr-x 1 root root 100382 Mar 22 21:04 /boot/dtb/rockchip/rk3588-rock-5b.dtb
-rwxr-xr-x 1 root root 255695 Feb 17 17:52 /boot/dtb/rockchip/rk3588-rock-5b.dtb.bak
-rwxr-xr-x 1 root root 252052 Feb 17 17:52 /boot/dtb/rockchip/rk3588-rock-5b-v11.dtb

Booting...

Scanning nvme 0:1...
Found /boot/extlinux/extlinux.conf
Retrieving file: /boot/extlinux/extlinux.conf
=================begin===================
313 bytes read in 3 ms (101.6 KiB/s)
1:  Armbian
Retrieving file: /boot/uInitrd
=================begin===================
20930316 bytes read in 26 ms (767.7 MiB/s)
Retrieving file: /boot/Image
=================begin===================
33747456 bytes read in 38 ms (846.9 MiB/s)
!!! env helper try: /boot/uEnv.txt
Retrieving file: /boot/uEnv.txt
** File not found /boot/uEnv.txt **
append: root=UUID=8c5cfce5-c78b-4cc2-86db-72aa1b3b12bf console=ttyS02,1500000 console=tty0 rw no_console_suspend consoleblank=0 fsck.fix=yes fsck.repair=yes net.ifnames=0 splash plymouth.ignore-serial-consoles
Retrieving file: /boot/dtb/rockchip/rk3588-rock-5b.dtb
=================begin===================
100382 bytes read in 5 ms (19.1 MiB/s)
Fdt Ramdisk skip relocation
No misc partition
## Loading init Ramdisk from Legacy Image at 0a200000 ...
   Image Name:   uInitrd
   Image Type:   AArch64 Linux RAMDisk Image (gzip compressed)
   Data Size:    20930252 Bytes = 20 MiB
   Load Address: 00000000
   Entry Point:  00000000
   Verifying Checksum ... OK
## Flattened Device Tree blob at 0x0a100000
   Booting using the fdt blob at 0x0a100000
  'reserved-memory' ramoops@110000: addr=110000 size=f0000
   Using Device Tree in place at 000000000a100000, end 000000000a11b81d
can't found rockchip,drm-logo, use rockchip,fb-logo
WARNING: could not set reg FDT_ERR_BADOFFSET.
failed to reserve fb-loader-logo memory
Adding bank: 0x00200000 - 0x08400000 (size: 0x08200000)
Adding bank: 0x09400000 - 0xf0000000 (size: 0xe6c00000)
Adding bank: 0x100000000 - 0x200000000 (size: 0x100000000)
Total: 8061.586 ms

Starting kernel ...

and hangs forever... good boot looks like:

Scanning nvme 0:1...
Found /boot/extlinux/extlinux.conf
Retrieving file: /boot/extlinux/extlinux.conf
=================begin===================
313 bytes read in 2 ms (152.3 KiB/s)
1:  Armbian
Retrieving file: /boot/uInitrd
=================begin===================
20930316 bytes read in 25 ms (798.4 MiB/s)
Retrieving file: /boot/Image
=================begin===================
33747456 bytes read in 36 ms (894 MiB/s)
!!! env helper try: /boot/uEnv.txt
Retrieving file: /boot/uEnv.txt
** File not found /boot/uEnv.txt **
append: root=UUID=8c5cfce5-c78b-4cc2-86db-72aa1b3b12bf console=ttyS02,1500000 console=tty0 rw no_console_suspend consoleblank=0 fsck.fix=yes fsck.repair=yes net.ifnames=0 splash plymouth.ignore-serial-consoles
Retrieving file: /boot/dtb/rockchip/rk3588-rock-5b.dtb
=================begin===================
255695 bytes read in 6 ms (40.6 MiB/s)
Fdt Ramdisk skip relocation
No misc partition
## Loading init Ramdisk from Legacy Image at 0a200000 ...
   Image Name:   uInitrd
   Image Type:   AArch64 Linux RAMDisk Image (gzip compressed)
   Data Size:    20930252 Bytes = 20 MiB
   Load Address: 00000000
   Entry Point:  00000000
   Verifying Checksum ... OK
## Flattened Device Tree blob at 0x0a100000
   Booting using the fdt blob at 0x0a100000
  'reserved-memory' cma: addr=10000000 size=10000000
  'reserved-memory' ramoops@110000: addr=110000 size=f0000
   Using Device Tree in place at 000000000a100000, end 000000000a1416ce
Adding bank: 0x00200000 - 0x08400000 (size: 0x08200000)
Adding bank: 0x09400000 - 0xf0000000 (size: 0xe6c00000)
Adding bank: 0x100000000 - 0x200000000 (size: 0x100000000)
Total: 8081.457 ms

Starting kernel ...

I/TC: Secondary CPU 1 initializing
I/TC: Secondary CPU 1 switching to normal world boot
I/TC: Secondary CPU 2 initializing
I/TC: Secondary CPU 2 switching to normal world boot
I/TC: Secondary CPU 3 initializing
I/TC: Secondary CPU 3 switching to normal world boot
I/TC: Secondary CPU 4 initializing
I/TC: Secondary CPU 4 switching to normal world boot
I/TC: Secondary CPU 5 initializing
I/TC: Secondary CPU 5 switching to normal world boot
I/TC: Secondary CPU 6 initializing
I/TC: Secondary CPU 6 switching to normal world boot
I/TC: Secondary CPU 7 initializing
I/TC: Secondary CPU 7 switching to normal world boot
[    8.197077] Booting Linux on physical CPU 0x0000000000 [0x412fd050]
[    8.197105] Linux version 5.10.110-media (root@7c62ec528045) (aarch64-linux-gnu-gcc (GNU Toolchain for the A-profile Architecture 8.3-2019.03 (arm-rel-8.36)) 8.3.0, GNU ld (GNU Toolchain for the A-profile Architecture 8.3-2019.03 (arm-rel-8.36)) 2.32.0.20190321) #23.02.2 SMP PREEMPT Fri Feb 17 22:52:47 UTC 2023
[    8.209001] Machine model: Radxa ROCK 5B
[    8.217984] efi: UEFI not found.

150balbes commented 1 year ago

I submitted a pull request restoring 'original' (Rockchip's) DVFS OPP tables to your repo right now so by rebuilding images these should be restored on devices as well.

Tnx. Merge edge 2 test with new dtb

http://ix.io/4rAw

If the PR doesn't break anything and you're done rebuilding images a test with Khadas Edge 2 (low silicon quality in general) would be great! And then @n2qcn also doesn't need to fiddle around with dtc since Oleg's RK3588s is of same low quality than the RK3588 on Rob's board.

The image build for Khadas Edge 2 with the new DTB variants is nearly finished, and I will upload the images to the site (on my unit the results with the new DTB are higher than with the previous one).

If you want to give it a try. You can use googalutors source with this patch.

I have already added primary support for rk3588s-roc-pc (station M3) and uploaded working images to the site. But CPU frequency adjustment is not working there yet. I'll look at your changes, maybe there will be something useful. By the way, funny behavior - station m3, frequency adjustment does not work, but HW acceleration works well for pancsf (x11 and wayland for ubuntu and debian), on khadas edge 2 - frequency adjustment works, but HW acceleration does not work. Now I need to check (add) primary support for OPi5 and see what happens there. :)

Booting...
Scanning nvme 0:1..

Are you running a test system with NVMe ? Which u-boot are you using? what is in SPI\MTD? You can install a new u-boot in SPI\MTD and without disturbing the NVMe system, immediately launch the entire system from SD or USB media. This is much more convenient for quick tests, the working system on NVMe is not affected.

ThomasKaiser commented 1 year ago

edge 2 test with new dtb http://ix.io/4rAw

Thank you for (re)testing. So we're still seeing PVTM at work preventing the A76 being clocked at 2.4 GHz on 'weak' silicon variants of the SoC. Let's compare to another Edge 2 with same 'weak' RK3588S (pvtm-volt-sel=1/3/3) tested months ago with 5.10.66 BSP kernel: http://ix.io/4h6D

5.10 BSP:

1800    Measured: 1783 (1783.777/1783.662/1783.585)
2352    Measured: 2182 (2182.842/2182.842/2182.796)     (-7.2%)
2352    Measured: 2201 (2201.308/2201.121/2201.027)     (-6.4%)

6.2 with 'adopted' OPP tables from BSP:

1800    Measured: 1858 (1858.887/1858.845/1858.803)     (+3.2%)
2400    Measured: 2254 (2254.736/2254.343/2254.146)     (-6.1%)
2400    Measured: 2253 (2253.900/2253.802/2253.802)     (-6.1%)
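The percentages in the two listings above are the deviation of the measured clock from the nominal cpufreq OPP. A minimal sketch of that arithmetic, using the rounded MHz values of the 6.2 kernel run (the helper name `deviation_percent` is just an illustration):

```python
def deviation_percent(nominal_mhz: float, measured_mhz: float) -> float:
    """Relative difference between measured and advertised clock, in percent."""
    return (measured_mhz - nominal_mhz) / nominal_mhz * 100.0


# (nominal OPP, measured) pairs from the 6.2 kernel listing above.
rows = [(1800, 1858), (2400, 2254), (2400, 2253)]
for nominal, measured in rows:
    diff = deviation_percent(nominal, measured)
    print(f"{nominal} MHz OPP, measured {measured} MHz: {diff:+.1f}%")
```

A positive value means the cluster runs faster than cpufreq reports (the A55 case), a negative one means it stays below the advertised OPP (the A76 case).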

We see higher clockspeeds for the simple reason we're ignoring PVTM at the moment. Once a PVTM driver lands in mainline the pvtm-volt-sel information will be used and on a pvtm-volt-sel=1 A55 the supply voltage at the 1800 MHz OPP won't be 950 mV but 937.5 mV instead.

Same with the A76 and the higher DVFS OPP but here the BSP kernel went a different route and chose 1000 mV for all OPP exceeding 2208 MHz but enabled only some of those OPP if the silicon variant is marked as being capable of: https://github.com/radxa/kernel/blob/cd6b6c05bb037712ebfc806255c8c98ad5bc8806/arch/arm64/boot/dts/rockchip/rk3588s.dtsi#L920

Radxa changed this last year so all Rock 5B now list all cpufreq OPP up to 2400 MHz but that's just cosmetics since MCU/PVTM still prevent lower quality silicon revisions from reaching 2400 MHz, only the difference between advertised and real clockspeeds got bigger. Before it was e.g. 2256 MHz (cpufreq driver / sysfs) vs. 2230 MHz (measured), after it was 2400 vs. 2230 MHz.

Maybe with mainline kernel we need to come up with another solution for the highest A76 OPP and need to develop proper mV values for the different OPP?

@150balbes can you please give export MODE=extensive ; sbc-bench.sh a try on your Edge2? This will also run cpuminer and the stockfish benchmark as sort of stressors / reliability testing. An additional stress-ng --matrix 0 -t 300 doesn't hurt too :)

150balbes commented 1 year ago

can you please give export MODE=extensive ; sbc-bench.sh a try on your Edge2?

http://ix.io/4rB4

150balbes commented 1 year ago

An additional stress-ng --matrix 0 -t 300 doesn't hurt too

user@khadas-edge2:~$ stress-ng --matrix 0 -t 300 -M
stress-ng: info:  [3785] setting to a 300 second (5 mins, 0.00 secs) run per stressor
stress-ng: info:  [3785] dispatching hogs: 8 matrix
stress-ng: metrc: [3785] stressor       bogo ops real time  usr time  sys time   bogo ops/s     bogo ops/s CPU used per       RSS Max
stress-ng: metrc: [3785]                           (secs)    (secs)    (secs)   (real time) (usr+sys time) instance (%)          (KB)
stress-ng: metrc: [3785] matrix          5169492    300.00   2396.56      0.07     17231.64        2156.98        99.86          2200
stress-ng: metrc: [3785] miscellaneous metrics:
stress-ng: metrc: [3785] matrix            81562.41 add matrix ops per sec (geometic mean of 8 instances)
stress-ng: metrc: [3785] matrix           183601.74 copy matrix ops per sec (geometic mean of 8 instances)
stress-ng: metrc: [3785] matrix            44789.59 div matrix ops per sec (geometic mean of 8 instances)
stress-ng: metrc: [3785] matrix            33822.02 frobenius matrix ops per sec (geometic mean of 8 instances)
stress-ng: metrc: [3785] matrix            81324.46 hadamard matrix ops per sec (geometic mean of 8 instances)
stress-ng: metrc: [3785] matrix            71337.03 identity matrix ops per sec (geometic mean of 8 instances)
stress-ng: metrc: [3785] matrix            64559.87 mean matrix ops per sec (geometic mean of 8 instances)
stress-ng: metrc: [3785] matrix           105088.22 mult matrix ops per sec (geometic mean of 8 instances)
stress-ng: metrc: [3785] matrix           104188.20 negate matrix ops per sec (geometic mean of 8 instances)
stress-ng: metrc: [3785] matrix              235.81 prod matrix ops per sec (geometic mean of 8 instances)
stress-ng: metrc: [3785] matrix            82393.97 sub matrix ops per sec (geometic mean of 8 instances)
stress-ng: metrc: [3785] matrix              240.52 square matrix ops per sec (geometic mean of 8 instances)
stress-ng: metrc: [3785] matrix            34262.22 trans matrix ops per sec (geometic mean of 8 instances)
stress-ng: metrc: [3785] matrix           388100.19 zero matrix ops per sec (geometic mean of 8 instances)
stress-ng: info:  [3785] successful run completed in 300.01s (5 mins, 0.01 secs)

150balbes commented 1 year ago

fix dtb for station m3

http://ix.io/4rBu

ThomasKaiser commented 1 year ago

https://disk.yandex.ru/d/hCEsF9uEVSN-6g

Thank you, I pasted it over to http://ix.io/4rBu

The RK3588S on your M3 is a pvtm-volt-sel=3/5/4 chip and results are more or less as expected (I would have expected a few MHz more with the A76 cores compared to Edge 2 but temperatures also play a role):

1800    Measured: 1865 (1865.601/1865.559/1865.138)     (+3.6%)
2400    Measured: 2258 (2259.026/2258.977/2258.779)     (-5.9%)
2400    Measured: 2254 (2255.426/2254.786/2254.687)     (-6.1%)

150balbes commented 1 year ago

I pasted it over to http://ix.io/4rBu

Tnx

Opi5 is next in line, now I'm building an image for it, if everything works correctly (there will be no errors from my crooked hands), there will be more results from Opi5. :)

150balbes commented 1 year ago

for the sake of curiosity, I ran a test on M3 with the mainline kernel 6.3 (branch rk3588)

http://ix.io/4rBO

150balbes commented 1 year ago

Opi5

dmesg | grep pvtm
[    7.861489] rockchip-pvtm fda40000.pvtm: pvtm@0 probed
[    7.861546] rockchip-pvtm fda50000.pvtm: pvtm@1 probed
[    7.861601] rockchip-pvtm fda60000.pvtm: pvtm@2 probed
[    7.861660] rockchip-pvtm fdaf0000.pvtm: pvtm@3 probed
[    7.861710] rockchip-pvtm fdb30000.pvtm: pvtm@4 probed
[    9.110773] cpu cpu0: pvtm=1440
[    9.111001] cpu cpu0: pvtm-volt-sel=2
[    9.120666] cpu cpu4: pvtm=1666
[    9.124869] cpu cpu4: pvtm-volt-sel=3
[    9.135213] cpu cpu6: pvtm=1677
[    9.139419] cpu cpu6: pvtm-volt-sel=4
[    9.317826] mali fb000000.gpu: pvtm=841
[    9.317895] mali fb000000.gpu: pvtm-volt-sel=2
[    9.370607] RKNPU fdab0000.npu: pvtm=841
[    9.376969] RKNPU fdab0000.npu: pvtm-volt-sel=2

Opi5

http://ix.io/4rGu

ThomasKaiser commented 1 year ago

Sebastian got back to me in the meantime. :)

Most probably we're dealing with a non-issue here: he was trying to get the HW working in the first place, had some trouble with the 2400 MHz OPP, and then decided to drop the highest OPP to get things working on his HW (RK's EVB1).

None of his cpufreq work has been sent to the respective mailing lists yet, so @Googulator has to be blamed for cherry-picking stuff (just kidding: you've done an awesome job, and those of us who care about DVFS and such stuff are to blame :) ).

"Basically don't trust my values" (quoting Sebastian) means in my understanding:

instead of relying on Sebastian's supply voltages adopt DVFS settings from latest RK BSP (and these settings might change, see e.g. this)
once cpufreq support for RK3588 in mainline gets into focus, define supply voltage levels for the higher DVFS OPP if mainline kernel does not handle things the way RK's 5.10 does (denying high cpufreq OPPs based on PVTM while feeding them all with the same 1000 mV voltage)

ThomasKaiser commented 1 year ago

BTW: After comparing what's written on many RK35xx SoCs I believe the 'silicon quality' is already printed onto the SoC. It's about the 2nd line here:

Rockchip-RK3588M

From left to right only looking at the 1st and last 4 characters of the 2nd line:

NACX... 2223
SACX... 2152
SAFX... 2207 (RK3588M)
SADX... 2214
SAEX... 2152 (RK3588 on my Rock-5b)

The numbers on the right are the production week (two-digit year followed by week of the year, as printed by date +%y%V).
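To turn such a date code into a calendar date, a minimal sketch follows. It assumes the reading given here (two-digit year plus ISO week number) and that every part was made in 20xx; the function name `decode_date_code` is hypothetical.

```python
from datetime import date


def decode_date_code(code: str) -> date:
    """Return the Monday of the ISO week encoded as YYWW, e.g. '2223'."""
    year = 2000 + int(code[:2])  # assumption: all chips are from 20xx
    week = int(code[2:])
    return date.fromisocalendar(year, week, 1)  # needs Python 3.8+


# Date codes read off the chips listed above.
for code in ("2223", "2152", "2207", "2214"):
    print(code, decode_date_code(code).isoformat())
```

The reverse direction is what `date +%y%V` prints for the current day.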

And I believe char 3 is for silicon quality: F being highest, then E (my pvtm-volt-sel=5/7/7 RK3588) being of lower quality, D even lower down to C (I found nothing going lower than C except for this RK3588s image showing an A variant shared by @cnxsoft).

Pure speculation by now so may I ask you whether you can read and share this info from those RK3588(s) where PVTM values are known?

n2qcn commented 1 year ago

My example

ThomasKaiser commented 1 year ago

@n2qcn thank you! So my assumption is already busted since G doesn't go well along with pvtm-volt-sel=1/3/3 :)

wtarreau commented 1 year ago

More likely it's just the assembly line in the factory. Different lines will have different quality due to machine tuning and mask alignment.

Googulator commented 1 year ago

my_rk3588

This is mine. Almost the same markings as n2qcn's (5810 vs 5820 is the only difference), but PVTM=5/6/6. So it's probably not correlated.

n2qcn commented 1 year ago

The Rock5B is a lot of fun, but it's hard to match an Intel N100 for $185USD including case & power supply. | AZW MINI S / N100 | 3400 MHz | 6.1 | Ubuntu 22.04.2 LTS x86_64 | 14010 | 4020 | 1224220 | 9900 | 8900 | - | http://ix.io/4sQP

Googulator commented 1 year ago

Turns out it's actually PVTPLL, not PVTM, which is responsible for big cores not reaching 2.4GHz with stock voltages - and it can be disabled. Simply change all cores in the DTS from <&scmi_clk SCMI_CLK_CPUx> to <&cru ARMCLK_x> (where x is the cluster ID), and CRU will directly drive the CPU cores at the configured clock. With some tweaking to the CRU driver, it's then possible to overclock in increments of 12MHz.
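The substitution described above can be scripted over a DTS source file. This is only a sketch of the text replacement, not a tested patch: the helper name `use_cru_clocks` is hypothetical, and the `B01` suffix in the sample line is assumed here as an example of a cluster ID.

```python
import re

# "<&scmi_clk SCMI_CLK_CPUx>" where x is the cluster ID, as described above.
SCMI_CLOCK = re.compile(r"<&scmi_clk SCMI_CLK_CPU(\w+)>")


def use_cru_clocks(dts_text: str) -> str:
    """Point every CPU clock reference at the CRU instead of SCMI."""
    return SCMI_CLOCK.sub(r"<&cru ARMCLK_\1>", dts_text)


# Hypothetical CPU node line; the cluster ID "B01" is an assumption.
sample = "clocks = <&scmi_clk SCMI_CLK_CPUB01>;"
print(use_cru_clocks(sample))
```

Lines that do not reference an SCMI CPU clock are left untouched, so the function can be run over a whole file before recompiling it with dtc.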

However, be prepared for a nasty surprise: 2.4GHz is unstable at the default voltage of 1.0V (on my 5/6/6 silicon), behaving like a bad overclock under load. I believe the underlying problem is with the RK860x voltage regulators used to power the big cores, which are only rated up to 6 amps each. 6 amps @ 1.0V corresponds to 6 watts (P = V × I), and with each RK860x powering 2 big cores, that leaves 3 watts per core.

In a different SoC, the A76 was benchmarked as drawing ~2.5W on average under load, surging to 3.6W under certain workloads. That SoC (the HiSilicon Kirin 980) is a 7nm design running the A76s @ 2.6GHz - our implementation is 8nm, which is slightly less efficient, so hitting the 3W limit @ 2.4GHz is easily possible.
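The power budget argument can be written out as a few lines of arithmetic. This sketch only restates the numbers quoted in this comment (6 A regulator rating, 1.0 V, two cores per regulator, Kirin 980 figures as a rough reference); it is not a measurement of the RK3588.

```python
REGULATOR_LIMIT_A = 6.0      # RK860x current rating quoted above
SUPPLY_V = 1.0               # default voltage at the highest OPP
CORES_PER_REGULATOR = 2      # each RK860x feeds one A76 pair


def per_core_budget_w(limit_a: float, supply_v: float, cores: int) -> float:
    """Electrical power available per core: P = V * I, split evenly."""
    return limit_a * supply_v / cores


budget = per_core_budget_w(REGULATOR_LIMIT_A, SUPPLY_V, CORES_PER_REGULATOR)

# A76 figures from the Kirin 980 comparison (different process and clock).
for label, draw_w in (("average", 2.5), ("peak", 3.6)):
    verdict = "exceeds" if draw_w > budget else "fits within"
    print(f"{label} draw {draw_w} W {verdict} the {budget:.1f} W per-core budget")
```

With these inputs the average draw fits and the peak draw does not, which matches the description of instability appearing only under load.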

Reading the documentation for PVTPLL, it is now clear that the clocking behavior is not driven by an MCU, but rather simply by the physics of using a ring oscillator to generate the clock. The PVTPLL ring oscillators are built from the same types of transistors that are used in the cores themselves, so any voltage sag will also cause the clock to slow down by the same ratio as signal propagation within the core, and so with a well-chosen ring length, the PVTPLL will always generate a "safe" clock frequency, regardless of the voltage delivered to the chip.
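A toy model may help to see why a ring oscillator clock stays "safe". It uses the textbook relation f = 1 / (2 · N · t_d) for a ring of N stages with gate delay t_d; all numbers (stage count, critical path length, nominal delay) are hypothetical and not taken from Rockchip documentation.

```python
def ring_oscillator_hz(stages: int, gate_delay_s: float) -> float:
    """Textbook ring oscillator: one period is two trips around the ring."""
    return 1.0 / (2 * stages * gate_delay_s)


def max_core_hz(critical_path_gates: int, gate_delay_s: float) -> float:
    """Highest clock at which the longest logic path still settles in time."""
    return 1.0 / (critical_path_gates * gate_delay_s)


STAGES = 21                # hypothetical ring length
CRITICAL_PATH_GATES = 40   # hypothetical critical path depth in the core
NOMINAL_DELAY_S = 10e-12   # hypothetical gate delay at nominal voltage

# A voltage sag slows every gate by the same factor in this simple model.
for slowdown in (1.00, 1.05, 1.10):
    delay = NOMINAL_DELAY_S * slowdown
    clock = ring_oscillator_hz(STAGES, delay)
    limit = max_core_hz(CRITICAL_PATH_GATES, delay)
    print(f"delay x{slowdown:.2f}: clock {clock / 1e9:.3f} GHz, "
          f"margin {limit / clock:.3f}")
```

Because the ring and the core logic slow down by the same factor, the ratio between the generated clock and the highest safe clock stays constant; the ring length is what sets that margin.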

In theory, it's possible to switch each cluster, or maybe even each core individually, between PVTPLL and CRU clocking, however, I found that setting one big cluster to PVTPLL and the other to CRU made the board not boot, probably due to an ATF limitation. Setting the big cores to PVTPLL and the little ones to CRU, or vice versa, seems to work OK.

ThomasKaiser commented 1 year ago

Turns out it's actually PVTPLL, not PVTM

Thanks for the clarification!

That SoC (the HiSilicon Kirin 980) is a 7nm design

Not really. Kirin 980 is made on TSMC's N7 process and RK3588 most probably by Samsung on a process named 8LPP (an extension of Samsung's 10LPP process that is said to have a fin pitch of 42nm and a gate pitch of 64nm). It's not 7nm vs. 8nm but just some marketing numbers with no direct relationship to physics. Or as TSMC's vice president of corporate research, Dr. Philip Wong, put it when talking about their N7, N5, N3 process names: "These numbers are just numbers. They're like models in a car - it's like BMW 5-series or Mazda 6."

Anyway: based on these findings, and given the huge variance in silicon quality across the various RK3588/RK3588S out there, sticking with PVTPLL seems the only option to me for the upstream settings in mainline kernel. Otherwise the defaults would have to be chosen to cope with the 'lowest quality' silicon out there (restricting the A76 to 2.2 GHz for some safety margin), which would negatively impact all the SoCs with better silicon.

XFer012 commented 1 year ago

Hello, sorry for reviving an old issue. I have an Orange Pi 5 and found it is not stable at any clock speed over 2.0 GHz (for the A76 cores). In particular, Geekbench 5.5.1 reboots the board during the multi-core test (near the end).

This is with the default DVFS OPP of the BSP kernel (5.10.110).

I suspected bad silicon, but it is not too bad after all:

root@orangepi5:~# dmesg | grep pvtm

[    6.622613] rockchip-pvtm fda40000.pvtm: pvtm@0 probed
[    6.622669] rockchip-pvtm fda50000.pvtm: pvtm@1 probed
[    6.622722] rockchip-pvtm fda60000.pvtm: pvtm@2 probed
[    6.622773] rockchip-pvtm fdaf0000.pvtm: pvtm@3 probed
[    6.622820] rockchip-pvtm fdb30000.pvtm: pvtm@4 probed
[    7.349984] cpu cpu0: pvtm=1488
[    7.350140] cpu cpu0: pvtm-volt-sel=4
[    7.359872] cpu cpu4: pvtm=1725
[    7.364008] cpu cpu4: pvtm-volt-sel=5
[    7.374321] cpu cpu6: pvtm=1731
[    7.378459] cpu cpu6: pvtm-volt-sel=5
[    7.558212] mali fb000000.gpu: pvtm=880
[    7.558279] mali fb000000.gpu: pvtm-volt-sel=3
[    7.589461] RKNPU fdab0000.npu: pvtm=896
[    7.595296] RKNPU fdab0000.npu: pvtm-volt-sel=4

Temperatures are OK: I installed a largish heatsink with 3M thermal adhesive, and it does not reach 55 °C under load.

Should I try to edit the OPP table somehow? I aim at 2.2 GHz, I know 2.4 is stretching it. But 2.0 seems quite low.

Thanks for any suggestion

Googulator / linux-rk3588-midstream

DVFS OPP not matching downstream RK settings #3