armbian / build

Armbian Linux build framework generates custom Debian or Ubuntu image for x86, aarch64, riscv64 & armhf
https://www.armbian.com
GNU General Public License v2.0

Provide more cpufreq steps for sun8i/legacy #298

Closed ThomasKaiser closed 8 years ago

ThomasKaiser commented 8 years ago

Based on the discussion in the forum I would propose adding more cpufreq steps above 816 MHz on sun8i/legacy kernel so that sunxi-cpufreq.c looks like this:

struct cpufreq_frequency_table sunxi_freq_tbl[] = {
    { .frequency = 60000  , .index = SUNXI_CLK_DIV(0, 0, 0, 0), },
    { .frequency = 120000 , .index = SUNXI_CLK_DIV(0, 0, 0, 0), },
    { .frequency = 240000 , .index = SUNXI_CLK_DIV(0, 0, 0, 0), },
    { .frequency = 312000 , .index = SUNXI_CLK_DIV(0, 0, 0, 0), },
    { .frequency = 408000 , .index = SUNXI_CLK_DIV(0, 0, 0, 0), },
    { .frequency = 480000 , .index = SUNXI_CLK_DIV(0, 0, 0, 0), },
    { .frequency = 504000 , .index = SUNXI_CLK_DIV(0, 0, 0, 0), },
    { .frequency = 600000 , .index = SUNXI_CLK_DIV(0, 0, 0, 0), },
    { .frequency = 648000 , .index = SUNXI_CLK_DIV(0, 0, 0, 0), },
    { .frequency = 720000 , .index = SUNXI_CLK_DIV(0, 0, 0, 0), },
    { .frequency = 816000 , .index = SUNXI_CLK_DIV(0, 0, 0, 0), },
    { .frequency = 864000 , .index = SUNXI_CLK_DIV(0, 0, 0, 0), },
    { .frequency = 912000 , .index = SUNXI_CLK_DIV(0, 0, 0, 0), },
    { .frequency = 960000 , .index = SUNXI_CLK_DIV(0, 0, 0, 0), },
    { .frequency = 1008000, .index = SUNXI_CLK_DIV(0, 0, 0, 0), },
    { .frequency = 1056000, .index = SUNXI_CLK_DIV(0, 0, 0, 0), },
    { .frequency = 1104000, .index = SUNXI_CLK_DIV(0, 0, 0, 0), },
    { .frequency = 1152000, .index = SUNXI_CLK_DIV(0, 0, 0, 0), },
    { .frequency = 1200000, .index = SUNXI_CLK_DIV(0, 0, 0, 0), },
    { .frequency = 1248000, .index = SUNXI_CLK_DIV(0, 0, 0, 0), },
    { .frequency = 1296000, .index = SUNXI_CLK_DIV(0, 0, 0, 0), },
    { .frequency = 1344000, .index = SUNXI_CLK_DIV(0, 0, 0, 0), },
    { .frequency = 1440000, .index = SUNXI_CLK_DIV(0, 0, 0, 0), },
    { .frequency = 1536000, .index = SUNXI_CLK_DIV(0, 0, 0, 0), },

    /* table end */
    { .frequency = CPUFREQ_TABLE_END,  .index = 0,              },
};

This alone should help with performance in situations where throttling occurs, since throttling gets more efficient when finer-grained jumps between frequencies are possible. It's already known that Allwinner's defaults in every BSP kernel so far were close to horrible, so it's up to us to improve things.
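To illustrate the point about finer steps, here is a minimal sketch. It assumes throttling simply caps the clock at some value and the governor then picks the fastest table entry at or below that cap; the 1000 MHz cap and the function name are made up for illustration, and the 96 MHz list only mimics a coarser spacing rather than quoting the BSP table.

```python
# Steps at and above 816 MHz from the table proposed in this issue (kHz).
FINE_KHZ = [816000, 864000, 912000, 960000, 1008000, 1056000, 1104000,
            1152000, 1200000, 1248000, 1296000, 1344000, 1440000, 1536000]

# Illustrative coarser table with 96 MHz spacing (assumption, for comparison only).
COARSE_KHZ = [816000, 912000, 1008000, 1104000, 1200000]


def highest_step_at_or_below(steps_khz, cap_khz):
    """Return the fastest table entry that does not exceed the thermal cap.

    Falls back to the slowest entry if the cap is below the whole table.
    """
    allowed = [f for f in steps_khz if f <= cap_khz]
    return max(allowed) if allowed else min(steps_khz)


if __name__ == "__main__":
    cap = 1000000  # hypothetical cap in kHz
    for name, table in (("96 MHz steps", COARSE_KHZ), ("48 MHz steps", FINE_KHZ)):
        chosen = highest_step_at_or_below(table, cap)
        print(f"{name}: runs at {chosen} kHz, gives up {cap - chosen} kHz")
```

With the finer table the clock settles closer to the cap, which is the whole argument for the extra entries.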

What do you think both regarding the whole approach as well as where to apply (PR against Igor's linux repo or a patch 'as usual')? IMO we should take sun8i as test balloon. This is not really necessary since sun50i/A64 has already served as a test, but the H3 boards with primitive or no programmable voltage regulator would benefit the most from more cpufreqs. If no complaints are heard, we should add additional cpufreqs at least to all sunxi legacy kernels and start to test with sun7i and mainline kernel too, to see whether more fine-grained throttling also helps with increasing performance there.

Any thoughts?

zador-blood-stained commented 8 years ago

While adding more steps may not improve things a lot for interactive governor, it still may be useful for ondemand and conservative (yes, I saw your post on forum about switching latency, and in any case switching governors is a tradeoff between performance and power saving).

What do you think both regarding the whole approach as well as where to apply (PR against Igor's linux repo or a patch 'as usual')?

Igor's repo doesn't have any previous commits history, so I think a patch "as usual" will be fine. I added patch with your table to my repo.

zador-blood-stained commented 8 years ago

and if no complaints are heard should add additional cpufreqs at least to all sunxi legacy kernels and start to test with sun7i and mainline kernel too whether more fine-grained throttling there also helps with increasing performance.

I thought that it's not easy to heat up A10/A20 to the point of throttling without using synthetic loads like cpuburn.

ssvb commented 8 years ago

@zador-blood-stained FYI, here is an explanation why interactive is better than ondemand: https://lkml.org/lkml/2012/2/7/483

ssvb commented 8 years ago

@zador-blood-stained

I thought that it's not easy to heat up A10/A20 to the point of throttling without using synthetic loads like cpuburn.

There are real workloads, which are only a little bit less power hungry than cpuburn. A10 and A20 don't have 4 cores, but they are also less power efficient than H3 and need higher core voltage. Still, as far as I know, currently A10 and A20 don't implement any thermal throttling at all and the current DVFS settings are almost on the edge of being overheating prone.

zador-blood-stained commented 8 years ago

here is an explanation why interactive is better than ondemand

Here "better" is still subjective and depends on types of workloads and whole system use case.

This governor is designed for latency-sensitive workloads, such as interactive user interfaces.

So for battery saving (for A10,A20 and A64) other governors may still be better.

A10 and A20 don't have 4 cores, but they are also less power efficient than H3 and need higher core voltage.

Also they don't have precise enough thermal sensor to base thermal throttling on.

ssvb commented 8 years ago

Here "better" is still subjective and depends on types of workloads and whole system use case. So for battery saving (for A10,A20 and A64) other governors may still be better.

And this is based on what? Yes, I know that some people think that "the ondemand governor is supported in the mainline kernel, and everything that is included in the mainline kernel can't be bad by definition" :-)

But the ondemand governor is just a horrible piece of code. It is based on the "waking up to decide whether the CPU is idle" concept. And having unnecessary periodic wakeups is exactly the thing that ruins battery life. This governor is very clearly not fit for the job and Android people had no choice but to replace it with something else. There is nothing like "tradeoffs" here, the ondemand is just inferior in every possible way.

That said, the work is being done in the mainline kernel to clean up this ugly mess, see http://lkml.iu.edu/hypermail/linux/kernel/1603.1/05278.html and http://marc.info/?l=linux-acpi&m=145814049919895&w=2

zador-blood-stained commented 8 years ago

I know that some people think that "the ondemand governor is supported in the mainline kernel, and everything that is included in the mainline kernel can't be bad by definition" :-)

I'm not one of them.

And having unnecessary periodic wakeups is exactly the thing that ruins battery life.

Now I see your point. Patch note doesn't focus on this aspect. I based my comparison on assumption that power consumption scales non-linearly with frequency, so staying longer at lower frequencies is more efficient.

ThomasKaiser commented 8 years ago

While adding more steps may not improve things a lot for interactive governor

Hmm... I've been talking about throttling that is already happening. With the H3 BSP kernel it's not as bad as it was with A64, since here we currently have 96 MHz steps (with the one strange exception before @ssvb added the 1296 MHz cpufreq), but providing the ability to use 48 MHz steps will increase performance in throttling situations for sure (regardless of the governor used, since the throttling frequency then defines the maximum cpufreq).

Regarding A20: mea culpa. This isn't a real throttling candidate but I was already thinking about A20E (based on .dts stuff available through A64 BSP) and thought about checking throttling activity with mainline instead of BSP kernel just to realize that things are moving (reading through ssvb's links right now)

Thx for adopting changes that fast. And am really looking forward to sun8i-simple-cpu-corekeeper.patch.disabled. What's missing? Testing?

zador-blood-stained commented 8 years ago

And am really looking forward to sun8i-simple-cpu-corekeeper.patch.disabled. What's missing? Testing?

[ 532.981703] thermal_sys: Critical temperature reached(100 C),shutting down

Needs more work. cpuburn-a7 kills OPi One with heatsink in 5-10 minutes with current trip points configuration.

ThomasKaiser commented 8 years ago

cpuburn-a7 kills OPi One with heatsink in 5-10 minutes with current trip points configuration.

Well, I think since 2 months that there's a lot of room for improvements based on ssvb's comments about strange cooling maps nodes in A64 BSP kernel settings. But I lack the skills...

zador-blood-stained commented 8 years ago

With killing and powering back cores one by one and extended DVFS table in-kernel corekeeper (new version) works stable enough (cpuburn-a7 running for an hour never killed more than 2 cores)

[cooler_table]
cooler_count = 6
cooler0 = "1200000 4 4294967295 0"
cooler1 = "1008000 4 4294967295 0"
cooler2 = "648000 4 4294967295 0"
cooler3 = "600000 3 4294967295 0"
cooler4 = "504000 2 4294967295 0"
cooler5 = "480000 1 4294967295 0"
[dvfs_table]
pmuic_type = 1
pmu_gpio0 = port:PL06<1><1><2><1>
pmu_level0 = 11300
pmu_level1 = 1100
max_freq = 1200000000
min_freq = 480000000
LV_count = 12
LV1_freq = 1200000000
LV1_volt = 1300
LV2_freq = 1104000000
LV2_volt = 1300
LV3_freq = 1056000000
LV3_volt = 1300
LV4_freq = 1008000000
LV4_volt = 1300
LV5_freq = 960000000
LV5_volt = 1300
LV6_freq = 912000000
LV6_volt = 1100
LV7_freq = 816000000
LV7_volt = 1100
LV8_freq = 720000000
LV8_volt = 1100
LV9_freq = 648000000
LV9_volt = 1100
LV10_freq = 600000000
LV10_volt = 1100
LV11_freq = 504000000
LV11_volt = 1100
LV12_freq = 480000000
LV12_volt = 1100

but killing cores earlier may affect both performance in CPU-intensive tasks and benchmarking results.
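Long `LVn_freq` lists like the one above are easy to mistype by one digit. A small sanity check, sketched here, assumes that levels are meant to be listed from highest to lowest frequency (as they are in the tables in this thread); the function names and the sample text are made up for illustration.

```python
import re


def parse_dvfs_levels(fex_text):
    """Collect (level, freq_hz, volt_mv) tuples from LVn_freq / LVn_volt lines."""
    freqs, volts = {}, {}
    for line in fex_text.splitlines():
        m = re.match(r"\s*LV(\d+)_(freq|volt)\s*=\s*(\d+)\s*$", line)
        if not m:
            continue  # section headers, pmu_* keys and blank lines are ignored
        level, kind, value = int(m.group(1)), m.group(2), int(m.group(3))
        (freqs if kind == "freq" else volts)[level] = value
    return [(n, freqs[n], volts[n]) for n in sorted(freqs) if n in volts]


def find_order_problems(levels):
    """Return (previous_level, level) pairs where frequency does not decrease."""
    problems = []
    for (prev_n, prev_f, _), (n, f, _) in zip(levels, levels[1:]):
        if f >= prev_f:
            problems.append((prev_n, n))
    return problems


# Hypothetical sample with a dropped digit in LV2_freq.
SAMPLE = """
[dvfs_table]
LV_count = 3
LV1_freq = 1200000000
LV1_volt = 1300
LV2_freq = 96000000
LV2_volt = 1300
LV3_freq = 912000000
LV3_volt = 1100
"""

if __name__ == "__main__":
    print(find_order_problems(parse_dvfs_levels(SAMPLE)))
```

The check flags the level after the mistyped one, since that is where the sequence stops decreasing.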

zador-blood-stained commented 8 years ago

[image]

Since this graph doesn't tell the real picture, during an hour of testing, 3rd core was killed 20 times and 4th core was killed 303 times.

ThomasKaiser commented 8 years ago

Hmm... according to the graph H3 was most of the time running at 600 MHz where it's already allowed to kill CPU cores. Another approach would be to allow further throttling down to lower frequencies without killing cores and also increasing some trip points (IMO it's fine to exceed 90°C under full load).

BTW: Did you check VDD_CPUX voltage in this test? Really at 1.1V all the time?

zador-blood-stained commented 8 years ago

Another small test - with these settings and corekeeper disabled, cpuburn kills only one core in ~10 minutes - that's enough to keep the temperature down at 648MHz. [image]

BTW: Did you check VDD_CPUX voltage in this test? Really at 1.1V all the time?

There was small peak at ~19:46 where it jumped to 1.3 with frequency going up, besides that it stayed at 1.1. Or do you want me to measure the voltage at test points?

Hmm... according to the graph H3 was most of the time running at 600 MHz where it's already allowed to kill CPU cores.

Killing and bringing back cores at higher frequencies may create a temperature increase big enough to trigger auto shutdown. And the situation without a heatsink may be even worse.

ThomasKaiser commented 8 years ago

I was really talking about measuring since the graphs are based on parsing script.bin and then displaying VDD_CPUX only according to dvfs fex settings. And on OPi One it's not even possible to query SY8106A for the voltage really used.

zador-blood-stained commented 8 years ago

Here I was already talking about measured voltages (1.13V and 1.33V), relative to one of GND pins on GPIO header.

Edit: Even though my multimeter is relatively cheap and old, I tested it on REF01CPZ voltage reference, and it should be precise enough for measuring DC voltage.

ThomasKaiser commented 8 years ago

I know but since I never looked into the driver I simply have no trust at all in the readouts. And temperatures appear to be pretty high compared to the stuff I measured with OPi PC so far.

I still fail to understand why OPi One differs that much compared to PC here (since it should be related to VDD_CPUX voltage and workload -- same settings, same results)

zador-blood-stained commented 8 years ago

I still fail to understand why OPi One differs that much compared to PC here (since it should be related to VDD_CPUX voltage and workload -- same settings, same results)

Maybe board size matters, bigger PCB means bigger surface area for heat dissipation and bigger volume for heat accumulation and smoothing fast temperature changes.

zador-blood-stained commented 8 years ago

@ThomasKaiser So what is better in your opinion (I mean as future default settings for general use) - more cores running at low frequency or fewer cores running at high frequency? Obviously multithreaded tasks will benefit from the first option and single-threaded tasks from the second.

ssvb commented 8 years ago

Single-threaded tasks are unlikely to trigger thermal throttling in the first place, so more cores running at low frequency seems to be a universally good choice.

ssvb commented 8 years ago

Theoretically there could be single-threaded GPU heavy workloads, but then the budget cooling needs to take the GPU into account properly. Which is a part of the budget cooling design in principle, but I'm not sure if it is implemented correctly in Allwinner BSP kernels yet.

zador-blood-stained commented 8 years ago

I'm not sure if it is implemented correctly in Allwinner BSP kernels yet.

Well, quick grepping through kernel source shows that there is some sort of implementation: this in theory should call that or that if all is configured correctly.

ThomasKaiser commented 8 years ago

Single-threaded tasks are unlikely to trigger thermal throttling in the first place, so more cores running at low frequency seems to be a universally good choice.

I agree for the same reason. And even in case there are many unrelated single-threaded workloads running in parallel that lead to a throttling situation keeping CPU cores while reducing clockspeed (and when we're talking about systems that implement dvfs at the same time also VDD_CPUX!) is the better option since we end up with more overall performance.

I tested 2 cores running at 1200 MHz vs. 4 cores running at 600 MHz back in December on OPi PC and both temperatures/consumption were lower when running with full core count at half the speed. Would mean at the same consumption/temperature level higher clockspeeds would be possible (720 MHz or maybe even 816 MHz)
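A rough model makes this result plausible. The sketch below assumes the usual first-order CMOS estimate that dynamic power is proportional to cores × V² × f, ignores static leakage and everything outside the CPU cores, and borrows 1.3 V and 1.1 V from the two-level DVFS settings discussed in this thread; the voltages actually present during the December test are not stated, so the numbers are illustrative only.

```python
def aggregate_mhz(cores, freq_mhz):
    """Total clock cycles per microsecond across all active cores."""
    return cores * freq_mhz


def relative_dynamic_power(cores, freq_mhz, volt):
    """First-order dynamic power estimate in arbitrary units: n * V^2 * f."""
    return cores * volt ** 2 * freq_mhz


if __name__ == "__main__":
    few_fast = relative_dynamic_power(2, 1200, 1.3)   # assumed 1.3 V at 1200 MHz
    many_slow = relative_dynamic_power(4, 600, 1.1)   # assumed 1.1 V at 600 MHz
    print("same aggregate clock:", aggregate_mhz(2, 1200) == aggregate_mhz(4, 600))
    print(f"4 x 600 MHz needs about {many_slow / few_fast:.0%} of the dynamic power")
```

Under these assumptions the four slow cores deliver the same aggregate clock for roughly 70% of the dynamic power, which leaves headroom for a somewhat higher clock at the same temperature.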

BTW: The more I think about this stuff looking from OPi One/Lite and especially SinoVoip's M2+ perspective (at 1.3V all the time) the more I come to the conclusion that the intermediate steps between 480 MHz and 816 MHz should better look like

{ .frequency = 528000 , .index = SUNXI_CLK_DIV(0, 0, 0, 0), },
{ .frequency = 576000 , .index = SUNXI_CLK_DIV(0, 0, 0, 0), },
{ .frequency = 624000 , .index = SUNXI_CLK_DIV(0, 0, 0, 0), },
{ .frequency = 672000 , .index = SUNXI_CLK_DIV(0, 0, 0, 0), },
{ .frequency = 720000 , .index = SUNXI_CLK_DIV(0, 0, 0, 0), },
{ .frequency = 768000 , .index = SUNXI_CLK_DIV(0, 0, 0, 0), },

(currently there are just 4 clockspeeds defined that aren't following the 48MHz step rule)
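The intermediate entries listed above follow mechanically from the 48 MHz rule. A minimal sketch (the helper names are made up) that generates them and shows which of the currently defined frequencies fall off that grid:

```python
STEP_KHZ = 48000


def steps_between(low_khz, high_khz, step_khz=STEP_KHZ):
    """Frequencies strictly between low and high, spaced step_khz apart from low."""
    return list(range(low_khz + step_khz, high_khz, step_khz))


def off_grid(freqs_khz, base_khz, step_khz=STEP_KHZ):
    """Entries that cannot be written as base + k * step."""
    return [f for f in freqs_khz if (f - base_khz) % step_khz != 0]


if __name__ == "__main__":
    print(steps_between(480000, 816000))
    # The four entries currently defined between 480 MHz and 816 MHz:
    print(off_grid([504000, 600000, 648000, 720000], base_khz=480000))
```

Of the four existing entries only 720 MHz already sits on the 48 MHz grid counted from 480 MHz.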

zador-blood-stained commented 8 years ago

If I understand things correctly, adding more frequencies to the driver won't help if you don't define operating points in FEX file, and current limit is 16 operating points.

ThomasKaiser commented 8 years ago

Hmm... just had a look through screenshots taken (with wrong voltage assumptions) since I can not test currently:

[image: orange_pi_one_comparison]

Seems like I only used 2 dvfs entries in the fex file but more intermediate steps were used. But even if we have to deal with 16 dvfs operating points max it shouldn't be a problem at all since the boards with primitive or no programmable voltage regulator end up with (translated to dvfs fex entries of course)

{ .frequency = 240000 , .index = SUNXI_CLK_DIV(0, 0, 0, 0), },
{ .frequency = 408000 , .index = SUNXI_CLK_DIV(0, 0, 0, 0), },
{ .frequency = 480000 , .index = SUNXI_CLK_DIV(0, 0, 0, 0), },
{ .frequency = 576000 , .index = SUNXI_CLK_DIV(0, 0, 0, 0), },
{ .frequency = 672000 , .index = SUNXI_CLK_DIV(0, 0, 0, 0), },
{ .frequency = 720000 , .index = SUNXI_CLK_DIV(0, 0, 0, 0), },
{ .frequency = 768000 , .index = SUNXI_CLK_DIV(0, 0, 0, 0), },
{ .frequency = 816000 , .index = SUNXI_CLK_DIV(0, 0, 0, 0), },
{ .frequency = 864000 , .index = SUNXI_CLK_DIV(0, 0, 0, 0), },
{ .frequency = 912000 , .index = SUNXI_CLK_DIV(0, 0, 0, 0), },
{ .frequency = 960000 , .index = SUNXI_CLK_DIV(0, 0, 0, 0), },
{ .frequency = 1008000, .index = SUNXI_CLK_DIV(0, 0, 0, 0), },
{ .frequency = 1056000, .index = SUNXI_CLK_DIV(0, 0, 0, 0), },
{ .frequency = 1104000, .index = SUNXI_CLK_DIV(0, 0, 0, 0), },
{ .frequency = 1152000, .index = SUNXI_CLK_DIV(0, 0, 0, 0), },
{ .frequency = 1200000, .index = SUNXI_CLK_DIV(0, 0, 0, 0), },

and the ones with SY8106A with

{ .frequency = 480000 , .index = SUNXI_CLK_DIV(0, 0, 0, 0), },
{ .frequency = 576000 , .index = SUNXI_CLK_DIV(0, 0, 0, 0), },
{ .frequency = 672000 , .index = SUNXI_CLK_DIV(0, 0, 0, 0), },
{ .frequency = 720000 , .index = SUNXI_CLK_DIV(0, 0, 0, 0), },
{ .frequency = 768000 , .index = SUNXI_CLK_DIV(0, 0, 0, 0), },
{ .frequency = 816000 , .index = SUNXI_CLK_DIV(0, 0, 0, 0), },
{ .frequency = 864000 , .index = SUNXI_CLK_DIV(0, 0, 0, 0), },
{ .frequency = 912000 , .index = SUNXI_CLK_DIV(0, 0, 0, 0), },
{ .frequency = 960000 , .index = SUNXI_CLK_DIV(0, 0, 0, 0), },
{ .frequency = 1008000, .index = SUNXI_CLK_DIV(0, 0, 0, 0), },
{ .frequency = 1056000, .index = SUNXI_CLK_DIV(0, 0, 0, 0), },
{ .frequency = 1104000, .index = SUNXI_CLK_DIV(0, 0, 0, 0), },
{ .frequency = 1152000, .index = SUNXI_CLK_DIV(0, 0, 0, 0), },
{ .frequency = 1200000, .index = SUNXI_CLK_DIV(0, 0, 0, 0), },
{ .frequency = 1248000, .index = SUNXI_CLK_DIV(0, 0, 0, 0), },
{ .frequency = 1296000, .index = SUNXI_CLK_DIV(0, 0, 0, 0), },

zador-blood-stained commented 8 years ago

Seems like I only used 2 dvfs entries in the fex file but more intermediate steps were used.

Yes, you are right. Output of cpufreq-info lists frequencies from driver and not from DVFS table.

But even if we have to deal with 16 dvfs operating points max

This can be patched to allow more, but we don't have more than 16 voltage options in any case.
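The split between the driver's frequency list and the DVFS table can be pictured as a lookup: the driver offers many frequencies, and each one is mapped to the voltage of the nearest DVFS level at or above it. The sketch below is only an assumption about how such a lookup could work and has not been checked against the BSP driver; the four-level table is an illustrative example in the style of the FEX snippets in this thread.

```python
# (level frequency in Hz, core voltage in mV), highest first. Illustrative values.
DVFS_TABLE = [
    (1200000000, 1300),
    (960000000, 1300),
    (912000000, 1100),
    (480000000, 1100),
]


def voltage_for(freq_hz, table=DVFS_TABLE):
    """Return the voltage of the lowest level whose frequency covers freq_hz."""
    candidates = [(f, v) for f, v in table if f >= freq_hz]
    if not candidates:
        raise ValueError("frequency above the highest DVFS level")
    return min(candidates)[1]  # tuple ordering picks the lowest frequency


if __name__ == "__main__":
    for khz in (768000, 912000, 960000, 1008000):
        print(khz, "kHz ->", voltage_for(khz * 1000), "mV")
```

Under this reading, a handful of DVFS levels is enough to cover any number of driver frequencies, because every frequency between two levels inherits the voltage of the upper one.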

ThomasKaiser commented 8 years ago

This can be patched to allow more, but we don't have more than 16 voltage options in any case.

But if I understand correctly there's nothing to patch? We could replace the 4 cpufreq entries now with the 7 in 48 MHz steps as proposed above and all we have to do is to exchange 648 MHz with 672 here (or switch back to 2 dvfs operating points, now with 816 MHz as threshold and not 648 MHz as before -- this was based on wrong assumptions by me back then)

BTW: I always looked into /sys/devices/system/cpu/cpu0/cpufreq/stats/total_trans instead of cpufreq-info output.
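For anyone who wants to script this check, here is a small sketch that reads the cpufreq statistics from sysfs. It assumes the kernel was built with cpufreq statistics enabled; `time_in_state` is a sibling file of `total_trans` in the same directory and is not mentioned in this thread, and the helper names are made up.

```python
def parse_time_in_state(text):
    """Map frequency (kHz) to accumulated time, one '<freq> <time>' pair per line."""
    usage = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            usage[int(parts[0])] = int(parts[1])
    return usage


def share_per_frequency(usage):
    """Fraction of the total time spent at each frequency."""
    total = sum(usage.values())
    return {f: t / total for f, t in usage.items()} if total else {}


def read_stats(cpu=0):
    """Read the transition count and per-frequency times for one CPU."""
    base = f"/sys/devices/system/cpu/cpu{cpu}/cpufreq/stats"
    with open(f"{base}/total_trans") as fh:
        transitions = int(fh.read())
    with open(f"{base}/time_in_state") as fh:
        return transitions, parse_time_in_state(fh.read())


if __name__ == "__main__":
    transitions, usage = read_stats()
    print("transitions:", transitions)
    for freq, share in sorted(share_per_frequency(usage).items()):
        print(f"{freq} kHz: {share:.1%}")
```

Unlike the list of available frequencies, these counters show which steps were actually used during a test run.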

zador-blood-stained commented 8 years ago

Updated

BTW: I always looked into /sys/devices/system/cpu/cpu0/cpufreq/stats/total_trans instead of cpufreq-info output.

I didn't set up zsh and history-based autocompletion on OPi One, so cpufreq-info is easier to remember and faster to type :smile:

zador-blood-stained commented 8 years ago

New settings: Reduced DVFS table + new cooler table

[dvfs_table]
pmuic_type = 1
pmu_gpio0 = port:PL06<1><1><2><1>
pmu_level0 = 11300
pmu_level1 = 1100
max_freq = 1200000000
min_freq = 480000000
LV_count = 4
LV1_freq = 1200000000
LV1_volt = 1300
LV2_freq = 960000000
LV2_volt = 1300
LV3_freq = 912000000
LV3_volt = 1100
LV4_freq = 480000000
LV4_volt = 1100
[cooler_table]
cooler_count = 6
cooler0 = "1200000 4 4294967295 0"
cooler1 = "912000 4 4294967295 0"
cooler2 = "768000 4 4294967295 0"
cooler3 = "720000 3 4294967295 0"
cooler4 = "600000 2 4294967295 0"
cooler5 = "504000 1 4294967295 0"

cpuburn-a7 running for an hour, first 10 minutes with active cooling (fan). Strangely, no CPU cores were killed this time (CPU was running at 768MHz mostly). [image]

ThomasKaiser commented 8 years ago

I would call this success and would also use these settings with 5.11 for One/Lite/M1 :)

On M2+ the corekeeper patch should help a lot (did you already disable installation of the ugly core-keeper.sh hack on sun8i?)

zador-blood-stained commented 8 years ago

I would call this success and would also use these settings with 5.11 for One/Lite/M1 :)

After someone tests it without a heatsink

On M2+ the corekeeper patch should help a lot (did you already disable installation of the ugly core-keeper.sh hack on sun8i?)

No. In-kernel corekeeper is not active by default - as I said earlier, with old cooler_table settings on OPi One it kills the board in ~5-10 minutes, so it needs to be tested before activating, board by board. And I didn't have core-keeper.sh installed since I'm building all images with EXTERNAL=no by default.

To activate it

[corekeeper]
corekeeper_enabled = 1

needs to be added to FEX file

ThomasKaiser commented 8 years ago

Agreed. I just asked our users for help. It would be great if you could provide an archive with the kernel .debs so that users willing to test do not have to go through the whole process of setting up and using our build system.

Corrections welcome if I missed something or wrote nonsense!

zador-blood-stained commented 8 years ago

I posted link to a prebuilt kernel on forum. BTW, similar stuff appears to be working on Pine64 - I changed frequency table and cooler table in DT for plus model. And corekeeper is working too. Before:

available frequency steps: 480 MHz, 600 MHz, 720 MHz, 816 MHz, 912 MHz, 960 MHz, 1.01 GHz,
1.06 GHz, 1.10 GHz, 1.15 GHz, 1.20 GHz, 1.34 GHz

[image]

After:

available frequency steps: 408 MHz, 480 MHz, 504 MHz, 528 MHz, 576 MHz, 600 MHz, 624 MHz,
648 MHz, 672 MHz, 720 MHz, 768 MHz, 816 MHz, 864 MHz, 912 MHz, 960 MHz, 1.01 GHz, 1.06 GHz, 
1.10 GHz, 1.15 GHz, 1.20 GHz, 1.34 GHz

[image]

longsleep commented 8 years ago

Did you already measure whether these additional low frequencies give an actual speedup?

ThomasKaiser commented 8 years ago

I would believe we don't need more steps below 600 MHz since this is where we end up with cpuburn-a53 on a Pine64 without heatsink. But who knows, according to A64 user manual there are 3 thermal sensors (2 for GPU and 1 for CPU) so in case GPU is also active cpufreq might further decrease.

But more fine-grained cpufreq steps above 600 MHz should help a little since throttling might get more efficient. I found cpuminer nice for measuring actual performance improvements/degradation. It is available inside an archive I prepared a while back, together with a script that provides a khash/s data source for RPi-Monitor and an adjusted template.

ThomasKaiser commented 8 years ago

I propose using these settings now for BPi M2+:

[ths_para]
ths_used = 1
ths_trip1_count = 8
ths_trip1_0 = 60
ths_trip1_1 = 75
ths_trip1_2 = 85
ths_trip1_3 = 90
ths_trip1_4 = 95
ths_trip1_5 = 97
ths_trip1_6 = 99
ths_trip1_7 = 105
ths_trip1_0_min = 0
ths_trip1_0_max = 1
ths_trip1_1_min = 1
ths_trip1_1_max = 2
ths_trip1_2_min = 2
ths_trip1_2_max = 3
ths_trip1_3_min = 3
ths_trip1_3_max = 4
ths_trip1_4_min = 4
ths_trip1_4_max = 6
ths_trip1_5_min = 6
ths_trip1_5_max = 8
ths_trip1_6_min = 8
ths_trip1_6_max = 10
ths_trip1_7_min = 0
ths_trip1_7_max = 0
ths_trip2_count = 1
ths_trip2_0 = 105

[cooler_table]
cooler_count = 11
cooler0 = "1200000 4 4294967295 0"
cooler1 = "912000 4 4294967295 0"
cooler2 = "720000 4 4294967295 0"
cooler3 = "648000 4 4294967295 0"
cooler4 = "576000 4 4294967295 0"
cooler5 = "480000 4 4294967295 0"
cooler6 = "312000 4 4294967295 0"
cooler7 = "240000 4 4294967295 0"
cooler8 = "240000 3 4294967295 0"
cooler9 = "240000 2 4294967295 0"
cooler10 = "240000 1 4294967295 0"

[corekeeper]
corekeeper_enabled = 1

And for OPi One/Lite and NanoPi M1 I would suggest using

[ths_para]
ths_used = 1
ths_trip1_count = 6
ths_trip1_0 = 70
ths_trip1_1 = 80
ths_trip1_2 = 85
ths_trip1_3 = 90
ths_trip1_4 = 95
ths_trip1_5 = 105
ths_trip1_6 = 0
ths_trip1_7 = 0
ths_trip1_0_min = 0
ths_trip1_0_max = 1
ths_trip1_1_min = 1
ths_trip1_1_max = 2
ths_trip1_2_min = 2
ths_trip1_2_max = 3
ths_trip1_3_min = 3
ths_trip1_3_max = 4
ths_trip1_4_min = 4
ths_trip1_4_max = 5
ths_trip1_5_min = 5
ths_trip1_5_max = 7
ths_trip1_6_min = 0
ths_trip1_6_max = 0
ths_trip2_count = 1
ths_trip2_0 = 105

[cooler_table]
cooler_count = 8
cooler0 = "1200000 4 4294967295 0"
cooler1 = "912000 4 4294967295 0"
cooler2 = "768000 4 4294967295 0"
cooler3 = "648000 4 4294967295 0"
cooler4 = "480000 4 4294967295 0"
cooler5 = "480000 3 4294967295 0"
cooler6 = "480000 2 4294967295 0"
cooler7 = "480000 1 4294967295 0"

[corekeeper]
corekeeper_enabled = 1

[dvfs_table]
pmuic_type = 1
pmu_gpio0 = port:PL06<1><1><2><1>
pmu_level0 = 11300
pmu_level1 = 1100
max_freq = 1200000000
min_freq = 480000000
LV_count = 4
LV1_freq = 1200000000
LV1_volt = 1300
LV2_freq = 960000000
LV2_volt = 1300
LV3_freq = 912000000
LV3_volt = 1100
LV4_freq = 480000000
LV4_volt = 1100

BTW: The 2 test results from forum without heatsink running at 912/768 MHz seem really to indicate a problem of the Orange Pi One we both test(ed) with :)
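Since `[ths_para]` and `[cooler_table]` have to agree with each other, a quick consistency check can help when editing them by hand. The sketch below rests on an assumption: that `ths_trip1_N_min` and `ths_trip1_N_max` are indexes into the cooler table (cooler0 ... cooler(count-1)) that become reachable once trip point N is crossed. That is a reading of the settings above, not something verified in the BSP source, and the helper names and sample values are made up.

```python
def parse_fex_section(text):
    """Turn 'key = value' lines into a dict; headers and blank lines are skipped."""
    values = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            values[key.strip()] = value.strip().strip('"')
    return values


def trip_ranges(ths):
    """Return (temp_c, min_state, max_state) for each counted trip point."""
    count = int(ths["ths_trip1_count"])
    return [(int(ths[f"ths_trip1_{i}"]),
             int(ths[f"ths_trip1_{i}_min"]),
             int(ths[f"ths_trip1_{i}_max"])) for i in range(count)]


def out_of_range_trips(ths, cooler_count):
    """Trips whose max state points past the last cooler table entry."""
    return [r for r in trip_ranges(ths) if r[2] >= cooler_count]


# Hypothetical sample: the last trip refers to state 4, but only 4 coolers exist.
THS_SAMPLE = """
[ths_para]
ths_trip1_count = 3
ths_trip1_0 = 70
ths_trip1_1 = 80
ths_trip1_2 = 90
ths_trip1_0_min = 0
ths_trip1_0_max = 1
ths_trip1_1_min = 1
ths_trip1_1_max = 2
ths_trip1_2_min = 2
ths_trip1_2_max = 4
"""

if __name__ == "__main__":
    print(out_of_range_trips(parse_fex_section(THS_SAMPLE), cooler_count=4))
```

With a cooler_count of 8 the valid states would be 0 to 7, so a highest `_max` value of 7 passes this check.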

zador-blood-stained commented 8 years ago

As for cooler table for Pine64 - I'm still experimenting, and since it seems to be directly connected to THS trip points, I'll probably try another set of settings later.

ThomasKaiser commented 8 years ago

@zador-blood-stained : What do you think about last OPi One test results submitted? IMO it's OK to increase both cooler_table values and trip points.

zador-blood-stained commented 8 years ago

IMO it's OK to increase both cooler_table values and trip points.

But not by much, here we have peaks at 97°C, which is rather high and may cause emergency shutdown if throttling is not fast enough (especially if we have GPU load too, which we aren't testing now).

Also I think that OPi One that I have suffers from thermal sensor issues and not from overvoltage (since actual voltage is OK), so when we figure out "optimal" settings, it's better to update linux-sunxi wiki.

ThomasKaiser commented 8 years ago

Agreed that thermal readouts of your OPi One are obviously wrong (but I managed to run the hardware reliability tests back then with 1200 MHz @ 1.1V successfully -- there my wild guesses originate from).

But how to proceed? OK, we need more testing.

But what about increasing the shutdown threshold to 105°C and activating your in-kernel corekeeper now? And maybe also increase cooler table frequencies (912MHz/1.1V --> 1008MHz/1.3V as first step). But regarding the latter I'm not sure -- that would require more testing to see whether it's counterproductive. But unfortunately results will always vary depending on environmental settings (heatsink or not, enclosure or not and so on) so maybe 'optimal' settings are just an illusion...

Well, my main point for now is to get your in-kernel corekeeper into 5.11 release and also relax shutdown situations a bit by increasing the critical threshold so we might be able to collect more feedback afterwards.

zador-blood-stained commented 8 years ago

I'm in favor of keeping shutdown temperature at 100°C, spacing between THS points of at least 5°C and killing one core at a time. Since default max frequency is 1200MHz, cooler0 frequency should be 1008MHz, next should be 912MHz and so on with spacing of ~100MHz between cooler table frequencies.
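These rules of thumb are easy to check or generate mechanically. A minimal sketch, with made-up helper names; the 96 MHz default is chosen because it is the table spacing closest to the "~100MHz" mentioned above and matches the 1008 to 912 MHz example:

```python
def close_trip_pairs(trips_c, min_gap_c=5):
    """Adjacent trip points that are closer together than min_gap_c."""
    ordered = sorted(trips_c)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a < min_gap_c]


def cooler_frequencies(start_khz, count, spacing_khz=96000):
    """Descending cooler table frequencies starting at start_khz."""
    return [start_khz - i * spacing_khz for i in range(count)]


if __name__ == "__main__":
    # Trip points proposed earlier in this thread for the BPi M2+.
    print(close_trip_pairs([60, 75, 85, 90, 95, 97, 99, 105]))
    print(cooler_frequencies(1008000, 4))
```

Applied to the BPi M2+ proposal from earlier in the thread, the spacing rule flags the 95/97 and 97/99 pairs.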

ThomasKaiser commented 8 years ago

@zador-blood-stained did you receive your boards from Xunlong already?

zador-blood-stained commented 8 years ago

@ThomasKaiser Nothing yet

ThomasKaiser commented 8 years ago

Ok, then I will postpone any PRs adjusting ths/cooler table settings for now.

I made many tests, just to realize that Xunlong obviously exchanged the PCB material on the 3 new boards (thicker and spreads heat more efficiently).

There is little room for improvements regarding the SY8106A based boards but that's stuff for a (wiki) article, on One/Lite we might want to increase trip points to better match the switch from 1.1V to 1.3V and regarding the only board without any programmable voltage regulator that also shows poor heat dissipation (BPi M2+) I'm a bit clueless how to proceed. I've some ths/cooler table settings that show better behaviour when using cpuburn-a7 and cpuminer in parallel. But this is no realistic workload.

Maybe that's also stuff for documentation (don't buy this board and if you do so, don't expect full performance over longer periods of time)?

Anyway: I would increase the trip point where an emergency shutdown occurs to 105°C and define the last throttling trip point 10°C lower (or increase both a bit and use 100°C for the last throttling trip point and 110°C for emergency shutdown).

Anyway: I leave it up to you. With increased emergency shutdown temperature I didn't manage to shut down OPi Lite with your activated in-kernel core-keeper. IMO we should activate your code ASAP. :)

zador-blood-stained commented 8 years ago

New OPi One, no heatsink, current default settings. [image]

ThomasKaiser commented 8 years ago

Hmm... since I assume 'Active CPUs' is 4 this means cpufreq jumps between 816 and 912 MHz and voltage at 1.1V all the time? And you're still running cpuburn-a7?

zador-blood-stained commented 8 years ago

Yes, this is cpuburn-a7. Frequency was jumping between 768 and 912MHz, so voltage stayed at 1.1V.

ThomasKaiser commented 8 years ago

Would be interesting to test this also with a less demanding workload (e.g. cpuminer and especially Linpack) to watch behaviour when voltage starts to jump between 1.1V and 1.3V.

Linpack starts pretty soft for the first x seconds and then increases load dramatically. But it's somewhat time consuming to install it:

At least when trying to optimise dvfs settings on Pine64 it was worth the efforts since it pretty reliably detected undervoltage situations (I started a few days ago with a script for users to automagically improve dvfs/cpufreq settings on their specific H3 board just to realise that my attempt to heat up the SoC prior to linpack run with cpuburn-a7 doesn't work since when the SoC is already undervolted cpuburn-a7 kills it reliably -- still searching for a better way).

Anyway: A workload that lets the board jump between the two voltages would be great to test, since we're interested in whether the temperature increase could be critical with the new settings.

Looking at cpuburn-a7 alone above I'm pretty happy already :)

zador-blood-stained commented 8 years ago

But it's somewhat time consuming to install it:

This with this compilation command isn't it?

ssvb commented 8 years ago

@zador-blood-stained What you are referring to is a toy-grade Linpack, which uses a simplistic naive algorithm with poor memory locality and has no assembly optimizations. It is demonstrating laughable GFLOPS numbers too.

The true Linpack is a bit more complex piece of software, which relies on a highly optimized OpenBLAS library. As such, it also happens to be pretty stressful for the hardware and is sensitive to undervoltage conditions.