Provide more cpufreq steps for sun8i/legacy

ThomasKaiser commented 8 years ago

Based on the discussion in the forum I would propose adding more cpufreq steps above 816 MHz on sun8i/legacy kernel so that sunxi-cpufreq.c looks like this:

struct cpufreq_frequency_table sunxi_freq_tbl[] = {
    { .frequency = 60000  , .index = SUNXI_CLK_DIV(0, 0, 0, 0), },
    { .frequency = 120000 , .index = SUNXI_CLK_DIV(0, 0, 0, 0), },
    { .frequency = 240000 , .index = SUNXI_CLK_DIV(0, 0, 0, 0), },
    { .frequency = 312000 , .index = SUNXI_CLK_DIV(0, 0, 0, 0), },
    { .frequency = 408000 , .index = SUNXI_CLK_DIV(0, 0, 0, 0), },
    { .frequency = 480000 , .index = SUNXI_CLK_DIV(0, 0, 0, 0), },
    { .frequency = 504000 , .index = SUNXI_CLK_DIV(0, 0, 0, 0), },
    { .frequency = 600000 , .index = SUNXI_CLK_DIV(0, 0, 0, 0), },
    { .frequency = 648000 , .index = SUNXI_CLK_DIV(0, 0, 0, 0), },
    { .frequency = 720000 , .index = SUNXI_CLK_DIV(0, 0, 0, 0), },
    { .frequency = 816000 , .index = SUNXI_CLK_DIV(0, 0, 0, 0), },
    { .frequency = 864000 , .index = SUNXI_CLK_DIV(0, 0, 0, 0), },
    { .frequency = 912000 , .index = SUNXI_CLK_DIV(0, 0, 0, 0), },
    { .frequency = 960000 , .index = SUNXI_CLK_DIV(0, 0, 0, 0), },
    { .frequency = 1008000, .index = SUNXI_CLK_DIV(0, 0, 0, 0), },
    { .frequency = 1056000, .index = SUNXI_CLK_DIV(0, 0, 0, 0), },
    { .frequency = 1104000, .index = SUNXI_CLK_DIV(0, 0, 0, 0), },
    { .frequency = 1152000, .index = SUNXI_CLK_DIV(0, 0, 0, 0), },
    { .frequency = 1200000, .index = SUNXI_CLK_DIV(0, 0, 0, 0), },
    { .frequency = 1248000, .index = SUNXI_CLK_DIV(0, 0, 0, 0), },
    { .frequency = 1296000, .index = SUNXI_CLK_DIV(0, 0, 0, 0), },
    { .frequency = 1344000, .index = SUNXI_CLK_DIV(0, 0, 0, 0), },
    { .frequency = 1440000, .index = SUNXI_CLK_DIV(0, 0, 0, 0), },
    { .frequency = 1536000, .index = SUNXI_CLK_DIV(0, 0, 0, 0), },

    /* table end */
    { .frequency = CPUFREQ_TABLE_END,  .index = 0,              },
};

This alone should help with performance in situations where throttling occurs, since throttling gets more efficient when finer-grained jumps between frequencies are possible. It's already known that Allwinner's defaults in every BSP kernel so far were close to horrible, so it's up to us to improve things.
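To illustrate the point about finer steps, here is a minimal sketch, not the kernel's throttling code. It assumes the governor simply drops to the highest table entry that does not exceed whatever clock the SoC can sustain thermally. The helper `pick_step`, the three-entry `coarse_khz` table and the 1100000 kHz limit used below are all hypothetical; `fine_khz` is a subset of the proposed table.

```c
#include <assert.h>
#include <stddef.h>

/* Return the highest entry of an ascending kHz table that does not
 * exceed limit_khz. Falls back to the lowest entry if none qualifies. */
unsigned int pick_step(const unsigned int *tbl, size_t n,
                       unsigned int limit_khz)
{
    assert(n > 0);
    unsigned int best = tbl[0];
    for (size_t i = 0; i < n; i++) {
        if (tbl[i] <= limit_khz)
            best = tbl[i];
    }
    return best;
}

/* Hypothetical coarse table, for illustration only. */
const unsigned int coarse_khz[] = { 816000, 1008000, 1200000 };

/* Subset of the proposed table between 816 MHz and 1200 MHz. */
const unsigned int fine_khz[] = {
    816000, 864000, 912000, 960000, 1008000,
    1056000, 1104000, 1152000, 1200000,
};
```

With a hypothetical sustainable limit of 1100000 kHz, `pick_step(coarse_khz, 3, 1100000)` settles on 1008000 while `pick_step(fine_khz, 9, 1100000)` settles on 1056000, so the finer table gives up less clock speed for the same thermal budget.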

What do you think, both regarding the whole approach and where to apply it (PR against Igor's linux repo or a patch 'as usual')? IMO we should take sun8i as a test balloon (not strictly necessary since sun50i/A64 has already served as a test, but the H3 boards with a primitive or no programmable voltage regulator would benefit the most from more cpufreqs). If no complaints are heard, we should add additional cpufreqs at least to all sunxi legacy kernels and start testing with sun7i and the mainline kernel too, to see whether more fine-grained throttling also helps performance there.

Any thoughts?

ThomasKaiser commented 8 years ago

@ssvb I recompiled OpenBLAS and hpl (this time using version 2.2 and not 2.1 as back then) again for H3 and now slightly exceed 2.0 GFLOPS on OPi Plus 2E. So either I did something wrong back then or the new version is 'better'. Anyway: I still want to use this linpack to be able to detect undervoltage.

So currently searching for a way to heat up the SoC prior to running Linpack (or starting to understand settings and simply adjust parameters so that a single benchmark run takes 3 or 4 times longer since with the settings from the RPi thread benchmark duration is too short to really create considerable heat).

Anyway: For what we're currently after (testing @zador-blood-stained's in-kernel core-keeper and THS settings), the optimized Linpack might be great since a switch between light and heavy load is involved.

zador-blood-stained commented 8 years ago

> But it's somewhat time consuming to install it

Nothing special, took only ~40 minutes to compile.

So do I run it in parallel with cpuburn-a7 or it's more complicated? What's the proper testing procedure?

ssvb commented 8 years ago

@ThomasKaiser Sounds like making use of the hardware watchdog built into the SoC might be a good idea for automation. I did use it when automatically tuning DRAM settings for A10/A13/A20. Some modifications might be necessary for H3 though.

ssvb commented 8 years ago

@zador-blood-stained

> So do I run it in parallel with cpuburn-a7 or it's more complicated? What's the proper testing procedure?

I have dropped the ball on this front, but IMHO the right way to proceed would be to implement https://github.com/ssvb/cpuburn-arm/issues/4 and improve these tools in general.

ThomasKaiser commented 8 years ago

@zador-blood-stained For now it should be enough to let linpack run on the small boards to check THS and core-keeper stuff (maybe while another lightweight workload is running in parallel to force switching between 1.1V and 1.3V more often -- the goal would be to test whether bringing back CPU cores from your kernel code might kill boards with current THS settings or not)

@ssvb thanks for mentioning the watchdog. Seems useful for exactly that, so in case I get stuck with this (if I try to follow that route at all, since the main problem with such a 'test out hardware reliability' approach is the user in question) I'll dig deeper.

zador-blood-stained commented 8 years ago

@ThomasKaiser I can always jump between operating points (and thus voltage) manually with cpufreq-set. In case SoC temperature matters here too, I can heat the board with a soldering fan :smile:

ThomasKaiser commented 8 years ago

Well, I thought we're still testing whether bringing back CPU cores (too fast) might cause problems, since we then reach a critical threshold where an emergency shutdown is initiated because temperature increases again too fast? That's the focus of testing now, at least if I understood your concerns a while back?

I'm pretty fine already with current settings and would like to see your in-kernel core-keeper being default rather sooner than later :)

zador-blood-stained commented 8 years ago

My concern was that with old settings bringing several cores back (since we had either 4 cores or 1 core in cooler_table) heats SoC so fast that it would trigger emergency shutdown before budget cooling algorithm had time to react to this temperature - and this was on OPi One with defective thermal sensor.

With current settings and "normal" Oranges this shouldn't cause any problems unless somebody decides to use a shitty power supply and killing/bringing back cores causes momentary CPU undervoltage.

ssvb commented 8 years ago

@zador-blood-stained The current 1 core state is supposed to be unreachable. The last state with 4 active cores should be already running at a sufficiently low clock speed to handle any load without overheating. If we ever reach the 1 core state, then it's already a catastrophic event similar to thermal shutdown.

ssvb commented 8 years ago

In fact we may revise this last 4 core state by running the lima-textured-cube demo together with cpuburn-a7 and putting the board in a box with poor ventilation :-) The 648MHz CPU clock speed might be too high.

ThomasKaiser commented 8 years ago

@zador-blood-stained OK, now I start to understand. But I'm also sure that I do not understand the relationship between cooler_table and THS trip points. On the other hand I don't care that much since I would like to run mainline kernel on H3 devices.

So while playing around with this stuff with legacy kernel to check hardware limits I really hope we get support for THS in mainline kernel soon. @ssvb IIRC megi and you talked a while ago in linux-sunxi IRC about the state of these commits. Have to look through IRC logs to get the idea. I still fear that sending patches upstream gets delayed and we can't benefit from thermal/throttling on H3 boards with mainline kernel before 2017 :\

zador-blood-stained commented 8 years ago

OPi PC, no heatsink, cpuburn-a7

So IMO it's OK to enable corekeeper for all H3 oranges.

ThomasKaiser commented 8 years ago

I'm fine with this but would suggest that we adjust two more values on all SY8106A equipped boards: Increase 1st trip point by 5°C and shutdown threshold also so that we get 10°C between last throttling step and emergency shutdown:

 ths_trip1_0 = 75
 ths_trip1_1 = 80
 ths_trip1_2 = 85
 ths_trip1_3 = 90
 ths_trip1_4 = 95
 ths_trip1_5 = 105
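As a quick sanity check of the 10°C headroom, here is a minimal sketch that treats the last entry as the shutdown threshold and the one before it as the last throttling step, as described above. The helper name `shutdown_headroom` is hypothetical and not part of any kernel or FEX tooling.

```c
#include <assert.h>
#include <stddef.h>

/* Gap in degrees C between the last throttling trip point and the
 * final entry, which is treated here as the shutdown threshold. */
int shutdown_headroom(const int *trips, size_t count)
{
    assert(count >= 2);
    return trips[count - 1] - trips[count - 2];
}

/* Values of ths_trip1_0 .. ths_trip1_5 proposed above. */
const int proposed_trips[] = { 75, 80, 85, 90, 95, 105 };
```

Calling `shutdown_headroom(proposed_trips, 6)` yields 105 - 95 = 10, matching the intended gap.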

BTW: While we're at it (H3 boards). What about decreasing DRAM clockspeed for both BPi M2+ ~~and NanoPi M1~~ in u-boot and fex file? The test results from yesterday and today do not look that promising when relying on mainline u-boot (Tido also pointed out that DRAM chips on BPi M2+ are slightly different than those Samsungs used on Orange Pis: K4B4G1646D-BCK0 vs. K4B4G1646Q-HYK0 on Oranges -- I asked Tido to correct this in linux-sunxi wiki)

zador-blood-stained commented 8 years ago

> Increase 1st trip point by 5°C and shutdown threshold also so that we get 10°C between last throttling step and emergency shutdown

You need to adjust second part of THS table, which defines cooling states, too.

> What about decreasing DRAM clockspeed for both BPi M2+ and NanoPi M1 in u-boot and fex file?

Don't have any of these boards to test, but if there are 2 or more cases of failing lima-memtester tests with current DRAM speed (or current + 24MHz), then it's better safe than sorry I guess.

zador-blood-stained commented 8 years ago

BTW, Do we need any more tests without heatsinks? I think I have enough small heatsinks and adhesive stuff for all new boards.

ThomasKaiser commented 8 years ago

> You need to adjust second part of THS table, which defines cooling states, too.

Really? I just want to modify the first and last entry, letting the first throttling step happen 5°C higher than before and getting some safety headroom regarding emergency shutdowns on the upper end of the thermal scale. IMO adjusting both temperature values should be enough?

Regarding tests without heatsink, IMO only confirming thermal behaviour of OPi Plus 2E would be interesting since I'm still amazed how little throttling occurs here. So just one graph with our current settings and information regarding ambient temperature would be fine (still preparing a side-by-side review of BPi M2+ and OPi Plus 2E)

ssvb commented 8 years ago

IIRC, the emergency shutdown temperature is configured by the ths_trip2_0 = 105 line in FEX.

ThomasKaiser commented 8 years ago

> IIRC, the emergency shutdown temperature is configured by the ths_trip2_0 = 105 line in FEX.

Sure, but reaching the last ths_trip1_* entry already has the same effect. So in case ths_trip1_count = 8 is defined, reaching ths_trip1_7 will also trigger a shutdown.
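The behaviour described here can be sketched as follows. This is an illustration of the comment only, not the legacy kernel's THS driver: the enum, the function name `ths_classify` and the use of `>=` for "reaching" a trip point are assumptions, and the sample array reuses the six ths_trip1 values proposed earlier in the thread.

```c
#include <assert.h>
#include <stddef.h>

enum ths_action { THS_NONE, THS_THROTTLE, THS_SHUTDOWN };

/* Classify a temperature against the ths_trip1 list: the last entry
 * (index count - 1) acts as a shutdown threshold, any earlier entry
 * as a throttling step. */
enum ths_action ths_classify(int temp_c, const int *trip1, size_t count)
{
    assert(count >= 1);
    if (temp_c >= trip1[count - 1])
        return THS_SHUTDOWN;
    if (temp_c >= trip1[0])
        return THS_THROTTLE;
    return THS_NONE;
}

/* ths_trip1_0 .. ths_trip1_5 as proposed earlier in the thread. */
const int example_trip1[] = { 75, 80, 85, 90, 95, 105 };
```

With these six entries, 70°C maps to `THS_NONE`, 96°C to `THS_THROTTLE` and 105°C to `THS_SHUTDOWN`, independent of what ths_trip2_0 is set to.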

zador-blood-stained commented 8 years ago

> Regarding tests without heatsink IMO only confirming thermal behaviour of OPi Plus 2E would be interesting

Opi Plus 2E, cpuburn-a7, no heatsink

Most of the time spent at 1008MHz

ThomasKaiser commented 8 years ago

Thx for the test. I just merged #340 so from now on we have identical THS settings on all H3 boards. I also asked for more testers regarding BPi M2+ (both DRAM reliability as well as thermal readouts since the results I got so far are simply weird or an indication that this board overheats like hell).

Hopefully a few more users get back to us soon. As long as this is unresolved we should clock DRAM with 624 MHz as on the other H3 boards already.

I'm currently preparing an article regarding H3 boards and performance tuning, e.g. analysing your own workload and thermal behaviour and then tuning THS settings so that throttling happens in finer-grained steps in the appropriate thermal range (then really making use of the additional cpufreq operating points we added).

IMO no more tests in this area (and w/o heatsinks) necessary :)

armbian / build

Provide more cpufreq steps for sun8i/legacy #298