ThomasKaiser / sbc-bench

Simple benchmark for single board computers
BSD 3-Clause "New" or "Revised" License

System sudden death when setting unsupported governor #62

Closed RadxaYuntian closed 1 year ago

RadxaYuntian commented 1 year ago

I was testing -r mode on our RK3566-based product.

At line 709 the script matches rkvenc's governor node:

radxa@rock-3c:~$ Governors="$(find /sys -name "*governor" | grep -E -v '/sys/module|cpuidle|watchdog')"
find: ‘/sys/kernel/tracing’: Permission denied
find: ‘/sys/kernel/debug’: Permission denied
find: ‘/sys/fs/pstore’: Permission denied
find: ‘/sys/fs/bpf’: Permission denied
find: ‘/sys/fs/fuse/connections/35’: Permission denied
radxa@rock-3c:~$ echo $Governors 
/sys/devices/platform/fdf40000.rkvenc/devfreq/fdf40000.rkvenc/governor /sys/devices/platform/fde60000.gpu/devfreq/fde60000.gpu/governor /sys/devices/system/cpu/cpufreq/policy0/scaling_governor

This node does provide an AvailableGovernorsSysFSNode (available_governors):

radxa@rock-3c:~$ ls /sys/devices/platform/fdf40000.rkvenc/devfreq/fdf40000.rkvenc/
available_frequencies  device    max_freq  polling_interval  target_freq
available_governors    governor  min_freq  power             trans_stat
cur_freq               load      name      subsystem         uevent
radxa@rock-3c:~$ cat /sys/devices/platform/fdf40000.rkvenc/devfreq/fdf40000.rkvenc/available_governors 
venc_ondemand simple_ondemand
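
The manual check above can be sketched as a read-only listing of every governor node together with its current and available governors. This is only an illustration: list_governor_nodes is a hypothetical helper, not a function from sbc-bench, and it assumes that devfreq nodes expose available_governors while cpufreq nodes expose scaling_available_governors next to the governor file.

```shell
#!/bin/bash
# Hypothetical helper (not part of sbc-bench): print every governor node below
# a root directory (default /sys) with its current and available governors.
# Nothing is written, so this is safe to run on a live system.
list_governor_nodes() {
    local root="${1:-/sys}" node dir file available
    # 2>/dev/null hides the "Permission denied" noise seen when not running as root
    find "${root}" -name "*governor" 2>/dev/null \
        | grep -E -v '/sys/module|cpuidle|watchdog' \
        | while read -r node; do
            dir="${node%/*}"
            available="unknown"
            # devfreq: available_governors, cpufreq: scaling_available_governors
            for file in "${dir}/available_governors" "${dir}/scaling_available_governors"; do
                if [ -r "${file}" ]; then
                    available="$(cat "${file}")"
                fi
            done
            echo "${node}: current=$(cat "${node}" 2>/dev/null) available=${available}"
        done
}
```

Calling list_governor_nodes without arguments walks /sys; passing a directory restricts the search, which also makes the helper easy to try against a scratch directory.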

However, since its SysFSNode path does not contain the word cpufreq, it gets the default governor-resetting treatment.

I have confirmed that echo powersave | sudo tee /sys/devices/platform/fdf40000.rkvenc/devfreq/fdf40000.rkvenc/governor can reliably bring our device down.
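
One possible guard, independent of whatever fix ends up in sbc-bench, is to write a governor only when the node's own list advertises it. The sketch below is an assumption-laden illustration: set_governor_if_supported is a hypothetical name, and it relies on the available_governors (devfreq) or scaling_available_governors (cpufreq) file sitting next to the governor node.

```shell
#!/bin/bash
# Hypothetical helper (not part of sbc-bench): write a governor to a sysfs node
# only if the sibling list of available governors contains it.
set_governor_if_supported() {
    local node="$1" wanted="$2"
    local available="${node%/*}/available_governors"
    # cpufreq nodes are called scaling_governor / scaling_available_governors
    if [ "${node##*/}" = "scaling_governor" ]; then
        available="${node%/*}/scaling_available_governors"
    fi
    if [ ! -r "${available}" ]; then
        echo "skip ${node}: no list of available governors" >&2
        return 1
    fi
    # One governor per line, then match the whole line, so that "ondemand"
    # does not accidentally match "simple_ondemand".
    if tr ' ' '\n' <"${available}" | grep -qxF -- "${wanted}"; then
        echo "${wanted}" >"${node}"
    else
        echo "skip ${node}: ${wanted} not in: $(cat "${available}")" >&2
        return 1
    fi
}

# Example (needs root on a real system):
# set_governor_if_supported /sys/devices/platform/fdf40000.rkvenc/devfreq/fdf40000.rkvenc/governor powersave
```

With the rkvenc node shown above, which only lists venc_ondemand and simple_ondemand, the helper would skip powersave and return 1 instead of writing to the node.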

ThomasKaiser commented 1 year ago

Oh, that's bad. Most probably 'bad' as in 'bad kernel code not dealing correctly with Linux semantics'?

But what about rkvdec? So far I've not seen this sysfs node on RK3566 devices, only on RK3228A, RK3229 and RK3568. Can you please give it a try on one of your Rock 3A boards with the respective sysfs node set to powersave so I can exclude both in a batch?

ThomasKaiser commented 1 year ago

> Most probably 'bad' as in 'bad kernel code not dealing correctly with Linux semantics'?

Or maybe 'bad' as in 0 microvolts when setting the thing to powersave? Currently the OPP tables (. sbc-bench.sh ; ParseOPPTables) look like this:

bus-npu-opp-table:
    900 MHz      0.0 mV
   1000 MHz    950.0 mV

cpu0-opp-table:
    408 MHz    825.0 mV
    600 MHz    825.0 mV
    816 MHz    825.0 mV
   1104 MHz    825.0 mV
   1416 MHz    925.0 mV
   1608 MHz   1000.0 mV
   1800 MHz   1050.0 mV

dmc-opp-table:
   1560 MHz    900.0 mV

npu-opp-table:
    200 MHz    825.0 mV
    297 MHz    825.0 mV
    400 MHz    825.0 mV
    600 MHz    825.0 mV
    700 MHz    850.0 mV
    800 MHz    875.0 mV
    900 MHz    925.0 mV
   1000 MHz   1000.0 mV

opp-table2:
    200 MHz    825.0 mV
    300 MHz    825.0 mV
    400 MHz    825.0 mV
    600 MHz    825.0 mV
    700 MHz    900.0 mV
    800 MHz    950.0 mV

rkvenc-opp-table:
    297 MHz      0.0 mV
    400 MHz    950.0 mV

What happens if you set it to 925000 (microvolts, i.e. 925 mV) for example?
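
Entries like the two 0.0 mV ones above can be spotted mechanically. The following is a minimal sketch that assumes the text format shown in this comment (a "name:" line followed by "<freq> MHz <voltage> mV" lines); find_zero_voltage_opps is a hypothetical helper, not something sbc-bench provides.

```shell
#!/bin/bash
# Hypothetical helper (not part of sbc-bench): read OPP tables in the text
# format shown above on stdin and print every entry with a voltage of 0.
find_zero_voltage_opps() {
    awk '
        # table header such as "rkvenc-opp-table:"
        NF == 1 && /:$/ { table = $1; sub(/:$/, "", table); next }
        # data line such as "    297 MHz      0.0 mV"
        $2 == "MHz" && $4 == "mV" && $3 + 0 == 0 { print table ": " $1 " MHz" }
    '
}

# Example, assuming ParseOPPTables prints the tables to stdout:
# (. sbc-bench.sh ; ParseOPPTables) | find_zero_voltage_opps
```

Fed with the tables listed above, it would report the 900 MHz entry of bus-npu-opp-table and the 297 MHz entry of rkvenc-opp-table.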

RadxaYuntian commented 1 year ago

I don't think they are capable of perpetual motion so I'll give that a go. I'll also fix bus-npu-opp-table while we are at it.

I'll check rkvdec later on 3A.

RadxaYuntian commented 1 year ago

Adding voltage to rkvenc-opp-table fixed the issue. I'll get back to you in our forum PM.

ThomasKaiser commented 1 year ago

Great, so -r already helped find and fix at least one bad/broken setting :)