agherzan / meta-raspberrypi

Yocto/OE BSP layer for the Raspberry Pi boards
https://www.yoctoproject.org/
MIT License

RPi 3B: Bad Yocto performance, not using all 4 cores? #1224

Closed hungerpirat closed 1 year ago

hungerpirat commented 1 year ago

Description

Context: We're porting our app from Raspbian Buster to Yocto Kirkstone. On Yocto it runs significantly slower, so we ran some benchmarks comparing the two operating systems.

Steps to reproduce the issue:

  1. Run the CPU benchmark from https://www.passmark.com/downloads/pt_linux_arm32.zip on Yocto Kirkstone and Raspbian Buster.
  2. Compare the results.
  3. Compare the hardware summary of the benchmark tool.

Describe the results you received: Benchmark results, Yocto on the left, Raspbian on the right: [image: yocto-raspbian-pt_linux_arm32]

Describe the results you expected: Both systems should produce similar results. Instead, Yocto is outperformed on most metrics by a factor of 2 to 3.

Additional information you deem important (e.g. issue happens only occasionally): The hardware summary of this benchmark gives a hint:

Yocto: 1 core @ 1200 MHz
Raspbian: 4 cores @ 1200 MHz
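To cross-check what a benchmark tool reports, the core count can be read straight from the kernel. The following is a minimal Python sketch, assuming a Linux system that exposes `/sys/devices/system/cpu/online`; the helper names `parse_cpu_list` and `online_cpus` are made up for illustration.

```python
import os
from pathlib import Path


def parse_cpu_list(text):
    """Expand a kernel CPU list such as '0-3' or '0,2-3' into CPU numbers."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def online_cpus(path="/sys/devices/system/cpu/online"):
    """Return the CPUs the kernel reports as online, or None if unreadable."""
    try:
        return parse_cpu_list(Path(path).read_text())
    except OSError:
        return None


if __name__ == "__main__":
    print("os.cpu_count():     ", os.cpu_count())
    # sched_getaffinity is Linux-only; it lists the CPUs this process may run on.
    if hasattr(os, "sched_getaffinity"):
        print("usable by process:  ", sorted(os.sched_getaffinity(0)))
    print("online (from sysfs):", online_cpus())
```

If all three values agree on four CPUs on both systems, a "1 core" line in a benchmark summary is more likely a reporting quirk of the tool than a kernel or BSP problem.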

Additional details (revisions used, host distro, etc.):

The test details:

Yocto:
$> cat /etc/os-release
ID=poky
NAME="Poky (Yocto Project Reference Distro)"
VERSION="4.0.8 (kirkstone)"
VERSION_ID=4.0.8
PRETTY_NAME="Poky (Yocto Project Reference Distro) 4.0.8 (kirkstone)"
DISTRO_CODENAME="kirkstone"

Raspbian:
$> cat /etc/os-release
PRETTY_NAME="Raspbian GNU/Linux 10 (buster)"
NAME="Raspbian GNU/Linux"
VERSION_ID="10"
VERSION="10 (buster)"
VERSION_CODENAME=buster
ID=raspbian
ID_LIKE=debian

Hardware:
$> cat /proc/cpuinfo
processor : 0
model name : ARMv7 Processor rev 4 (v7l)
BogoMIPS : 38.40
Features : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x0
CPU part : 0xd03
CPU revision : 4

processor : 1
model name : ARMv7 Processor rev 4 (v7l)
BogoMIPS : 38.40
Features : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x0
CPU part : 0xd03
CPU revision : 4

processor : 2
model name : ARMv7 Processor rev 4 (v7l)
BogoMIPS : 38.40
Features : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x0
CPU part : 0xd03
CPU revision : 4

processor : 3
model name : ARMv7 Processor rev 4 (v7l)
BogoMIPS : 38.40
Features : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x0
CPU part : 0xd03
CPU revision : 4

Hardware : BCM2835
Revision : a22082
Serial : 0000000014c04e46
Model : Raspberry Pi 3 Model B Rev 1.2

hungerpirat commented 1 year ago

Test setup is still in place, I am happy to assist with more tests if necessary.

agherzan commented 1 year ago

Are you running the same test binary? Can you also check the CPU governor?

hungerpirat commented 1 year ago

Checked it:

Yocto:
$> cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
powersave
powersave
powersave
powersave

Raspbian:
$> cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
ondemand
ondemand
ondemand
ondemand
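The same check can be scripted so that each value is labelled with its CPU, which makes the two systems easier to compare. This is a small Python sketch, assuming the standard cpufreq sysfs layout shown in the commands above; the function name `read_governors` and its `cpu_root` parameter are hypothetical.

```python
from pathlib import Path


def read_governors(cpu_root="/sys/devices/system/cpu"):
    """Map each CPU directory name (e.g. 'cpu0') to its current governor."""
    governors = {}
    pattern = "cpu[0-9]*/cpufreq/scaling_governor"
    for gov_file in sorted(Path(cpu_root).glob(pattern)):
        cpu_name = gov_file.parent.parent.name
        governors[cpu_name] = gov_file.read_text().strip()
    return governors


if __name__ == "__main__":
    for cpu, governor in read_governors().items():
        print(f"{cpu}: {governor}")
```

The `cpu_root` parameter exists only so the function can be pointed at a copy of the sysfs tree, for example when testing off-target.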

hungerpirat commented 1 year ago

@agherzan thanks for your prompt reply. And yes, I used the same test binary on both systems. (On Raspbian I first had to install libncurses5 to run it.)

agherzan commented 1 year ago

Have you tried using the same governor?

hungerpirat commented 1 year ago

just trying... results coming soon

hungerpirat commented 1 year ago

scaling_governor = performance: [image]

scaling_governor = ondemand: [image]

hungerpirat commented 1 year ago

Looks good so far, thanks. I'll cross-check once more with sysbench, as the reported number of cores still looks suspicious to me.

agherzan commented 1 year ago

In your initial report, both systems show the same number of cores, so this is most probably a tool issue. I'll close this, as it seems that you figured it out.

hungerpirat commented 1 year ago

Yes @agherzan, you are right. The governor is the main cause of the performance difference. Just one heads-up: I re-ran sysbench, which had shown different thread performance earlier, and the thread performance is still very different:

sysbench --test=threads --num-threads=4 run : image

Do you think this difference in performance might be a configuration issue?

agherzan commented 1 year ago

There could definitely be some configuration at play here. Have you checked the firmware config.txt on Raspberry Pi OS? Just a stab in the dark.

hungerpirat commented 1 year ago

Thanks again. So far I did not find anything suspicious in this config.txt nor in the kernel configs of both - but I have to confess that I am not an expert in kernel configs. For the record, the kernel configs are attached. kernel_config_raspbian.txt kernel_config_yocto.txt
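Comparing two kernel configs by eye is error-prone, so a small script that lists only the options that differ can help. The sketch below is an assumption-laden Python example: it understands the usual `.config` line forms (`CONFIG_FOO=value` and `# CONFIG_FOO is not set`), and the names `parse_kconfig` and `diff_kconfig` are invented here.

```python
import re
import sys

_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_kconfig(text):
    """Parse .config text into {option: value}; 'is not set' becomes 'n'."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        match = _SET.match(line)
        if match:
            options[match.group(1)] = match.group(2)
            continue
        match = _UNSET.match(line)
        if match:
            options[match.group(1)] = "n"
    return options


def diff_kconfig(left_text, right_text):
    """Return {option: (left, right)} for options whose values differ.

    An option that is absent from one file is reported as None on that side.
    """
    left, right = parse_kconfig(left_text), parse_kconfig(right_text)
    return {
        name: (left.get(name), right.get(name))
        for name in sorted(left.keys() | right.keys())
        if left.get(name) != right.get(name)
    }


if __name__ == "__main__":
    if len(sys.argv) == 3:
        with open(sys.argv[1]) as a, open(sys.argv[2]) as b:
            for name, (lhs, rhs) in diff_kconfig(a.read(), b.read()).items():
                print(f"{name}: {lhs} -> {rhs}")
    else:
        print("usage: kconfig_diff.py LEFT_CONFIG RIGHT_CONFIG")
```

Run against the two attached files, the output could then be filtered for scheduler, SMP and cpufreq options, which are the likeliest candidates for a threading difference.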

hungerpirat commented 1 year ago

Another noteworthy discovery: in the Raspbian kernel config the default scaling governor is set to powersave. During boot it is switched to ondemand by an init script.
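A boot-time switch like that can be reproduced on the Yocto image for testing. Below is a hedged Python sketch that writes a governor to every CPU's `scaling_governor` file; it assumes the standard cpufreq sysfs layout, needs root on a real system, and does not persist across reboots. The function name `set_governor` is hypothetical and this is not the script Raspbian actually ships.

```python
from pathlib import Path


def set_governor(governor, cpu_root="/sys/devices/system/cpu"):
    """Write `governor` to each CPU's scaling_governor file.

    Returns the names of the CPUs that were updated. Raises ValueError if a
    CPU's scaling_available_governors file does not list the governor.
    """
    updated = []
    for cpufreq_dir in sorted(Path(cpu_root).glob("cpu[0-9]*/cpufreq")):
        cpu_name = cpufreq_dir.parent.name
        available_file = cpufreq_dir / "scaling_available_governors"
        if available_file.exists():
            available = available_file.read_text().split()
            if governor not in available:
                raise ValueError(
                    f"{governor!r} is not available on {cpu_name}: {available}"
                )
        # Same effect as: echo <governor> > .../scaling_governor
        (cpufreq_dir / "scaling_governor").write_text(governor + "\n")
        updated.append(cpu_name)
    return updated


if __name__ == "__main__":
    import sys

    if len(sys.argv) == 2:
        print("updated:", set_governor(sys.argv[1]))
    else:
        print("usage: set_governor.py GOVERNOR")
```

For a permanent change, setting the default governor in the kernel configuration is the more robust route.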

kraj commented 1 year ago

> Thanks again. So far I did not find anything suspicious in this config.txt nor in the kernel configs of both - but I have to confess that I am not an expert in kernel configs.

Are you building a 32-bit OS? If so, can you find out whether the Raspbian you are using is built for the armv6l/thumb1 or the armv7l/thumb2 ISA?

readelf -a <path/to/sysbench>

should show it.
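The relevant part of that output is the ARM build attribute section, which `readelf -A` prints on its own. The following Python sketch pulls out the two tags of interest, `Tag_CPU_arch` and `Tag_THUMB_ISA_use`; it assumes binutils `readelf` is installed, and the function names are invented for illustration.

```python
import re
import subprocess


def parse_arm_attributes(readelf_output):
    """Extract 'Tag_*: value' pairs from `readelf -A` output."""
    tags = {}
    for line in readelf_output.splitlines():
        match = re.match(r"\s*(Tag_\w+):\s*(.+?)\s*$", line)
        if match:
            tags[match.group(1)] = match.group(2)
    return tags


def arm_isa_summary(binary_path):
    """Return (CPU arch, Thumb ISA use) for an ARM ELF binary, or None values."""
    result = subprocess.run(
        ["readelf", "-A", binary_path],
        check=True,
        capture_output=True,
        text=True,
    )
    tags = parse_arm_attributes(result.stdout)
    return tags.get("Tag_CPU_arch"), tags.get("Tag_THUMB_ISA_use")


if __name__ == "__main__":
    import sys

    if len(sys.argv) == 2:
        print(arm_isa_summary(sys.argv[1]))
    else:
        print("usage: arm_isa.py PATH_TO_BINARY")
```

Running it on the sysbench binary of each system gives a two-value summary that is easier to compare than the full `readelf -a` dump.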

kraj commented 1 year ago

Also disable KASAN and BPF in the Yocto kernel: create a config fragment such as yocto.cfg and add it with SRC_URI += "file://yocto.cfg"

CONFIG_HAVE_ARCH_KASAN=n
CONFIG_STACKPROTECTOR_PER_TASK=n
CONFIG_HOTPLUG_CPU=n
CONFIG_SCHED_MC=n
CONFIG_CPU_FREQ_DEFAULT_GOV_POWERSAVE=n
CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND=y

hungerpirat commented 12 months ago

Thanks for your hints @kraj - here's the output of the readelf cmd. Both are built for 32-bit. It looks like the Raspbian binary is built for armv6/thumb-1 while the Yocto one is armv7/thumb-2. Unfortunately the meaning of this is far beyond my knowledge. How does this affect threading performance?

(And... a test with a kernel configured like you suggested still has to be done.)

readelf-sysbench-raspbian.txt readelf-sysbench-yocto.txt

kraj commented 12 months ago

Yeah, it means there is a performance difference between using the thumb2 and the thumb1 ISA here, so this probably needs some more digging as to why.