Benchmark speed reference values and division with CPU_MHZ

TilakGirijeswar commented 4 years ago

For computing benchmark speed, the documentation says: "For each benchmark divide the time taken by the value used for CPU_MHZ in the configuration to give a normalized time value."

For the reference values, the documentation states that the STM32 was run at the default 16 MHz, but config/arm/boards/stm32f4-discovery/boardsupport.c (or any other script) does not indicate whether the benchmark time value is divided by CPU_MHZ.

I am running the speed benchmarks on a Microchip SAM Cortex-M0+ device at 48 MHz, and based on the speed values I have obtained, I don't think the STM32 reference values were divided by CPU_MHZ.

Can you please confirm the value of CPU_MHZ used for the reference, and whether the reference values are divided by CPU_MHZ?

If the reference values are not divided by CPU_MHZ, this will cause confusion when computing a relative benchmark score on a board running at a different frequency.

Can the process used to obtain the speed reference values, and the values themselves, be documented for future use?

jeremybennett commented 4 years ago

@TilakGirijeswar

Thanks for reporting this. Clearly a deficiency in the documentation. The benchmark_speed.py script reports the relative score without dividing by CPU_MHZ. You should do this yourself to get a per MHz score. Both are useful. One is a measure of the raw performance of a processor, the other the efficiency of the implementation. You are correct that the baseline processor (Arm Cortex M4) was run at 16MHz.

I've assigned to @PaoloS02 to investigate and make appropriate changes.

TilakGirijeswar commented 4 years ago

@jeremybennett,

Thanks for looking into this and for noting that the documentation needs improving.

Meanwhile, my concern is also with the process.

I agree that raw performance is helpful, but this method affects the relative score. When I run an Arm core at 48 MHz (CPU_MHZ=48), more iterations are executed, so the benchmark takes longer. There is then no common basis for comparison: the reference data were obtained at 16 MHz, while my results are at 48 MHz.

For example, an ATSAML21 (Cortex-M0+) running at 48 MHz takes 10027 ms for the aha-mont64 program. Calculating the relative speed:

relative speed = (STM32 reference time for aha-mont64 @ 16 MHz) / (SAM measured time for aha-mont64 @ 48 MHz)
= 4004 / 10027
≈ 0.399

You can see that this score is too low.
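A small Python sketch that reproduces the raw ratio above from the two quoted aha-mont64 times. `relative_speed` is an illustrative name, not a function taken from benchmark_speed.py.

```python
# Times quoted in this thread for aha-mont64, in milliseconds.
REF_MS_STM32_16MHZ = 4004         # baseline reference time at 16 MHz
MEASURED_MS_SAML21_48MHZ = 10027  # ATSAML21 measurement at 48 MHz


def relative_speed(ref_ms: float, measured_ms: float) -> float:
    """Reference time divided by measured time; higher means faster."""
    return ref_ms / measured_ms


print(f"{relative_speed(REF_MS_STM32_16MHZ, MEASURED_MS_SAML21_48MHZ):.3f}")  # 0.399
```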

But if you normalize for frequency (by dividing by CPU_MHZ in the script), the calculation becomes:

(STM32 reference time for aha-mont64 / 16) / (SAM measured time for aha-mont64 / 48)
= (4004 / 16) / (10027 / 48)
= 250.25 / 208.90
≈ 1.198

This comparison seems to be reasonable.
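The frequency-normalized calculation can be reproduced the same way. This sketch (again with illustrative names, not code from embench-iot) divides each time by its clock in MHz, as the quoted documentation describes, and checks that the result is simply the raw ratio multiplied by 48 / 16.

```python
REF_MHZ = 16  # clock used for the STM32 reference times
DUT_MHZ = 48  # CPU_MHZ of the ATSAML21 board


def normalized_time(time_ms: float, cpu_mhz: float) -> float:
    """Benchmark time divided by the CPU_MHZ value used for the run."""
    return time_ms / cpu_mhz


def normalized_ratio(ref_ms: float, ref_mhz: float,
                     measured_ms: float, measured_mhz: float) -> float:
    """Normalized reference time over normalized measured time."""
    return normalized_time(ref_ms, ref_mhz) / normalized_time(measured_ms, measured_mhz)


ratio = normalized_ratio(4004, REF_MHZ, 10027, DUT_MHZ)
raw = 4004 / 10027
print(f"{ratio:.3f}")                      # 1.198
print(f"{raw * DUT_MHZ / REF_MHZ:.3f}")   # 1.198, the same value
```

Dividing both times by their clocks is algebraically the same as multiplying the raw ratio by 48 / 16 = 3.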

jeremybennett commented 4 years ago

Hi @TilakGirijeswar

The benchmarks are scaled according to the speed of the machine. The intention is that each should run for about 4 seconds: long enough to be measured accurately, while still allowing all the tests to run in a few minutes.

This is necessary because there are at least three orders of magnitude of variation in speed between the smallest and largest microcontrollers (think of an entry-level Atmel AVR ATtiny versus a top-end Cortex-M4/M7).

So we end up with a lot of multiplying and dividing by CPU_MHZ. Your Arm core should execute 3 times as many iterations of each benchmark as the baseline, but that is then factored out, so your Embench score should be 3x higher. When we divide by MHz, your score should be almost identical to the baseline's.
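A minimal sketch of the scaling described above, under loudly stated assumptions: each benchmark runs `scale * CPU_MHZ` iterations, both cores need the same (hypothetical) number of cycles per iteration, the score factors out the extra work by multiplying by CPU_MHZ / 16, and the per-MHz score divides that by CPU_MHZ. The names `run_time_ms`, `speed_score` and `score_per_mhz`, and the values 1000 and 4000, are illustrative only and are not taken from embench-iot.

```python
# Hypothetical model of Embench-style scaling (an assumption for illustration):
# each benchmark runs scale * cpu_mhz iterations, so faster clocks do more work.
REF_MHZ = 16  # clock of the baseline Cortex-M4 used for the reference times


def run_time_ms(scale: int, cpu_mhz: int, cycles_per_iter: int) -> float:
    """Time in ms to run scale * cpu_mhz iterations at cpu_mhz MHz."""
    iterations = scale * cpu_mhz
    total_cycles = iterations * cycles_per_iter
    # A cpu_mhz MHz clock executes cpu_mhz * 1000 cycles per millisecond.
    return total_cycles / (cpu_mhz * 1000)


def speed_score(ref_ms: float, measured_ms: float, cpu_mhz: int) -> float:
    """Reference time / measured time, scaled by the extra work done."""
    return (ref_ms / measured_ms) * (cpu_mhz / REF_MHZ)


def score_per_mhz(score: float, cpu_mhz: int) -> float:
    """Score divided by the clock frequency in MHz."""
    return score / cpu_mhz


# Assume both cores need 4000 cycles per iteration (hypothetical value).
ref_ms = run_time_ms(1000, REF_MHZ, 4000)  # baseline at 16 MHz
dut_ms = run_time_ms(1000, 48, 4000)       # device under test at 48 MHz
print(ref_ms, dut_ms)                      # 4000.0 4000.0
score = speed_score(ref_ms, dut_ms, 48)
print(score)                               # 3.0
print(score_per_mhz(1.0, REF_MHZ), score_per_mhz(score, 48))  # 0.0625 0.0625
```

Under these assumptions the run time does not depend on CPU_MHZ at all, so the extra work has to be factored back in to get the 3x score, and dividing by MHz then brings an equally efficient core back to the baseline's per-MHz value.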

I think this serves to underline why we need to improve the documentation!

embench / embench-iot, issue #59