embench / embench-iot

The main Embench repository
https://www.embench.org/
GNU General Public License v3.0

In the st benchmark, some toolchains hoist or remove the computation from the benchmarking loop #161

Open MarkHillHuawei opened 2 years ago

MarkHillHuawei commented 2 years ago

Some of the result variables in embench-iot/st are declared with automatic storage, i.e. as locals (MeanA, MeanB, VarA, VarB, StddevA, StddevB), so the computation that produces them is at risk of being removed via dead-code elimination or loop hoisting. For example, the inner loop includes 3 fp divides per iteration, but when armclang/6.14.1 compiles at O3 only one fp divide is performed in the whole benchmark run.

Re-declaring these as globals and calling the main loop body by function pointer seems to prevent this over-optimisation of the benchmark.
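A minimal sketch of the workaround described above, not the actual patch: results live at file scope and the loop body is reached through a function pointer. The names `stats_body`, `body_fn`, `run_benchmark` and the array length `N` are hypothetical, the `volatile` qualifier on the pointer is an assumption added here so the call cannot be resolved at compile time, and only the mean and variance are computed to keep the example short.

```c
#include <stddef.h>

#define N 100 /* hypothetical array length */

/* Results at file scope with external linkage: the compiler must generally
   assume other code may read them, so the final stores cannot be discarded
   (absent whole-program optimisation). */
double MeanA, VarA;

static double ArrayA[N];

/* Hypothetical loop body: mean and population variance of ArrayA. */
static void stats_body(void)
{
    double sum = 0.0, diffs = 0.0;

    for (size_t i = 0; i < N; i++)
        sum += ArrayA[i];
    MeanA = sum / N;

    for (size_t i = 0; i < N; i++) {
        double d = ArrayA[i] - MeanA;
        diffs += d * d;
    }
    VarA = diffs / N;
}

/* Assumption: a volatile pointer is reloaded before every call, so the
   compiler cannot inline stats_body and hoist its work out of the loop. */
static void (*volatile body_fn)(void) = stats_body;

void run_benchmark(int rpt)
{
    for (int r = 0; r < rpt; r++) {
        /* Stand-in input; the real benchmark fills the array pseudo-randomly. */
        for (size_t i = 0; i < N; i++)
            ArrayA[i] = (double)(i + 1);
        body_fn(); /* indirect call: the body runs on every repetition */
    }
}
```

With the results kept as locals and the body inlined, a compiler is free to notice that nothing consumes the values and to drop or hoist most of the arithmetic, which matches the single fp divide observed above.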

Roger-Shepherd commented 2 years ago

I understand why re-declaring these as globals and calling the main loop body by function pointer might prevent the over-optimisation, but what other effects do the changes have? I'd expect performance to go down and code size to go up. Have you measured this?

MarkHillHuawei commented 2 years ago

It depends on the optimisation level and on whether hardware FP is used. For hardware FP at O3, performance goes down 20-30% and code size goes up 20-30%, because the whole workload is now being executed.

Roger-Shepherd commented 2 years ago

Sorry, I wasn't clear. I meant: apart from the removal of the "over optimisation", what was the input?

MarkHillHuawei commented 2 years ago

The input was unchanged: it's an array of numbers that is initialised pseudo-randomly (using the same seed) on each iteration of the benchmark. In theory, with aggressive enough constant propagation, the compiler could compute the array contents and therefore the resulting stats calculations, but I am not seeing that!
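To illustrate why the input is fully determined at compile time, here is a rough sketch of a fixed-seed initialiser. The generator constants and the names `random_integer` and `initialise` are illustrative assumptions, not taken from the benchmark source.

```c
#include <stddef.h>

static int seed;

/* Hypothetical small linear congruential generator. */
static int random_integer(void)
{
    seed = ((seed * 133) + 81) % 8095;
    return seed;
}

/* Resets the seed on every call, so each benchmark iteration sees
   exactly the same array contents. */
void initialise(double *array, size_t n)
{
    seed = 0; /* same seed each time */
    for (size_t i = 0; i < n; i++)
        array[i] = (double)random_integer();
}
```

Because the seed and the recurrence are both constants, every element is a compile-time-computable value, which is what would let a sufficiently aggressive compiler fold the whole calculation.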