Open MarkHillHuawei opened 2 years ago
I understand why re-declaring these as globals and calling the main loop body by function pointer might prevent the over-optimisation, but what other effects do the changes have? I'd expect performance to go down and code size to go up. Have you measured this?
It depends on the optimisation level and on whether hardware FP is used. With hardware FP at -O3, performance goes down 20-30% and code size goes up 20-30%, because the whole workload is now being executed.
Sorry, I wasn't clear. I meant: apart from the removal of the "over-optimisation", what was the input?
The input was unchanged: it's an array of numbers which is initialised pseudo-randomly (using the same seed) on each iteration of the benchmark. In theory, with aggressive enough constant propagation, the compiler could compute the array contents and therefore the resulting stats calculations, but I am not seeing that!
Some of the result variables in embench-iot/st are declared as automatic (local) variables (MeanA, MeanB, VarA, VarB, StddevA, StddevB) and are at risk of having their computation removed via dead-code elimination or loop hoisting. For example, the inner loop includes 3 FP divides per iteration, but when armclang/6.14.1 compiles at -O3, only one FP divide is performed in the whole benchmark run.
Re-declaring these as globals and calling the main loop body by function pointer seems to prevent this over-optimisation of the benchmark.
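To illustrate the idea, here is a minimal sketch of the pattern, not the actual embench-iot/st code. The names calc_stats, body, run_benchmark, ArrayA and the array size N are hypothetical, and only MeanA and VarA are computed to keep it short. Making the function pointer volatile is an assumption on my part; it is one common way to stop the compiler from seeing through the indirect call.

```c
#include <assert.h>
#include <stddef.h>

#define N 4 /* tiny illustrative size, not the benchmark's real size */

/* Results live at file scope instead of on the stack. Stores to
 * externally visible globals are observable, so the compiler cannot
 * simply discard the computation that produces them. */
double MeanA, VarA;

/* Hypothetical fixed input; the real benchmark fills its arrays
 * pseudo-randomly from a fixed seed. */
static double ArrayA[N] = {1.0, 2.0, 3.0, 4.0};

/* One pass of the workload: mean and population variance. */
static void calc_stats(const double *a, size_t n)
{
    double sum = 0.0, sq = 0.0;

    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    MeanA = sum / (double)n;

    for (size_t i = 0; i < n; i++) {
        double d = a[i] - MeanA;
        sq += d * d;
    }
    VarA = sq / (double)n;
}

/* Assumption: a volatile function pointer. The pointer must be
 * re-read on every call, so the compiler cannot prove which function
 * runs and cannot inline it or hoist its work out of the loop. */
static void (*volatile body)(const double *, size_t) = calc_stats;

/* Benchmark loop: every iteration goes through the indirect call. */
void run_benchmark(int iterations)
{
    for (int i = 0; i < iterations; i++)
        body(ArrayA, N);
}
```

After run_benchmark(3) the globals hold the statistics of the last pass. With link-time or whole-program optimisation a compiler may still be able to reason about non-volatile globals, so it is worth checking the generated code (for example, counting the FP divides) rather than relying on the pattern alone.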