Pragmas are misplaced in the generated C code; I have attached an example. Without this pragma, kerncraft predicts the loop increment wrongly for this particular kernel with Intel compilers, because the Intel compiler unrolls and jams the outer loop.
pragma_problem.tar.gz
This happens only in Benchmark mode. I ran the code like this:
kerncraft -p Benchmark -m HaswellEX_E5-2695v3.yml -D N 2000000 -D s 4 irk_A_2_3loop.c
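
To illustrate why the position matters, here is a minimal sketch of a loop nest with a pragma attached to the outer loop. It is not the kernel from the attachment: the function `scale_add` and its parameters are hypothetical, and it assumes the pragma in question is something like Intel's `nounroll_and_jam`. A loop pragma only applies to the loop that directly follows it, so if the generated code moves it elsewhere, the compiler is free to unroll and jam the outer loop again.

```c
#include <stddef.h>

/* Hypothetical two-level kernel, used only to show pragma placement. */
void scale_add(size_t s, size_t n, const double *a, const double *b, double *y)
{
    /* Assumed pragma: it must sit directly before the outer loop.
       The guard keeps other compilers from seeing an unknown pragma. */
#ifdef __INTEL_COMPILER
#pragma nounroll_and_jam
#endif
    for (size_t j = 0; j < s; ++j) {
        for (size_t i = 0; i < n; ++i) {
            y[i] += a[j] * b[j * n + i];
        }
    }
}
```

If the pragma ended up before the inner loop or outside the function body, it would no longer refer to the outer loop.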