Closed nomaddo closed 6 years ago
For the kernel below with n = 12000, there is no performance difference between --use-opt and --no-opt.
...
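kernel void hello(global float * x, global float * y, int n){
    int id = get_global_id(0);    // index of this work-item
    int all = get_global_size(0); // total number of work-items
    int offset = id * n / all;    // start of this work-item's chunk
    // each work-item updates n / all consecutive elements
    for (int i = 0; i < n / all; i++)
        x[i + offset] += y[i + offset] * 2;
}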
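To make the work split concrete, the kernel's indexing can be emulated on the host in plain C. This is only a sketch: hello_work_item and hello_emulate are hypothetical names, and id and all are passed in as parameters in place of get_global_id(0) and get_global_size(0).

```c
#include <stddef.h>

/* Host-side emulation of a single work-item of the hello kernel.
 * id and all stand in for get_global_id(0) and get_global_size(0). */
static void hello_work_item(float *x, const float *y, int n, int id, int all)
{
    int offset = id * n / all;          /* start of this work-item's chunk */
    for (int i = 0; i < n / all; i++)   /* n / all elements per work-item */
        x[i + offset] += y[i + offset] * 2;
}

/* Runs every work-item one after another (hypothetical helper,
 * a real device would run them in parallel). */
static void hello_emulate(float *x, const float *y, int n, int all)
{
    for (int id = 0; id < all; id++)
        hello_work_item(x, y, n, id, all);
}
```

With n = 12 and 4 work-items, every element is updated exactly once. When n is not a multiple of the work-item count (for example n = 10 with 4 work-items), the integer divisions leave some elements untouched, so the measurement presumably relies on n = 12000 being divisible by the global size.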
I looked at the intermediate representation, and auto-vectorization works.
Maybe optimizations by opt, such as aggressive loop unrolling, are not effective here because the instruction cache is small.
@doe300 Does VC4CL have a way to pass VC4C's compilation options? I want to use it to test this patch.
In Program.cpp, VC4CL passes the options from the OpenCL calls on to VC4C's compilation steps in precompile_program and link_programs.
Can you fix the code style issues? Otherwise it looks good to merge.
Before committing, I ran make clang-format, but diffs still appeared.
I think the clang-format configuration is incomplete: it is defined in src/CMakeLists.txt and only targets ${VC4CC_SRCS} and ${VC4C_SRCS}, so it didn't cover the files in include.
This pull request adds auto-vectorization by opt, using the option -force-vector-width=16. After applying this patch, vectorization works.
VC4C itself still succeeds in every case.
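As a rough illustration of what a vector width of 16 means for the loop in the hello kernel, the sketch below processes 16 elements per outer iteration and handles the remainder with a scalar tail. It is hand-written C, not the code that opt emits; scale_add_vectorized and VECTOR_WIDTH are hypothetical names chosen for the example.

```c
#define VECTOR_WIDTH 16

/* Computes x[i] += y[i] * 2 for i in [0, n), written in the shape a
 * width-16 vectorized loop takes: full blocks first, scalar tail last. */
static void scale_add_vectorized(float *x, const float *y, int n)
{
    int i = 0;

    /* Main loop: each iteration covers one block of 16 elements.
     * The inner loop represents the 16 lanes of one vector operation. */
    for (; i + VECTOR_WIDTH <= n; i += VECTOR_WIDTH) {
        for (int lane = 0; lane < VECTOR_WIDTH; lane++)
            x[i + lane] += y[i + lane] * 2;
    }

    /* Scalar tail: the elements left over when n is not a multiple of 16. */
    for (; i < n; i++)
        x[i] += y[i] * 2;
}
```

The result is the same as the plain scalar loop; only the iteration structure changes, which is what the vectorizer is being asked to produce.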