hfinkel closed this issue 10 years ago
Hal, this is a common problem in code generated by clang. See the IR generated by clang without OpenMP:
for.body: ; preds = %for.cond
%1 = load double* %scalar.addr, align 8
%2 = load i64* %j, align 8
%arrayidx = getelementptr inbounds [10000000 x double]* @c, i32 0, i64 %2
%3 = load double* %arrayidx, align 8
%mul = fmul double %1, %3
%4 = load i64* %j, align 8
%arrayidx1 = getelementptr inbounds [10000000 x double]* @b, i32 0, i64 %4
store double %mul, double* %arrayidx1, align 8
br label %for.inc
See the first line after the for.body label: it is exactly the same code.
We need an additional optimization pass that hoists loop invariants out of the loop body. This should be done in the backend, not in the frontend.
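To make the requested transformation concrete, here is a minimal C sketch of what hoisting the invariant load would amount to at the source level. The function names `scale_naive` and `scale_hoisted` are hypothetical and are not taken from clang or from the benchmark; the sketch also assumes that `b` does not alias `*scalar`, which is the fact a compiler pass would have to prove before it could hoist the load.

```c
#include <stddef.h>

/* Hypothetical "before" form: *scalar is re-read from memory on every
   iteration, mirroring the repeated load from %scalar.addr in the IR. */
void scale_naive(double *b, const double *c, const double *scalar, size_t n) {
    for (size_t j = 0; j < n; j++)
        b[j] = *scalar * c[j];
}

/* Hypothetical "after" form: the loop-invariant load is done once, before
   the loop. Assumption: stores to b[] never modify *scalar (no aliasing). */
void scale_hoisted(double *b, const double *c, const double *scalar, size_t n) {
    const double s = *scalar; /* hoisted load */
    for (size_t j = 0; j < n; j++)
        b[j] = s * c[j];
}
```

Both functions compute the same result whenever the no-aliasing assumption holds; the second simply performs one load of the scalar instead of one per iteration.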
Okay; I think that I see what you mean. Generally, Clang will generate a local alloca to hold a local variable, and load/store to that local stack space on sequence-point boundaries. When you generate the outlined OpenMP regions, you simply transport the pointer to the original alloca through the dispatch interface (thus, "capturing" the variable). That being the case, I'm afraid that I agree with your analysis ;)
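The capture mechanism described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `struct captures`, `outlined_body`, and `scale_region` are hypothetical names, and the real clang-omp dispatch interface is not reproduced here. The point is only that the outlined body receives a pointer to the caller's stack slot, so each use of the variable becomes a load through that pointer.

```c
#include <stddef.h>

/* Hypothetical capture record: carries the address of the caller's local
   stack slot rather than a copy of its value. */
struct captures {
    double *scalar_ref;
};

/* Hypothetical outlined loop body. Every iteration dereferences the
   captured pointer, which corresponds to the per-iteration load in the IR. */
static void outlined_body(struct captures *cap, double *b, const double *c,
                          size_t lb, size_t ub) {
    for (size_t j = lb; j < ub; j++)
        b[j] = *cap->scalar_ref * c[j];
}

/* Caller: the local below stands in for the alloca that holds the
   parameter; its address is passed to the outlined body. */
void scale_region(double *b, const double *c, double scalar, size_t n) {
    double scalar_addr = scalar;
    struct captures cap = { &scalar_addr };
    outlined_body(&cap, b, c, 0, n); /* a runtime would dispatch this to threads */
}
```

Because the body only sees a pointer, nothing in the frontend output itself says that the pointee is unchanged across iterations; that has to be established later by an optimization pass.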
Compiling this with clang-omp:
void tuned_STREAM_Scale(STREAM_TYPE scalar) {
    ssize_t j;
#pragma omp parallel for
}
results in IR that looks like this for the main loop:
omp.lb_ub.check_pass: ; preds = %omp.lb.le.global_ub.
%17 = load double* %ref3, align 8, !tbaa !6
%18 = load i64* %j.private., align 8, !tbaa !8
%arrayidx = getelementptr inbounds [10000000 x double]* @c, i32 0, i64 %18
%19 = load double* %arrayidx, align 8, !tbaa !6
%mul5 = fmul double %17, %19
%20 = load i64* %j.private., align 8, !tbaa !8
%arrayidx6 = getelementptr inbounds [10000000 x double]* @b, i32 0, i64 %20
store double %mul5, double* %arrayidx6, align 8, !tbaa !6
br label %omp.cont.block
Please note that the load of the captured parameter that corresponds to 'scalar' in the original source:
%17 = load double* %ref3, align 8, !tbaa !6
is executed in every loop iteration. Just as with the other loads that needed hoisting in issue #27, this load also needs to be hoisted.