clang-omp / clang

clang with OpenMP 3.1 and some elements of OpenMP 4.0 support
clang-omp.github.com

Improve(/correct) Matrix Multiplication example. #53

Closed jepio closed 9 years ago

jepio commented 9 years ago

After taking a look at the Matrix Multiplication gist linked from the GitHub pages (https://gist.github.com/clang-omp/89efe59c6f79ae2bc204) I was puzzled for a moment when I came to the parallel implementation:

#pragma omp parallel for collapse(2)
for (i = 0; i < N; ++i) {
    for (j = 0; j < N; ++j) {
        int Sum = 0;
#pragma omp parallel for
        for (k = 0; k < N; ++k) {
            Sum += A[i][k] * B[k][j];
        }
        C1[i][j] = Sum;
    }
} 

I asked myself: what is a #pragma omp parallel for doing inside another parallel for? After a while I understood that the point is to tell the OpenMP parser that this is a for loop (and to privatize the loop variable), but a side effect is that each time a thread reaches that line it tries to start a new team of threads, which does not work and only introduces overhead. Since nested parallelism is disabled by default, and even if it weren't it would probably hurt performance, I think this example should be changed:

  • remove the pragma at line 40
  • add private(k) to the pragma at line 36 (this is effectively what the second pragma is currently doing)

The real consideration here is performance. I have tested the current version and the new one with clang-omp on a 16 core machine. For a matrix size of 1000x1000 I consistently see the following results:

Mode                     Time (s)
Serial                      18.70
Parallel (with pragma)       1.45
Parallel (with private)      1.20

A second reason for the change is simply to encourage good style.

alexey-bataev commented 9 years ago

Hi, thanks for the report. The example is updated.

Best regards,

Alexey Bataev

Software Engineer, Intel Compiler Team, Intel Corp.
