Closed mihailpopov closed 9 years ago
Hi, thanks for the report. I'll fix it ASAP.
Best regards,
Software Engineer Intel Compiler Team Intel Corp.
On 21.11.2014 17:28, Mihail Popov wrote:
Hi,
OpenMP 4.0 specification explains in chapter 2.7.1, Loop Construct, page 54, that there is an implicit barrier at the end of a loop construct unless a nowait clause is specified. Also, chapter 2.14.3.6 Reduction clause, page 170, line 7, specifies that if nowait is not used with a reduction clause, the reduction computation will be complete at the end of the construct.
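For contrast, the same two passages imply that once nowait is added, the reduced value may only be read after some explicit synchronization. The sketch below is an illustration added for this thread, not code from the report: the function name `sum_with_nowait` and its parameter are hypothetical, and the barrier placement is simply the most direct way to satisfy the reduction clause wording.

```c
/* Hypothetical helper (not part of the original report): reduce 1..4 with a
   nowait loop, then synchronize explicitly before reading the result. */
int sum_with_nowait(int requested_threads)
{
    int sum = 0;
    int observed = -1;

    #pragma omp parallel num_threads(requested_threads) shared(sum, observed)
    {
        int i;

        /* nowait removes the implicit barrier at the end of the loop */
        #pragma omp for reduction(+:sum) nowait
        for (i = 1; i < 5; i++) {
            sum += i;
        }

        /* Reading sum at this point would be a race: other threads may not
           have finished their iterations or combined their partial sums. */

        /* An explicit barrier makes the read below well defined */
        #pragma omp barrier

        /* One thread records the value; single ends with its own barrier */
        #pragma omp single
        observed = sum;
    }
    return observed;
}
```

On a conforming implementation, `sum_with_nowait(4)` should return 10. Without nowait, the explicit barrier should be unnecessary, which is the behavior the test in this report checks.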
I tested the source code below:
    #include <stdio.h>
    #include <omp.h>

    int main() {
        int sum = 0;

        #pragma omp parallel
        {
            int i;

            /* Worksharing loop with a + reduction and no nowait clause */
            #pragma omp for reduction(+:sum)
            for (i = 1; i < 5; i++) {
                sum += i;
            }
            printf("before barrier =%d\n", sum);

            #pragma omp barrier
            printf("after barrier =%d\n", sum);
        }
    }
From my understanding of the specification, I expect to see no difference in the value of sum before and after the barrier, because there should be an implicit barrier at the end of the loop. However, I observe that sum has different values before and after.
This issue was observed only with specific thread configurations: the problem appears only if the number of threads is between 2 and 4. We could observe this problem with LLVM 3.4 and 3.5.
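The thread-count dependence can also be checked without reading logs by eye. The following is a minimal sketch, not part of the original report: `count_stale_reads` and `sweep_thread_counts` are hypothetical names, and the helper assumes the runtime grants the requested team size (it may grant fewer threads).

```c
#include <stdio.h>

/* Hypothetical helper: run the reduction loop from the report with a
   requested team size and count the threads that read an incomplete sum
   right after the loop, before any explicit barrier. */
int count_stale_reads(int requested_threads)
{
    const int expected = 1 + 2 + 3 + 4;
    int sum = 0;
    int stale = 0;

    #pragma omp parallel num_threads(requested_threads) shared(sum, stale)
    {
        int i;
        int seen;

        #pragma omp for reduction(+:sum)
        for (i = 1; i < 5; i++) {
            sum += i;
        }

        /* Per the spec, the implicit barrier has already been passed here,
           so every thread should read the completed reduction. */
        seen = sum;
        if (seen != expected) {
            #pragma omp atomic
            stale++;
        }
    }
    return stale;
}

/* Hypothetical driver: report the stale-read count for 1..max_threads. */
void sweep_thread_counts(int max_threads)
{
    int n;
    for (n = 1; n <= max_threads; n++) {
        printf("%d threads: %d stale reads\n", n, count_stale_reads(n));
    }
}
```

Calling `sweep_thread_counts(8)` from `main` on a conforming implementation should report zero stale reads for every team size; with the behavior described above, non-zero counts would be expected for 2 to 4 threads. Because the failure is timing dependent, a single clean run does not prove the absence of the bug.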
Logs of our tests follow:
Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (Ivy Bridge) 3.2.0-4-amd64

Using LLVM 3.5
clang version 3.5.0 (https://github.com/clang-omp/clang bf475046fda9b3d9984fabebd3bcbfcf54c2d907) (https://github.com/clang-omp/llvm e45b045553e027cbe400cbb8ac8c264abbbfaf83)
Target: x86_64-unknown-linux-gnu
Thread model: posix
Selected GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.7

OUTPUT:
* 4 threads
OMP: Info #204: KMP_AFFINITY: decoding x2APIC ids.
OMP: Info #202: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: {0,1,2,3,4,5,6,7}
OMP: Info #156: KMP_AFFINITY: 8 available OS procs
OMP: Info #157: KMP_AFFINITY: Uniform topology
OMP: Info #179: KMP_AFFINITY: 1 packages x 4 cores/pkg x 2 threads/core (4 total cores)
OMP: Info #206: KMP_AFFINITY: OS proc to physical thread map:
OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0 core 0 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 4 maps to package 0 core 0 thread 1
OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to package 0 core 1 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 5 maps to package 0 core 1 thread 1
OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to package 0 core 2 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 6 maps to package 0 core 2 thread 1
OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to package 0 core 3 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 7 maps to package 0 core 3 thread 1
OMP: Info #144: KMP_AFFINITY: Threads may migrate across 1 innermost levels of machine
OMP: Info #147: KMP_AFFINITY: Internal thread 0 bound to OS proc set {0,4}
OMP: Info #147: KMP_AFFINITY: Internal thread 3 bound to OS proc set {3,7}
OMP: Info #147: KMP_AFFINITY: Internal thread 1 bound to OS proc set {1,5}
OMP: Info #147: KMP_AFFINITY: Internal thread 2 bound to OS proc set {2,6}
before barrier = 3
before barrier = 3
before barrier = 6
before barrier = 10
after barrier = 10
after barrier = 10
after barrier = 10
after barrier = 10
* 8 threads
OMP: Info #204: KMP_AFFINITY: decoding x2APIC ids.
OMP: Info #202: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: {0,1,2,3,4,5,6,7}
OMP: Info #156: KMP_AFFINITY: 8 available OS procs
OMP: Info #157: KMP_AFFINITY: Uniform topology
OMP: Info #179: KMP_AFFINITY: 1 packages x 4 cores/pkg x 2 threads/core (4 total cores)
OMP: Info #206: KMP_AFFINITY: OS proc to physical thread map:
OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0 core 0 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 4 maps to package 0 core 0 thread 1
OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to package 0 core 1 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 5 maps to package 0 core 1 thread 1
OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to package 0 core 2 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 6 maps to package 0 core 2 thread 1
OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to package 0 core 3 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 7 maps to package 0 core 3 thread 1
OMP: Info #144: KMP_AFFINITY: Threads may migrate across 1 innermost levels of machine
OMP: Info #147: KMP_AFFINITY: Internal thread 0 bound to OS proc set {0,4}
OMP: Info #147: KMP_AFFINITY: Internal thread 1 bound to OS proc set {1,5}
OMP: Info #147: KMP_AFFINITY: Internal thread 2 bound to OS proc set {2,6}
OMP: Info #147: KMP_AFFINITY: Internal thread 3 bound to OS proc set {3,7}
OMP: Info #147: KMP_AFFINITY: Internal thread 4 bound to OS proc set {0,4}
OMP: Info #147: KMP_AFFINITY: Internal thread 5 bound to OS proc set {1,5}
OMP: Info #147: KMP_AFFINITY: Internal thread 6 bound to OS proc set {2,6}
OMP: Info #147: KMP_AFFINITY: Internal thread 7 bound to OS proc set {3,7}
before barrier = 10
before barrier = 10
before barrier = 10
before barrier = 10
before barrier = 10
before barrier = 10
before barrier = 10
before barrier = 10
after barrier = 10
after barrier = 10
after barrier = 10
after barrier = 10
after barrier = 10
after barrier = 10
after barrier = 10
after barrier = 10
It works as expected

Using LLVM 3.4
clang version 3.4 (https://github.com/clang-omp/clang bdc8bcdb94f2b954059d4ce1b5762f701ecc0809) (https://github.com/clang-omp/llvm 92414b1167a33c0ab9187c72b098a54ecbffc15c)
Target: x86_64-unknown-linux-gnu
Thread model: posix
Selected GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.7

OUTPUT:
* 4 threads
OMP: Info #204: KMP_AFFINITY: decoding x2APIC ids.
OMP: Info #202: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: {0,1,2,3,4,5,6,7}
OMP: Info #156: KMP_AFFINITY: 8 available OS procs
OMP: Info #157: KMP_AFFINITY: Uniform topology
OMP: Info #179: KMP_AFFINITY: 1 packages x 4 cores/pkg x 2 threads/core (4 total cores)
OMP: Info #206: KMP_AFFINITY: OS proc to physical thread map:
OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0 core 0 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 4 maps to package 0 core 0 thread 1
OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to package 0 core 1 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 5 maps to package 0 core 1 thread 1
OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to package 0 core 2 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 6 maps to package 0 core 2 thread 1
OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to package 0 core 3 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 7 maps to package 0 core 3 thread 1
OMP: Info #144: KMP_AFFINITY: Threads may migrate across 1 innermost levels of machine
OMP: Info #147: KMP_AFFINITY: Internal thread 0 bound to OS proc set {0,4}
OMP: Info #147: KMP_AFFINITY: Internal thread 2 bound to OS proc set {2,6}
OMP: Info #147: KMP_AFFINITY: Internal thread 1 bound to OS proc set {1,5}
OMP: Info #147: KMP_AFFINITY: Internal thread 3 bound to OS proc set {3,7}
before barrier = 4
before barrier = 10
before barrier = 5
before barrier = 10
after barrier = 10
after barrier = 10
after barrier = 10
after barrier = 10
* 8 threads
OMP: Info #204: KMP_AFFINITY: decoding x2APIC ids.
OMP: Info #202: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: {0,1,2,3,4,5,6,7}
OMP: Info #156: KMP_AFFINITY: 8 available OS procs
OMP: Info #157: KMP_AFFINITY: Uniform topology
OMP: Info #179: KMP_AFFINITY: 1 packages x 4 cores/pkg x 2 threads/core (4 total cores)
OMP: Info #206: KMP_AFFINITY: OS proc to physical thread map:
OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0 core 0 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 4 maps to package 0 core 0 thread 1
OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to package 0 core 1 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 5 maps to package 0 core 1 thread 1
OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to package 0 core 2 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 6 maps to package 0 core 2 thread 1
OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to package 0 core 3 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 7 maps to package 0 core 3 thread 1
OMP: Info #144: KMP_AFFINITY: Threads may migrate across 1 innermost levels of machine
OMP: Info #147: KMP_AFFINITY: Internal thread 0 bound to OS proc set {0,4}
OMP: Info #147: KMP_AFFINITY: Internal thread 1 bound to OS proc set {1,5}
OMP: Info #147: KMP_AFFINITY: Internal thread 2 bound to OS proc set {2,6}
OMP: Info #147: KMP_AFFINITY: Internal thread 3 bound to OS proc set {3,7}
OMP: Info #147: KMP_AFFINITY: Internal thread 4 bound to OS proc set {0,4}
OMP: Info #147: KMP_AFFINITY: Internal thread 5 bound to OS proc set {1,5}
OMP: Info #147: KMP_AFFINITY: Internal thread 6 bound to OS proc set {2,6}
OMP: Info #147: KMP_AFFINITY: Internal thread 7 bound to OS proc set {3,7}
before barrier = 10
before barrier = 10
before barrier = 10
before barrier = 10
before barrier = 10
before barrier = 10
before barrier = 10
before barrier = 10
after barrier = 10
after barrier = 10
after barrier = 10
after barrier = 10
after barrier = 10
after barrier = 10
after barrier = 10
after barrier = 10
It works as expected

Intel(R) Xeon(R) CPU E5620 @ 2.40GHz (Nehalem) 3.2.0-4-amd64
Using LLVM 3.4
clang version 3.4 (https://github.com/clang-omp/clang a65de2d19de69dbd544d748cf31372f1cdceac8b) (https://github.com/clang-omp/llvm 233b1e3f0347e8946e336812763849b896cc2300)
Target: x86_64-unknown-linux-gnu
Thread model: posix
Selected GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.7

Genuine Intel(R) CPU @ 2.70GHz (Sandy Bridge) 3.14.0
Using LLVM 3.4
clang version 3.4 (https://github.com/clang-omp/clang b340ac07c932acc9d99ccc2b291561cc04b8554f) (https://github.com/clang-omp/llvm 233b1e3f0347e8946e336812763849b896cc2300)
Target: x86_64-unknown-linux-gnu
Thread model: posix
Selected GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.7

Intel(R) Core(TM)2 Duo CPU E7500 @ 2.93GHz (Core2) 3.2.0-4-amd64
Using LLVM 3.4
clang version 3.4 (https://github.com/clang-omp/clang bdc8bcdb94f2b954059d4ce1b5762f701ecc0809) (https://github.com/clang-omp/llvm 92414b1167a33c0ab9187c72b098a54ecbffc15c)
Target: x86_64-unknown-linux-gnu
Thread model: posix
Selected GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.7
— Reply to this email directly or view it on GitHub https://github.com/clang-omp/clang/issues/52.
Hi, I've committed a fix for the bug. Please try your test with the updated code.
Best regards,
Software Engineer Intel Compiler Team Intel Corp.
Hi, I tested the code with the latest clang version, and it works!