clang-omp / clang

clang with OpenMP 3.1 and some elements of OpenMP 4.0 support
clang-omp.github.com

No implicit barrier at the end of "#pragma omp for" when a reduction clause is used #52

Closed mihailpopov closed 9 years ago

mihailpopov commented 9 years ago

Hi,

The OpenMP 4.0 specification states in section 2.7.1 (Loop Construct, page 54) that there is an implicit barrier at the end of a loop construct unless a nowait clause is specified. In addition, section 2.14.3.6 (Reduction clause, page 170, line 7) specifies that if nowait is not used with a reduction clause, the reduction computation will be complete at the end of the construct.

I tested the source code below:

#include <omp.h>
#include <stdio.h>

int main() 
{
    int sum = 0;

    #pragma omp parallel 
    {   
        int i;
        #pragma omp for reduction(+:sum)
        for (i=1;i<5;i++)
        {
            sum += i;
        } 

        printf("before barrier = %d\n",sum);   
        #pragma omp barrier 
        printf("after barrier = %d\n",sum);    
    }
}

From my understanding of the specification, I expect the value of sum to be the same before and after the explicit barrier, because there should already be an implicit barrier at the end of the loop. However, I observe that sum has different values before and after the barrier.

The issue appears only with specific thread configurations: it shows up only when the number of threads is between 2 and 4. We observed it with both LLVM 3.4 and LLVM 3.5.
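For anyone who would rather not compare the interleaved output by eye, the same check can be sketched as a small helper that counts how many threads read an incomplete sum right after the loop. This is only a sketch under assumptions: the function name count_early_reads and the use of the num_threads clause to pick the team size are illustrative and were not part of the test above. On a conforming implementation it should return 0 for every team size.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not part of the original test case): runs the same
   reduction loop and returns how many threads saw a value other than the
   complete sum immediately after the loop, with no explicit barrier. */
int count_early_reads(int nthreads)
{
    const int expected = 1 + 2 + 3 + 4;  /* loop runs i = 1..4 */
    int sum = 0;
    int mismatches = 0;

    assert(nthreads > 0);

    #pragma omp parallel num_threads(nthreads)
    {
        int i;
        #pragma omp for reduction(+:sum)
        for (i = 1; i < 5; i++)
        {
            sum += i;
        }

        /* No "#pragma omp barrier" here on purpose: the implicit barrier
           at the end of the loop construct should already guarantee that
           the reduction is complete when sum is read. */
        if (sum != expected)
        {
            #pragma omp atomic
            mismatches++;
        }
    }

    return mismatches;
}
```

Calling count_early_reads(4) and printing the result should be enough to tell whether a given build is affected: any non-zero value means some thread left the loop before the reduction was finished.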

Logs of our tests follow:

Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (Ivy Bridge)
3.2.0-4-amd64

    Using LLVM 3.5
    clang version 3.5.0 (https://github.com/clang-omp/clang bf475046fda9b3d9984fabebd3bcbfcf54c2d907) (https://github.com/clang-omp/llvm e45b045553e027cbe400cbb8ac8c264abbbfaf83)
    Target: x86_64-unknown-linux-gnu
    Thread model: posix
    Selected GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.7

    OUTPUT:
        * 4 threads
            OMP: Info #204: KMP_AFFINITY: decoding x2APIC ids.
            OMP: Info #202: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
            OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: {0,1,2,3,4,5,6,7}
            OMP: Info #156: KMP_AFFINITY: 8 available OS procs
            OMP: Info #157: KMP_AFFINITY: Uniform topology
            OMP: Info #179: KMP_AFFINITY: 1 packages x 4 cores/pkg x 2 threads/core (4 total cores)
            OMP: Info #206: KMP_AFFINITY: OS proc to physical thread map:
            OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0 core 0 thread 0 
            OMP: Info #171: KMP_AFFINITY: OS proc 4 maps to package 0 core 0 thread 1 
            OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to package 0 core 1 thread 0 
            OMP: Info #171: KMP_AFFINITY: OS proc 5 maps to package 0 core 1 thread 1 
            OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to package 0 core 2 thread 0 
            OMP: Info #171: KMP_AFFINITY: OS proc 6 maps to package 0 core 2 thread 1 
            OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to package 0 core 3 thread 0 
            OMP: Info #171: KMP_AFFINITY: OS proc 7 maps to package 0 core 3 thread 1 
            OMP: Info #144: KMP_AFFINITY: Threads may migrate across 1 innermost levels of machine
            OMP: Info #147: KMP_AFFINITY: Internal thread 0 bound to OS proc set {0,4}
            OMP: Info #147: KMP_AFFINITY: Internal thread 3 bound to OS proc set {3,7}
            OMP: Info #147: KMP_AFFINITY: Internal thread 1 bound to OS proc set {1,5}
            OMP: Info #147: KMP_AFFINITY: Internal thread 2 bound to OS proc set {2,6}
            before barrier = 3
            before barrier = 3
            before barrier = 6
            before barrier = 10
            after barrier = 10
            after barrier = 10
            after barrier = 10
            after barrier = 10

        * 8 threads
            OMP: Info #204: KMP_AFFINITY: decoding x2APIC ids.
            OMP: Info #202: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
            OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: {0,1,2,3,4,5,6,7}
            OMP: Info #156: KMP_AFFINITY: 8 available OS procs
            OMP: Info #157: KMP_AFFINITY: Uniform topology
            OMP: Info #179: KMP_AFFINITY: 1 packages x 4 cores/pkg x 2 threads/core (4 total cores)
            OMP: Info #206: KMP_AFFINITY: OS proc to physical thread map:
            OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0 core 0 thread 0 
            OMP: Info #171: KMP_AFFINITY: OS proc 4 maps to package 0 core 0 thread 1 
            OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to package 0 core 1 thread 0 
            OMP: Info #171: KMP_AFFINITY: OS proc 5 maps to package 0 core 1 thread 1 
            OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to package 0 core 2 thread 0 
            OMP: Info #171: KMP_AFFINITY: OS proc 6 maps to package 0 core 2 thread 1 
            OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to package 0 core 3 thread 0 
            OMP: Info #171: KMP_AFFINITY: OS proc 7 maps to package 0 core 3 thread 1 
            OMP: Info #144: KMP_AFFINITY: Threads may migrate across 1 innermost levels of machine
            OMP: Info #147: KMP_AFFINITY: Internal thread 0 bound to OS proc set {0,4}
            OMP: Info #147: KMP_AFFINITY: Internal thread 1 bound to OS proc set {1,5}
            OMP: Info #147: KMP_AFFINITY: Internal thread 2 bound to OS proc set {2,6}
            OMP: Info #147: KMP_AFFINITY: Internal thread 3 bound to OS proc set {3,7}
            OMP: Info #147: KMP_AFFINITY: Internal thread 4 bound to OS proc set {0,4}
            OMP: Info #147: KMP_AFFINITY: Internal thread 5 bound to OS proc set {1,5}
            OMP: Info #147: KMP_AFFINITY: Internal thread 6 bound to OS proc set {2,6}
            OMP: Info #147: KMP_AFFINITY: Internal thread 7 bound to OS proc set {3,7}
            before barrier = 10
            before barrier = 10
            before barrier = 10
            before barrier = 10
            before barrier = 10
            before barrier = 10
            before barrier = 10
            before barrier = 10
            after barrier = 10
            after barrier = 10
            after barrier = 10
            after barrier = 10
            after barrier = 10
            after barrier = 10
            after barrier = 10
            after barrier = 10
            It works as expected

    Using LLVM 3.4
    clang version 3.4 (https://github.com/clang-omp/clang bdc8bcdb94f2b954059d4ce1b5762f701ecc0809) (https://github.com/clang-omp/llvm 92414b1167a33c0ab9187c72b098a54ecbffc15c)
    Target: x86_64-unknown-linux-gnu
    Thread model: posix
    Selected GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.7

    OUTPUT:
        * 4 threads
        OMP: Info #204: KMP_AFFINITY: decoding x2APIC ids.
        OMP: Info #202: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
        OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: {0,1,2,3,4,5,6,7}
        OMP: Info #156: KMP_AFFINITY: 8 available OS procs
        OMP: Info #157: KMP_AFFINITY: Uniform topology
        OMP: Info #179: KMP_AFFINITY: 1 packages x 4 cores/pkg x 2 threads/core (4 total cores)
        OMP: Info #206: KMP_AFFINITY: OS proc to physical thread map:
        OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0 core 0 thread 0 
        OMP: Info #171: KMP_AFFINITY: OS proc 4 maps to package 0 core 0 thread 1 
        OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to package 0 core 1 thread 0 
        OMP: Info #171: KMP_AFFINITY: OS proc 5 maps to package 0 core 1 thread 1 
        OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to package 0 core 2 thread 0 
        OMP: Info #171: KMP_AFFINITY: OS proc 6 maps to package 0 core 2 thread 1 
        OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to package 0 core 3 thread 0 
        OMP: Info #171: KMP_AFFINITY: OS proc 7 maps to package 0 core 3 thread 1 
        OMP: Info #144: KMP_AFFINITY: Threads may migrate across 1 innermost levels of machine
        OMP: Info #147: KMP_AFFINITY: Internal thread 0 bound to OS proc set {0,4}
        OMP: Info #147: KMP_AFFINITY: Internal thread 2 bound to OS proc set {2,6}
        OMP: Info #147: KMP_AFFINITY: Internal thread 1 bound to OS proc set {1,5}
        OMP: Info #147: KMP_AFFINITY: Internal thread 3 bound to OS proc set {3,7}
        before barrier = 4
        before barrier = 10
        before barrier = 5
        before barrier = 10
        after barrier = 10
        after barrier = 10
        after barrier = 10
        after barrier = 10

        * 8 threads
        OMP: Info #204: KMP_AFFINITY: decoding x2APIC ids.
        OMP: Info #202: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
        OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: {0,1,2,3,4,5,6,7}
        OMP: Info #156: KMP_AFFINITY: 8 available OS procs
        OMP: Info #157: KMP_AFFINITY: Uniform topology
        OMP: Info #179: KMP_AFFINITY: 1 packages x 4 cores/pkg x 2 threads/core (4 total cores)
        OMP: Info #206: KMP_AFFINITY: OS proc to physical thread map:
        OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0 core 0 thread 0 
        OMP: Info #171: KMP_AFFINITY: OS proc 4 maps to package 0 core 0 thread 1 
        OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to package 0 core 1 thread 0 
        OMP: Info #171: KMP_AFFINITY: OS proc 5 maps to package 0 core 1 thread 1 
        OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to package 0 core 2 thread 0 
        OMP: Info #171: KMP_AFFINITY: OS proc 6 maps to package 0 core 2 thread 1 
        OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to package 0 core 3 thread 0 
        OMP: Info #171: KMP_AFFINITY: OS proc 7 maps to package 0 core 3 thread 1 
        OMP: Info #144: KMP_AFFINITY: Threads may migrate across 1 innermost levels of machine
        OMP: Info #147: KMP_AFFINITY: Internal thread 0 bound to OS proc set {0,4}
        OMP: Info #147: KMP_AFFINITY: Internal thread 1 bound to OS proc set {1,5}
        OMP: Info #147: KMP_AFFINITY: Internal thread 2 bound to OS proc set {2,6}
        OMP: Info #147: KMP_AFFINITY: Internal thread 3 bound to OS proc set {3,7}
        OMP: Info #147: KMP_AFFINITY: Internal thread 4 bound to OS proc set {0,4}
        OMP: Info #147: KMP_AFFINITY: Internal thread 5 bound to OS proc set {1,5}
        OMP: Info #147: KMP_AFFINITY: Internal thread 6 bound to OS proc set {2,6}
        OMP: Info #147: KMP_AFFINITY: Internal thread 7 bound to OS proc set {3,7}
        before barrier = 10
        before barrier = 10
        before barrier = 10
        before barrier = 10
        before barrier = 10
        before barrier = 10
        before barrier = 10
        before barrier = 10
        after barrier = 10
        after barrier = 10
        after barrier = 10
        after barrier = 10
        after barrier = 10
        after barrier = 10
        after barrier = 10
        after barrier = 10
        It works as expected

Intel(R) Xeon(R) CPU E5620  @ 2.40GHz (Nehalem)
3.2.0-4-amd64
    Using LLVM 3.4
    clang version 3.4 (https://github.com/clang-omp/clang a65de2d19de69dbd544d748cf31372f1cdceac8b) (https://github.com/clang-omp/llvm 233b1e3f0347e8946e336812763849b896cc2300)
    Target: x86_64-unknown-linux-gnu
    Thread model: posix
    Selected GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.7

Genuine Intel(R) CPU @ 2.70GHz (Sandy Bridge)
3.14.0
    Using LLVM 3.4
    clang version 3.4 (https://github.com/clang-omp/clang b340ac07c932acc9d99ccc2b291561cc04b8554f) (https://github.com/clang-omp/llvm 233b1e3f0347e8946e336812763849b896cc2300)
    Target: x86_64-unknown-linux-gnu
    Thread model: posix
    Selected GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.7

Intel(R) Core(TM)2 Duo CPU E7500  @ 2.93GHz (Core2)
3.2.0-4-amd64
    Using LLVM 3.4
    clang version 3.4 (https://github.com/clang-omp/clang bdc8bcdb94f2b954059d4ce1b5762f701ecc0809) (https://github.com/clang-omp/llvm 92414b1167a33c0ab9187c72b098a54ecbffc15c)
    Target: x86_64-unknown-linux-gnu
    Thread model: posix
    Selected GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.7

alexey-bataev commented 9 years ago

Hi, thanks for the report. I'll fix it ASAP.

Best regards,

Alexey Bataev

Software Engineer Intel Compiler Team Intel Corp.

alexey-bataev commented 9 years ago

Hi, I've committed a fix for the bug. Please try your test with the updated code.

Best regards,

Alexey Bataev

Software Engineer Intel Compiler Team Intel Corp.

mihailpopov commented 9 years ago

Hi, I tested the code with the latest clang version, and it works!