[OpenMP] TeaLeaf_ref: local pointer variable not captured into parallel region

pawosm-arm commented 6 years ago

While running TeaLeaf_ref on larger number of threads unexpected behavior resulting in results validation errors were observed. It turned out (by analyzing LLVM-IR built with -O0), one of local variables (a pointer to a function) instead of being captured, became allocated on stack in the outlined function for the parallel region and then used uninitialized causing either segfault or wrong subroutine being called.

Outer region:

!$OMP PARALLEL PRIVATE(tile_offset)
!$OMP DO
  DO t=1,tiles_per_task
    SELECT CASE (face)
    CASE (CHUNK_LEFT, CHUNK_RIGHT)
      tile_offset = (chunk%tiles(t)%bottom - chunk%bottom)*depth
    CASE (CHUNK_BOTTOM, CHUNK_TOP)
      tile_offset = (chunk%tiles(t)%left - chunk%left)*depth
      IF (tile_offset .NE. 0) THEN
        tile_offset = tile_offset + depth*depth
      ENDIF
    CASE DEFAULT
      CALL report_error("pack.f90","Invalid face pased to buffer packing")
    END SELECT

    IF (chunk%tiles(t)%tile_neighbours(face) .NE. EXTERNAL_FACE) THEN
      CYCLE
    ENDIF

    CALL pack_all(chunk%tiles(t)%field%x_min,                    &
                  chunk%tiles(t)%field%x_max,                    &
                  chunk%tiles(t)%field%y_min,                    &
                  chunk%tiles(t)%field%y_max,                    &
                  chunk%halo_exchange_depth,                     &
                  chunk%tiles(t)%tile_neighbours,     &
                  chunk%tiles(t)%field%density,        &
                  chunk%tiles(t)%field%energy0,        &
                  chunk%tiles(t)%field%energy1,        &
                  chunk%tiles(t)%field%u,              &
                  chunk%tiles(t)%field%vector_p,       &
                  chunk%tiles(t)%field%vector_sd,      &
                  chunk%tiles(t)%field%vector_rtemp,      &
                  chunk%tiles(t)%field%vector_z,      &
                  chunk%tiles(t)%field%vector_kx,     &
                  chunk%tiles(t)%field%vector_ky,     &
                  chunk%tiles(t)%field%vector_di,     &
                  fields, &
                  depth, &
                  face, &
                  packing, &
                  mpi_buffer,                &
                  offsets, &
                  tile_offset)
  ENDDO
!$OMP END DO NOWAIT
!$OMP END PARALLEL

pack_all subroutine and its pack_func local variable (pack_kernel.f90 file):

SUBROUTINE pack_all(x_min, x_max, y_min, y_max, halo_exchange_depth, &
    tile_neighbours, &
    density,                                                    &
    energy0,                                                    &
    energy1,                                                    &
    u,                                                          &
    p,                                                          &
    sd,                                                         &
    r,                                                          &
    z,                                                          &
    kx,                                                         &
    ky,                                                         &
    di,                                                         &
    fields, depth, face, packing, mpi_buffer, offsets, tile_offset)
.
.
.
PROCEDURE(pack_or_unpack), POINTER :: pack_func => NULL()
.
.
.
  IF (packing .EQV. .TRUE.) THEN
    SELECT CASE (face)
    CASE (CHUNK_LEFT)
      pack_func => tea_pack_message_left
    CASE (CHUNK_RIGHT)
      pack_func => tea_pack_message_right
    CASE (CHUNK_BOTTOM)
      pack_func => tea_pack_message_bottom
    CASE (CHUNK_TOP)
      pack_func => tea_pack_message_top
    END SELECT
  ELSE
    SELECT CASE (face)
    CASE (CHUNK_LEFT)
      pack_func => tea_unpack_message_left
    CASE (CHUNK_RIGHT)
      pack_func => tea_unpack_message_right
    CASE (CHUNK_BOTTOM)
      pack_func => tea_unpack_message_bottom
    CASE (CHUNK_TOP)
      pack_func => tea_unpack_message_top
    END SELECT
  ENDIF
.
.
.

Inner parallel region further in pack_all subroutine:

.
.
.
!$OMP PARALLEL
  IF (fields(FIELD_DENSITY).EQ.1) THEN
      CALL pack_func(x_min,                    &
                     x_max,                    &
                     y_min,                    &
                     y_max,                    &
                     halo_exchange_depth,                    &
                     density,                 &
                     mpi_buffer,                &
                     depth, xincs(CELL_DATA), yincs(CELL_DATA),   &
                     tile_offset + offsets(FIELD_DENSITY),   &
                     edge_minus, edge_plus)
  ENDIF
  IF (fields(FIELD_ENERGY0).EQ.1) THEN
      CALL pack_func(x_min,                    &
                     x_max,                    &
                     y_min,                    &
                     y_max,                    &
                     halo_exchange_depth,                    &
                     energy0,                  &
                     mpi_buffer,                &
                     depth, xincs(CELL_DATA), yincs(CELL_DATA),   &
                     tile_offset + offsets(FIELD_ENERGY0),   &
                     edge_minus, edge_plus)
  ENDIF
.
.
.

The same compiled by gfortran does not fail, even if I link it against libomp.so instead of GOMP.

pawosm-arm commented 6 years ago

The nasty workaround is to move !$OMP PARALLEL before IF statement setting pack_func and to specify pack_func as private, !$OMP PARALLEL PRIVATE(pack_func)

pawosm-arm commented 6 years ago

The best fireworks this bug causes when one builds this code with -O0 (along with -fopenmp)

pawosm-arm commented 6 years ago

Seems similar to #345 which was already closed. I guess the fix does not cover all the possible cases.

pawosm-arm commented 6 years ago

Turns out, call ptr is not considered as using ptr variable, hence as unused, this variable is not captured. This leads to a simpler workaround: put pack_func = pack_func anywhere inside of the parallel region.

pawosm-arm commented 6 years ago

Simple test case:

! RUN: %flang -O0 -fopenmp -S -emit-llvm %s -o - 2>&1 | FileCheck %s

program subcallx

  implicit none

  call sub2

contains

subroutine sub1(x)

  implicit none

  integer :: x

  print *, "Hello from sub1", x

end subroutine sub1

! CHECK: define internal void @subcallx_sub2
subroutine sub2

  implicit none

  interface
    subroutine simple_sub(x)
      implicit none
      integer :: x
    end subroutine
  end interface

  procedure(simple_sub), pointer :: pptr => null() ! CHECK: %"pptr$p_{{[0-9]+}}" = alloca void

  pptr => sub1

! CHECK: @subcallx__{{.+}}_ to i64*
! CHECK: @__kmpc_fork_call
! CHECK: define internal void @subcallx__{{.+}}_
!$OMP PARALLEL
  call pptr(123) ! CHECK-NOT: %"pptr${{.+}}_{{[0-9]+}}" = alloca
!$OMP END PARALLEL

end subroutine sub2

end program subcallx

Simple fix:

--- a/tools/flang1/flang1exe/semant3.c
+++ b/tools/flang1/flang1exe/semant3.c
@@ -1364,8 +1364,10 @@ end_stmt:
       sptr = SST_SYMG(RHS(1));
       if (!is_procedure_ptr(sptr)) {
         subr_call(RHS(1), itemp);
-      } else
+      } else {
+        (void)mkarg(RHS(1), &dum);
         ptrsubr_call(RHS(1), itemp);
+      }
     } else {
       ptrsubr_call(RHS(1), itemp);
     }

Unfortunately, mkarg() couldn't handle correctly a case with a pointer to a procedure that does not take any arguments. Example test case that fails CHECK-NOT test even with above patch applied:

! RUN: %flang -O0 -fopenmp -S -emit-llvm %s -o - 2>&1 | FileCheck %s

program subcall

  implicit none

  call sub2

contains

subroutine sub1

  implicit none

  print *, "Hello from sub1"

end subroutine sub1

! CHECK: define internal void @subcall_sub2
subroutine sub2

  implicit none

  interface
    subroutine simple_sub
      implicit none
    end subroutine
  end interface

  procedure(simple_sub), pointer :: pptr => null() ! CHECK: %"pptr$p_{{[0-9]+}}" = alloca void

  pptr => sub1

! CHECK: @subcall__{{.+}}_ to i64*
! CHECK: @__kmpc_fork_call
! CHECK: define internal void @subcall__{{.+}}_
!$OMP PARALLEL
  call pptr ! CHECK-NOT: %"pptr${{.+}}_{{[0-9]+}}" = alloca
!$OMP END PARALLEL

end subroutine sub2

end program subcall

pawosm-arm commented 5 years ago

Simple test case 1 above still causes segfault when executed. Internally, we're patching semant3 as described above.

flang-compiler / flang

[OpenMP] TeaLeaf_ref: local pointer variable not captured into parallel region #461