ParRes / Kernels

This is a set of simple programs that can be used to explore the features of a parallel platform.
https://groups.google.com/forum/#!forum/parallel-research-kernels

Data Race due to num_error variable #585

Open ghost opened 3 years ago

ghost commented 3 years ago

What type of issue is this?

Where does this bug appear?

Check all that apply:

Operating system

What is the output of uname -a? Linux 299fdde96882 5.4.72-microsoft-standard-WSL2 #1 SMP Wed Oct 28 23:40:43 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

Compiler

What is the output of ${COMPILER} -v or ${COMPILER} --version? clang version 10.0.1

PRK build information

Please attach or inline make.defs.

#name of MPI C compiler, e.g. mpiicc, mpicc
MPICC=

#name of C compiler, e.g. icc, xlc, gcc
CC=clang-10

#name of MPI Fortran compiler, e.g. mpifort, mpif90
MPIF90=

#name of Fortran compiler, e.g. ifort, xlf_r, gfortran
FC=

#name of compile line flag enabling OpenMP, e.g. -openmp, -qopenmp, -fopenmp
OPENMPFLAG=-fopenmp
OFFLOADFLAG=

#default compiler optimization flags
DEFAULT_OPT_FLAGS:=

Output showing problem

I detected a data race in all of the OpenMP kernels except Refcount. Every kernel has the same data race involving the num_error variable: one thread writes num_error = 1 while another thread reads it in the call bail_out(num_error). An example from branch:

 #pragma omp parallel private(i, my_ID, iter, aux, nfunc, rank) reduction(+:total)
  {
  int * RESTRICT vector; int * RESTRICT index;

  #pragma omp master
  {
  nthread = omp_get_num_threads();
  if (nthread != nthread_input) {
    num_error = 1;
    printf("ERROR: number of requested threads %d does not equal ",
           nthread_input);
    printf("number of spawned threads %d\n", nthread);
  }
  else {
    printf("Number of threads          = %d\n", nthread_input);
    printf("Vector length              = %d\n", vector_length);
    printf("Number of iterations       = %d\n", iterations);
    printf("Branching type             = %s\n", branch_type);
#if RESTRICT_KEYWORD
    printf("No aliasing                = on\n");
#else
    printf("No aliasing                = off\n");
#endif
  }
  }
  bail_out(num_error);

The data race occurs between lines 9 and 26 in this snippet, or lines 207 and 224 of branch.c. I found this data race using the Coderrect Scanner https://coderrect.com/
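
One way the pattern above could be made race-free is to put an explicit barrier between the master block and the first read of the flag, since the master construct has no implied barrier. The following is only a minimal sketch under that assumption, not the PRK code: the function all_threads_agree and the force_error parameter are hypothetical names introduced for illustration, and the serial fallback for omp_get_num_threads exists only so the sketch also builds without OpenMP.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallback so the sketch also builds without -fopenmp. */
static int omp_get_num_threads(void) { return 1; }
#endif

/* Hypothetical helper (not part of PRK): returns 1 if every thread in the
 * team read the same num_error value that the master thread wrote. */
int all_threads_agree(int nthread_input, int force_error)
{
  int num_error = 0;  /* shared: written by the master, read by all threads */
  int nthread   = 1;  /* shared: written by the master only */
  int seen_sum  = 0;  /* sum over threads of the value each one read */

  #pragma omp parallel num_threads(nthread_input) reduction(+:seen_sum)
  {
    #pragma omp master
    {
      nthread = omp_get_num_threads();
      if (force_error || nthread != nthread_input) {
        num_error = 1;
      }
    }
    /* The master construct has no implied barrier, so an explicit one
     * orders the write above before the reads below. */
    #pragma omp barrier

    seen_sum += num_error;  /* every thread reads after the barrier */
  }

  /* If the flag was set, each of the nthread threads must have seen 1. */
  return seen_sum == num_error * nthread;
}
```

For example, all_threads_agree(4, 1) forces the error path and checks that no thread read a stale 0. The cost is one extra barrier per check, which is likely negligible here because the check runs once before the timed section.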

jeffhammond commented 3 years ago

Okay, I'll try to fix soon but that might still be a while.

AtlantaPepsi commented 3 years ago

Technically this is a race condition, since a write-after-read on num_error is possible. Functionally it makes no difference: the master thread always sees the error value after the master construct and exits. The other threads wait at the barrier inside the bail_out function immediately after the call, until all threads, including the master, have confirmed valid inputs.
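
The argument above can be sketched in a few lines. This is an illustration only, under stated assumptions: bail_out_sketch is a hypothetical stand-in that has a barrier as described in this comment but returns a flag instead of calling exit, and threads_that_would_exit and force_error are names invented for the sketch. The accesses to num_error are made atomic, which is one way to remove the formal data race while keeping the same behavior.

```c
#include <stdio.h>

/* Hypothetical stand-in for bail_out: waits at a barrier like the function
 * described above, but reports whether this thread would exit instead of
 * terminating the process. */
static int bail_out_sketch(int error)
{
  #pragma omp barrier
  return error != 0;
}

/* Hypothetical helper: counts the threads that would reach the exit path. */
int threads_that_would_exit(int force_error)
{
  int num_error = 0;  /* shared error flag */
  int exits     = 0;  /* number of threads that saw a nonzero flag */

  #pragma omp parallel reduction(+:exits)
  {
    #pragma omp master
    {
      if (force_error) {
        /* Atomic write: conflicting atomic accesses are not a data race. */
        #pragma omp atomic write
        num_error = 1;
      }
    }

    int local_error;
    /* Atomic read: a non-master thread may still see 0 here. */
    #pragma omp atomic read
    local_error = num_error;

    /* The master thread always reads its own write, so at least one
     * thread takes the exit path whenever the flag was set. */
    exits += bail_out_sketch(local_error);
  }
  return exits;
}
```

When the flag is set, threads_that_would_exit(1) returns at least 1 regardless of what the other threads read. In a real program a single exit call ends the whole process, so the stale reads do not change the outcome.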

@jeffhammond what do you think? Shall we close this?

tgmattso commented 3 years ago

Yes, I think we should close this.