ParRes / Kernels

This is a set of simple programs that can be used to explore the features of a parallel platform.
https://groups.google.com/forum/#!forum/parallel-research-kernels

OpenMP Sparse access outside array boundaries #405

Open Jim-Walk opened 5 years ago

Jim-Walk commented 5 years ago

What type of issue is this?

If this is a bug report, please use the following template. Otherwise, please delete the rest of the template.

Where does this bug appear?

Check all that apply:

Operating system

What is the output of uname -a? Linux l0 4.18.0-17-generic #18~18.04.1-Ubuntu SMP Fri Mar 15 15:27:12 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

Compiler

gcc What is the output of ${COMPILER} -v or ${COMPILER} --version? gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0

PRK build information

Please attach or inline make.defs.

#
# This file shows the GCC toolchain options for PRKs using
# OpenMP, MPI and/or Fortran coarrays only.
#
# Base compilers and language options
#
VERSION=-7
# C99 is required in some implementations.
CC=gcc${VERSION} -std=c11 -pthread
#EXTRA_CLIBS=-lrt
# All of the Fortran code is written for the 2008 standard and requires preprocessing.
FC=gfortran${VERSION} -std=f2008 -cpp
# C++11 may not be required but does no harm here.
CXX=g++${VERSION} -std=gnu++17 -pthread
#
# Compiler flags
#
# -mtune=native is appropriate for most cases.
# -march=native is appropriate if you want portable binaries.
DEFAULT_OPT_FLAGS=-O3 -mtune=native -ffast-math
#DEFAULT_OPT_FLAGS=-O0
DEFAULT_OPT_FLAGS+=-g3
#DEFAULT_OPT_FLAGS+=-fsanitize=undefined
#DEFAULT_OPT_FLAGS+=-fsanitize=undefined,leak
#DEFAULT_OPT_FLAGS+=-fsanitize=address
#DEFAULT_OPT_FLAGS+=-fsanitize=thread
# If you are compiling for KNL on a Xeon login node, use the following:
# DEFAULT_OPT_FLAGS=-g -O3 -march=knl
# See https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html for details.
#
#DEFAULT_OPT_FLAGS+=-fopt-info-vec-missed
DEFAULT_OPT_FLAGS+=-Wall #-Werror
DEFAULT_OPT_FLAGS+=-Wno-ignored-attributes -Wno-deprecated-declarations
#DEFAULT_OPT_FLAGS+=-mavx -mfma
#
# OpenMP flags
#
OPENMPFLAG=-fopenmp
OPENMPSIMDFLAG=-fopenmp-simd
OFFLOADFLAG=-foffload="-O3 -v"
ORNLACCFLAG=-fopenacc
#
# OpenCL flags
#
# MacOS
OPENCLFLAG=-framework OpenCL
# Linux
#OPENCLDIR=/etc/alternatives/opencl-intel-tools
#OPENCLFLAG=-I${OPENCLDIR} -L${OPENCLDIR}/lib64 -lOpenCL
OPENCLFLAG+=-Wno-ignored-attributes -Wno-deprecated-declarations
METALFLAG=-framework MetalPerformanceShaders
#
# SYCL flags
#
# triSYCL
# https://github.com/triSYCL/triSYCL is header-only so just clone in Cxx11 directory...
SYCLDIR=./triSYCL
SYCLCXX=${CXX} -std=c++17 ${OPENMPFLAG}
SYCLFLAG=-I$(SYCLDIR)/include
# ProGTX
# https://github.com/ProGTX/sycl-gtx
#SYCLDIR=${HOME}/Work/OpenCL/sycl-gtx
#SYCLCXX=${CXX} ${OPENMPFLAG}
#SYCLFLAG=-DUSE_SYCL -I${SYCLDIR}/sycl-gtx/include -L${SYCLDIR}/build/sycl-gtx -lsycl-gtx ${OPENCLFLAG}
METALFLAG=-framework MetalPerformanceShaders
#
# OCCA
#
#OCCADIR=${HOME}/prk-repo/Cxx11/occa
#
# Cilk
#
#CILKFLAG=-fcilkplus
#
# TBB
#
TBBDIR=/usr/local/Cellar/tbb/2019_U5_1
TBBFLAG=-I${TBBDIR}/include -L${TBBDIR}/lib -ltbb
#
# Parallel STL, Boost, etc.
#
BOOSTFLAG=-I/usr/local/Cellar/boost/1.69.0_2/include
RANGEFLAG=-DUSE_BOOST_IRANGE ${BOOSTFLAG}
#RANGEFLAG=-DUSE_RANGES_TS -I./range-v3/include
PSTLFLAG=${OPENMPSIMDFLAG} ${TBBFLAG} ${RANGEFLAG}
#PSTLFLAG=${OPENMPSIMDFLAG} ${TBBFLAG} -DUSE_INTEL_PSTL -I./pstl/include ${RANGEFLAG}
KOKKOSDIR=/opt/kokkos/gcc
KOKKOSFLAG=-I${KOKKOSDIR}/include -L${KOKKOSDIR}/lib -lkokkos ${OPENMPFLAG}
RAJADIR=/opt/raja/gcc
RAJAFLAG=-I${RAJADIR}/include -L${RAJADIR}/lib -lRAJA ${OPENMPFLAG} ${TBBFLAG}
THRUSTDIR=/Users/jrhammon/Work/NVIDIA/thrust
THRUSTFLAG=-I${THRUSTDIR} ${RANGEFLAG}
#
# SYCL flags
#
# triSYCL
# https://github.com/triSYCL/triSYCL is header-only so just clone in Cxx11 directory...
SYCLDIR=./triSYCL
SYCLCXX=${CXX} -O3 -Wall -std=c++17 ${OPENMPFLAG}
SYCLFLAG=-I${SYCLDIR}/include ${BOOSTFLAG} -DTRISYCL
# ProGTX
# https://github.com/ProGTX/sycl-gtx
#SYCLDIR=${HOME}/Work/OpenCL/sycl-gtx
#SYCLCXX=${CXX} ${OPENMPFLAG}
#SYCLFLAG=-I${SYCLDIR}/sycl-gtx/include -L${SYCLDIR}/build/sycl-gtx -lsycl-gtx ${OPENCLFLAG}
SYCLFLAG+=${RANGEFLAG}
#
# SYCL flags
#
# triSYCL
# https://github.com/triSYCL/triSYCL is header-only so just clone in Cxx11 directory...
SYCLDIR=./triSYCL
SYCLCXX=${CXX} -std=c++17 ${OPENMPFLAG}
SYCLFLAG=-I${SYCLDIR}/include ${BOOSTFLAG}
# ProGTX
# https://github.com/ProGTX/sycl-gtx
#SYCLDIR=${HOME}/Work/OpenCL/sycl-gtx
#SYCLCXX=${CXX} ${OPENMPFLAG}
#SYCLFLAG=-DUSE_SYCL -I${SYCLDIR}/sycl-gtx/include -L${SYCLDIR}/build/sycl-gtx -lsycl-gtx ${OPENCLFLAG}
#
# CBLAS for C++ DGEMM
#
BLASFLAG=-DACCELERATE -framework Accelerate
CBLASFLAG=-DACCELERATE -framework Accelerate -flax-vector-conversions
#
# CUDA flags
#
# Mac w/ CUDA emulation via https://github.com/hughperkins/coriander
#NVCC=/opt/llvm/cocl/bin/cocl
# Linux w/ NVIDIA CUDA
NVCC=nvcc
CUDAFLAGS=-g -O3 -std=c++11 -arch=sm_50
# https://github.com/tensorflow/tensorflow/issues/1066#issuecomment-200574233
CUDAFLAGS+=-D_MWAITXINTRIN_H_INCLUDED
#
# ISPC
#
ISPC=ispc
ISPCFLAG=-O3 --target=host --opt=fast-math
#
# MPI
#
# We assume you have installed an implementation of MPI-3 that is in your path.
MPICC=mpicc -std=c99
#
# Fortran 2008 coarrays
#
# see https://github.com/ParRes/Kernels/blob/master/FORTRAN/README.md for details
# single-node
COARRAYFLAG=-fcoarray=single -lcaf_single
# multi-node
# COARRAYFLAG=-fcoarray=lib -lcaf_mpi

MEMKINDDIR=/home/parallels/PRK/deps
MEMKINDFLAGS=-I${MEMKINDDIR}/include -L${MEMKINDDIR}/lib -lmemkind -Wl,-rpath=${MEMKINDDIR}/lib

Output showing problem

When the sparse OpenMP benchmark is run with ./sparse 12 2 11 10, the program tries to write to memory that has not been allocated. To reproduce this, comment out lines 262 and 263 of sparse.c, then add printf("nent: %llu elm: %llu \n", nent, elm+4); on line 265, outside the for loop. The output shows that the length of col_index, nent, is 171966464, while the program tries to write to index 171966467. Because the program is parallel, there is a strong chance that each thread is writing outside the boundaries of its own array. Furthermore, if the program is compiled with icc --check-pointers=rw on Linux, the solution fails to validate.

If the output is short, please inline it here. Otherwise, please pipe it to a plain text file and attach that file. Note that you may need to use $command 2>&1 $log to capture the error messages.

Please do not attach screenshots of your terminal.

jeffhammond commented 5 years ago

Thanks for the bug report. I will try to figure this out.

AtlantaPepsi commented 3 years ago

I believe the boundary indices are valid. The printed 171966467 is elm+4 (elm=171966463). If lines 262 and 263 are not commented out, elm is reassigned at line 262 and ends up as 171966464 after the loop exits. Since the print statement at line 265 is outside the matrix loop (lines 249-264), elm is never used again, so there should be no out-of-bounds accesses. (If you run it with icc, where the solution validates, you get the same output.)

But validation does fail when compiled with gcc, and I think the source of the error is at line 285: temp should be declared private as well, to avoid a race condition. I still need to run more cases and check whether there are other problems, but I believe this is the one.

For some reason icc is immune to this bug; I couldn't replicate the issue even with -check-pointers=rw, which is quite curious.