Open hfp opened 5 years ago
This is a bug in CP2K :( Could you tell me where do you get it? In principle, in the popt you should not have any mention of openmp in the arch file...
Yes, the current model has a static and not centralized way to handle threads, which is inefficient. So, we have to adapt CP2K to this model. Let's leave this issue open for any feature activity about threading.
I ran the following exe/Linux-x86-64-intelx/cp2k.popt tests/QS/benchmark_single_node/dbcsr.inp
(any workload should fail similarly) and built with make ARCH=Linux-x86-64-intelx VERSION=popt GNU=1 LIBINTROOT=$HOME/libint/gnu-skx LIBXCROOT=$HOME/libxc/gnu-skx ELPAROOT=$HOME/elpa/gnu-skx-omp DBG=1 -j48
(CP2K/master). There was no -fopenmp
flag present for any of the Fortran translation units and DBCSR was built just as part of CP2K (I double-checked with tee
-ing the output and grep
-ing for OpenMP). So for that matter any POPT-Archfile should be fine (you do not have to worry about this "Linux-x86-64-intelx" ;-).
Well, we don't see such a problem on the dashboard...
So, I see that you are using
VERSION=psmp
but I assume it is a wrong copy&paste. Somehow you are enabling OpenMP for DBCSR... Could you confirm that during the compilation you don't see any -fopenmp?
Edited - indeed copy/paste issue (PSMP->POPT). I will find out why I am getting this while the Dashboard configs seem to be clean.
Maybe I understand this incorrectly, but looking at https://github.com/cp2k/dbcsr/blob/develop/src/dist/dbcsr_dist_methods.F#L45 seems to pull-in routines from the OpenMP runtime and later calling e.g., OMP_GET_MAX_THREADS()
. Should this be guarded by #if defined(_OPENMP)
?
Well no, in Fortran is enough to have
!$
at the beginning of the line...
Would it be that you are linking to OpenMP libraries, even if the flag -fopenmp is off?
This also works if OpenMP is linked but not enabled at compile-time?
Would it be that you are linking to OpenMP libraries, even if the flag -fopenmp is off?
Yes, but that's "normal" I think. Though, OpenBLAS or MKL could pull-in OpenMP runtime as well.
So the assumption here is that we get OpenMP in DBCSR even if a compile-time we don't ask for it just because we are linking OpenMP libraries... Could you try to avoid OpenMP everywhere (i.e. including linking) and see if this is the culprit? From what I can say, POPT tests do not use OpenMP at all everywhere.
POPT tests do not use OpenMP at all everywhere.
Yes, that's the reason for Dashboard being clean. My POPT indirectly links against the OpenMP-runtime. I will double-check that this is the root cause and report back.
Assuming that you find the reason in the linking, then we can proceed to use _OPENMP macro and see if it works
I am now getting linker errors about missing OpenMP symbols. I will resolve this over the course of today and build a non-OpenMP POPT. Stay tuned.
Let me collect my findings. The problem already seems to be present at CP2K level (GLOBAL| Number of threads for this process 48
) likely because CP2K uses OpenMP runtime in a similar fashion:
[hpabst@clx4 cp2k]$ exe/Linux-x86-64-intelx/cp2k.popt tests/QS/benchmark_single_node/dbcsr.inp
**** **** ****** ** PROGRAM STARTED AT 2019-08-07 12:18:58.044
***** ** *** *** ** PROGRAM STARTED ON clx4
** **** ****** PROGRAM STARTED BY hpabst
***** ** ** ** ** PROGRAM PROCESS ID 307757
**** ** ******* ** PROGRAM STARTED IN /nfs/home2/hpabst/cp2k
CP2K| version string: CP2K version 7.0 (Development Version)
CP2K| source code revision number: git:e4cf6be
CP2K| cp2kflags: omp fftw3 parallel mpi3 scalapack xsmm has_no_mpi_mod mkl
CP2K| is freely available from https://www.cp2k.org/
CP2K| Program compiled at Wed 7 Aug 12:16:06 BST 2019
CP2K| Program compiled on clx4
CP2K| Program compiled for Linux-x86-64-intelx
CP2K| Data directory path /nfs/home2/hpabst/cp2k/data
CP2K| Input file name tests/QS/benchmark_single_node/dbcsr.inp
GLOBAL| Method name TEST
GLOBAL| Project name PROJECT
GLOBAL| Preferred FFT library FFTW3
GLOBAL| Preferred diagonalization lib. SL
GLOBAL| Run type NONE
GLOBAL| All-to-all communication in single precision F
GLOBAL| FFTs using library dependent lengths F
GLOBAL| Global print level MEDIUM
GLOBAL| MPI I/O enabled T
GLOBAL| Total number of message passing processes 1
GLOBAL| Number of threads for this process 48
GLOBAL| This output is from process 0
GLOBAL| CPU model name Intel(R) Xeon(R) Platinum 8260L CPU @ 2.40GHz
GLOBAL| CPUID 1003
Well, not a surprise since DBCSR was part of CP2K ;) We can add a check for that and stop if this is the case, but I wonder if there is a problem during the preprocessing then... I mean, OpenMP should be off because we are using
!$
macro during the compiling time, so I don't see why it can be used at runtime. @dev-zero Do you have any idea for that?
Ok, I found the root cause: I am using -fopenmp-simd
which does not enable OpenMP support for multithreaded applications. This was decided when OpenMP introduced SIMD support, and the objective was to allow indepenent opt-in for both facilities. For instance, -fopenmp-simd
does not implicitly define _OPENMP
. However, at least GNU Fortran enables lines like !$ USE OMP_LIB, ONLY: omp_get_max_threads
, which is likely unintended.
Here is some playground (perhaps called omprt.F90
):
PROGRAM omprt
!$ USE OMP_LIB, ONLY: omp_get_max_threads
IMPLICIT NONE
INTEGER :: max_nthreads
max_nthreads = 1
!$ max_nthreads = omp_get_max_threads()
WRITE(*,*) "max_nthreads:", max_nthreads
#if defined(_OPENMP)
WRITE(*,*) "_OPENMP: defined"
#else
WRITE(*,*) "_OPENMP: not defined"
#endif
END PROGRAM
Some output:
gfortran omprt.F90
./a.out
max_nthreads: 1
_OPENMP: not defined
Some more:
gfortran -fopenmp-simd omprt.F90
/tmp/ccLNztfr.o: In function `MAIN__':
omprt.F90:(.text+0x13): undefined reference to `omp_get_max_threads_'
collect2: error: ld returned 1 exit status
The latter behavior is a bug, especially since ifort
works correctly. Intel Fortran needs -qopenmp-simd
(icc
supports both -fopenmp-simd
and -qopenmp-simd
).
I understand that -fopenmp-simd
has no adoption (yet) in CP2K/DBCSR and it is implicitly enabled by -fopenmp
(as opposed to the other way around). Anyhow, I think with this it would be almost overly cautious to protect calling omp_*
with #if defined(_OPENMP)
. However, there are likely not too many lines and someone may decide about it...
No really, we use !$ everywhere in the code! Is this with a recent GCC compiler?
I used GNU Fortran 8.3 i.e., the latest valid version for me (side note: 9.1 fails tons of CP2K's regtests).
The man page says:
-fopenmp-simd
Enable handling of OpenMP's SIMD directives with "#pragma omp" in C/C++ and "!$omp" in Fortran. Other OpenMP directives are ignored.
so it is clearly a bug in GCC during the preprocessing... Anyhow, I think the problem is understood, and it is no related to DBCSR (it is CP2K), but let's leave the issue open for future threading activity in DBCSR...
Thanks! The fix for me is also easy, I will just pull-off the -fopenmp-simd
in case of POPT (or similar). This way, I can proceed building POPT with some libraries being MT. In fact, I am planning to enable a race-free (synchronized) batch-processing of SMMs using libxsmmext in particular for POPT.
When building and running CP2K as POPT variant (any workload), DBCSR complains:
Apparently DBCSR is built with OpenMP even in case of CP2K/POPT. In
dbcsr_dist_methods.F:204
,dist%d%num_threads
is set according to the number of OpenMP-threads. The suggestion is not to build DBCSR without OpenMP, but to revise the check about whether the "Thread number has changed" in a broken way. It should be possible to build and run DBCSR with and without OpenMP independent of CP2K's OpenMP-flag. I suggest to only warn and terminate if the thread-count transitions from N to M with N.GT.1 and N.NE.M.