cp2k / dbcsr

DBCSR: Distributed Block Compressed Sparse Row matrix library
https://cp2k.github.io/dbcsr/
GNU General Public License v2.0
135 stars 46 forks source link

Thread number has changed #217

Open hfp opened 5 years ago

hfp commented 5 years ago

When building and running CP2K as POPT variant (any workload), DBCSR complains:

 *******************************************************************************
 *   ___                                                                       *
 *  /   \                                                                      *
 * [ABORT]                                                                     *
 *  \___/                        Thread number has changed                     *
 *    |                                                                        *
 *  O/|                                                                        *
 * /| |                                                                        *
 * / \                                                dbcsr_dist_methods.F:610 *
 *******************************************************************************

Apparently DBCSR is built with OpenMP even in case of CP2K/POPT. In dbcsr_dist_methods.F:204, dist%d%num_threads is set according to the number of OpenMP-threads. The suggestion is not to build DBCSR without OpenMP, but to revise the check about whether the "Thread number has changed" in a broken way. It should be possible to build and run DBCSR with and without OpenMP independent of CP2K's OpenMP-flag. I suggest to only warn and terminate if the thread-count transitions from N to M with N.GT.1 and N.NE.M.

alazzaro commented 5 years ago

This is a bug in CP2K :( Could you tell me where do you get it? In principle, in the popt you should not have any mention of openmp in the arch file...

Yes, the current model has a static and not centralized way to handle threads, which is inefficient. So, we have to adapt CP2K to this model. Let's leave this issue open for any feature activity about threading.

hfp commented 5 years ago

I ran the following exe/Linux-x86-64-intelx/cp2k.popt tests/QS/benchmark_single_node/dbcsr.inp (any workload should fail similarly) and built with make ARCH=Linux-x86-64-intelx VERSION=popt GNU=1 LIBINTROOT=$HOME/libint/gnu-skx LIBXCROOT=$HOME/libxc/gnu-skx ELPAROOT=$HOME/elpa/gnu-skx-omp DBG=1 -j48 (CP2K/master). There was no -fopenmp flag present for any of the Fortran translation units and DBCSR was built just as part of CP2K (I double-checked with tee-ing the output and grep-ing for OpenMP). So for that matter any POPT-Archfile should be fine (you do not have to worry about this "Linux-x86-64-intelx" ;-).

alazzaro commented 5 years ago

Well, we don't see such a problem on the dashboard... So, I see that you are using VERSION=psmp but I assume it is a wrong copy&paste. Somehow you are enabling OpenMP for DBCSR... Could you confirm that during the compilation you don't see any -fopenmp?

hfp commented 5 years ago

Edited - indeed copy/paste issue (PSMP->POPT). I will find out why I am getting this while the Dashboard configs seem to be clean.

hfp commented 5 years ago

Maybe I understand this incorrectly, but looking at https://github.com/cp2k/dbcsr/blob/develop/src/dist/dbcsr_dist_methods.F#L45 seems to pull-in routines from the OpenMP runtime and later calling e.g., OMP_GET_MAX_THREADS(). Should this be guarded by #if defined(_OPENMP)?

alazzaro commented 5 years ago

Well no, in Fortran is enough to have

!$

at the beginning of the line...

alazzaro commented 5 years ago

Would it be that you are linking to OpenMP libraries, even if the flag -fopenmp is off?

hfp commented 5 years ago

This also works if OpenMP is linked but not enabled at compile-time?

hfp commented 5 years ago

Would it be that you are linking to OpenMP libraries, even if the flag -fopenmp is off?

Yes, but that's "normal" I think. Though, OpenBLAS or MKL could pull-in OpenMP runtime as well.

alazzaro commented 5 years ago

So the assumption here is that we get OpenMP in DBCSR even if a compile-time we don't ask for it just because we are linking OpenMP libraries... Could you try to avoid OpenMP everywhere (i.e. including linking) and see if this is the culprit? From what I can say, POPT tests do not use OpenMP at all everywhere.

hfp commented 5 years ago

POPT tests do not use OpenMP at all everywhere.

Yes, that's the reason for Dashboard being clean. My POPT indirectly links against the OpenMP-runtime. I will double-check that this is the root cause and report back.

alazzaro commented 5 years ago

Assuming that you find the reason in the linking, then we can proceed to use _OPENMP macro and see if it works

hfp commented 5 years ago

I am now getting linker errors about missing OpenMP symbols. I will resolve this over the course of today and build a non-OpenMP POPT. Stay tuned.

hfp commented 5 years ago

Let me collect my findings. The problem already seems to be present at CP2K level (GLOBAL| Number of threads for this process 48) likely because CP2K uses OpenMP runtime in a similar fashion:

[hpabst@clx4 cp2k]$ exe/Linux-x86-64-intelx/cp2k.popt tests/QS/benchmark_single_node/dbcsr.inp

  **** **** ******  **  PROGRAM STARTED AT               2019-08-07 12:18:58.044
 ***** ** ***  *** **   PROGRAM STARTED ON                                  clx4
 **    ****   ******    PROGRAM STARTED BY                                hpabst
 ***** **    ** ** **   PROGRAM PROCESS ID                                307757
  **** **  *******  **  PROGRAM STARTED IN                /nfs/home2/hpabst/cp2k

 CP2K| version string:                    CP2K version 7.0 (Development Version)
 CP2K| source code revision number:                                  git:e4cf6be
 CP2K| cp2kflags: omp fftw3 parallel mpi3 scalapack xsmm has_no_mpi_mod mkl
 CP2K| is freely available from                            https://www.cp2k.org/
 CP2K| Program compiled at                           Wed 7 Aug 12:16:06 BST 2019
 CP2K| Program compiled on                                                  clx4
 CP2K| Program compiled for                                  Linux-x86-64-intelx
 CP2K| Data directory path                           /nfs/home2/hpabst/cp2k/data
 CP2K| Input file name                  tests/QS/benchmark_single_node/dbcsr.inp

 GLOBAL| Method name                                                        TEST
 GLOBAL| Project name                                                    PROJECT
 GLOBAL| Preferred FFT library                                             FFTW3
 GLOBAL| Preferred diagonalization lib.                                       SL
 GLOBAL| Run type                                                           NONE
 GLOBAL| All-to-all communication in single precision                          F
 GLOBAL| FFTs using library dependent lengths                                  F
 GLOBAL| Global print level                                               MEDIUM
 GLOBAL| MPI I/O enabled                                                       T
 GLOBAL| Total number of message passing processes                             1
 GLOBAL| Number of threads for this process                                   48
 GLOBAL| This output is from process                                           0
 GLOBAL| CPU model name            Intel(R) Xeon(R) Platinum 8260L CPU @ 2.40GHz
 GLOBAL| CPUID                                                              1003
alazzaro commented 5 years ago

Well, not a surprise since DBCSR was part of CP2K ;) We can add a check for that and stop if this is the case, but I wonder if there is a problem during the preprocessing then... I mean, OpenMP should be off because we are using

!$

macro during the compiling time, so I don't see why it can be used at runtime. @dev-zero Do you have any idea for that?

hfp commented 5 years ago

Ok, I found the root cause: I am using -fopenmp-simd which does not enable OpenMP support for multithreaded applications. This was decided when OpenMP introduced SIMD support, and the objective was to allow indepenent opt-in for both facilities. For instance, -fopenmp-simd does not implicitly define _OPENMP. However, at least GNU Fortran enables lines like !$ USE OMP_LIB, ONLY: omp_get_max_threads, which is likely unintended.

Here is some playground (perhaps called omprt.F90):

      PROGRAM omprt
!$      USE OMP_LIB, ONLY: omp_get_max_threads
        IMPLICIT NONE
        INTEGER :: max_nthreads

        max_nthreads = 1
!$      max_nthreads = omp_get_max_threads()
        WRITE(*,*) "max_nthreads:", max_nthreads
#if defined(_OPENMP)
        WRITE(*,*) "_OPENMP: defined"
#else
        WRITE(*,*) "_OPENMP: not defined"
#endif
      END PROGRAM

Some output:

gfortran omprt.F90
./a.out
 max_nthreads:           1
 _OPENMP: not defined

Some more:

gfortran -fopenmp-simd omprt.F90
/tmp/ccLNztfr.o: In function `MAIN__':
omprt.F90:(.text+0x13): undefined reference to `omp_get_max_threads_'
collect2: error: ld returned 1 exit status

The latter behavior is a bug, especially since ifort works correctly. Intel Fortran needs -qopenmp-simd (icc supports both -fopenmp-simd and -qopenmp-simd).

hfp commented 5 years ago

I understand that -fopenmp-simd has no adoption (yet) in CP2K/DBCSR and it is implicitly enabled by -fopenmp (as opposed to the other way around). Anyhow, I think with this it would be almost overly cautious to protect calling omp_* with #if defined(_OPENMP). However, there are likely not too many lines and someone may decide about it...

alazzaro commented 5 years ago

No really, we use !$ everywhere in the code! Is this with a recent GCC compiler?

hfp commented 5 years ago

I used GNU Fortran 8.3 i.e., the latest valid version for me (side note: 9.1 fails tons of CP2K's regtests).

alazzaro commented 5 years ago

The man page says:

-fopenmp-simd
           Enable handling of OpenMP's SIMD directives with "#pragma omp" in C/C++ and "!$omp" in Fortran. Other OpenMP directives are ignored.

so it is clearly a bug in GCC during the preprocessing... Anyhow, I think the problem is understood, and it is no related to DBCSR (it is CP2K), but let's leave the issue open for future threading activity in DBCSR...

hfp commented 5 years ago

Thanks! The fix for me is also easy, I will just pull-off the -fopenmp-simd in case of POPT (or similar). This way, I can proceed building POPT with some libraries being MT. In fact, I am planning to enable a race-free (synchronized) batch-processing of SMMs using libxsmmext in particular for POPT.