Reference-LAPACK / lapack

LAPACK development repository

Test program failures in OpenMP builds of LAPACK #148

Closed mickpont closed 1 year ago

mickpont commented 7 years ago

The subroutine iparam2stage (file SRC/iparam2stage.F), introduced in LAPACK 3.7.0, uses the value returned by OMP_GET_NUM_THREADS() when compiled with OpenMP enabled. That value determines algorithmic parameters and the amount of workspace required by various routines.

A problem is that if nthreads is greater than 1, then the test programs for the various routines fail (because the test programs assume nthreads=1 and so have the wrong expectations for what workspace sizes will be needed).

It's not actually necessary to compile LAPACK with OpenMP enabled to show this problem; just set nthreads to something bigger than 1 in iparam2stage.F (I arbitrarily used nthreads=77). The test programs then reveal:

csb.out: XERBLA was called from CHBEVD_2STAGE with INFO = 11 instead of 13
csb.out: XERBLA was called from CHBEVD_2STAGE with INFO = 11 instead of 15
cse2.out: XERBLA was called from CHEEVD_2STAGE with INFO = 8 instead of 10
cse2.out: XERBLA was called from CHEEVR_2STAGE with INFO = 18 instead of 20
cse2.out: XERBLA was called from CHEEVR_2STAGE with INFO = 18 instead of 22
csep.out: XERBLA was called from CHEEVD_2STAGE with INFO = 8 instead of 10
csep.out: XERBLA was called from CHEEVR_2STAGE with INFO = 18 instead of 20
csep.out: XERBLA was called from CHEEVR_2STAGE with INFO = 18 instead of 22

and similar things for other precisions.
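
The INFO shifts above follow from the order in which arguments are checked. In SSYEVD_2STAGE, for example, LWORK is argument 8 and LIWORK is argument 10, and the first failing check determines INFO. The sketch below is a toy model of that mechanism, not LAPACK code. It assumes an error-exit test that sizes LWORK for one thread and deliberately passes a too-small LIWORK. The names toy_lwmin and toy_check, and the workspace formula, are hypothetical.

```fortran
program info_order_demo
   implicit none
   integer, parameter :: n = 4
   integer :: lwork_test

   ! The test sizes LWORK for a single thread and passes LIWORK = 0,
   ! expecting the LIWORK check (argument 10) to be the one that fails.
   lwork_test = toy_lwmin(n, 1)
   print '(a,i4)', 'nthreads =  1: INFO =', toy_check(n, lwork_test, 0, 1)
   print '(a,i4)', 'nthreads = 77: INFO =', toy_check(n, lwork_test, 0, 77)

contains

   ! Hypothetical thread-dependent workspace size (NOT the LAPACK formula).
   pure integer function toy_lwmin(n, nthreads)
      integer, intent(in) :: n, nthreads
      toy_lwmin = 2*n + 1 + n*nthreads
   end function toy_lwmin

   ! Mimics an if / else-if argument check: the first failure wins.
   pure integer function toy_check(n, lwork, liwork, nthreads) result(info)
      integer, intent(in) :: n, lwork, liwork, nthreads
      info = 0
      if (lwork < toy_lwmin(n, nthreads)) then
         info = -8      ! LWORK is argument 8
      else if (liwork < 1) then
         info = -10     ! LIWORK is argument 10
      end if
   end function toy_check

end program info_order_demo
```

With one thread the LIWORK check fails as the test expects (INFO = -10). With 77 threads the same LWORK is already too small, so the earlier check fires (INFO = -8), which matches the "8 instead of 10" pattern in the log.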

Obviously this could be fixed by making the test programs check how many threads are being used too - but that might be a bit painful - and I think separate calls of OMP_GET_NUM_THREADS() aren't necessarily going to return the same value. (I suppose the test programs could call OMP_SET_NUM_THREADS() to force nthreads to be 1, but then they mightn't be testing all parts of the algorithms)
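
As a minimal sketch of the OMP_SET_NUM_THREADS() workaround mentioned above, a test driver could pin the thread count before calling any 2-stage routine. This assumes compilation with OpenMP enabled (for example gfortran with -fopenmp) so that the omp_lib module is available. The program name is illustrative and is not part of the LAPACK test suite.

```fortran
program force_one_thread
   use omp_lib, only: omp_set_num_threads, omp_get_num_threads
   implicit none
   integer :: nthreads

   ! Request a team of one thread for subsequent parallel regions.
   call omp_set_num_threads(1)

   nthreads = 0
!$omp parallel
!$omp single
   ! Same kind of query that iparam2stage makes inside its parallel region.
   nthreads = omp_get_num_threads()
!$omp end single
!$omp end parallel

   print *, 'threads seen inside a parallel region:', nthreads
end program force_one_thread
```

The trade-off is the one already noted: with the team size forced to 1, any code path that depends on nthreads > 1 is not exercised.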

Incidentally, is it really necessary to use preprocessed .F files to enable OpenMP? For example why can't these lines

#if defined(_OPENMP)
!$OMP PARALLEL
      NTHREADS = OMP_GET_NUM_THREADS()
!$OMP END PARALLEL
#endif

just be replaced by

!$OMP PARALLEL
!$OMP NTHREADS = OMP_GET_NUM_THREADS()
!$OMP END PARALLEL

so that there is no need for preprocessing, and no need for special filenames?

Mick Pont

jeffhammond commented 7 years ago

The following code does not make sense.

!$OMP NTHREADS = OMP_GET_NUM_THREADS()

You cannot invoke a function and assign to a program variable in a directive.

However, you implicitly raise the valid point that preprocessing is not standard Fortran.
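
For reference, the OpenMP specification does provide a way to avoid the preprocessor here: the conditional compilation sentinel `!$`. A line starting with `!$` followed by a space is compiled as ordinary Fortran when OpenMP is enabled and is a plain comment otherwise. The following is a minimal free-form sketch under that assumption, not a patch for iparam2stage.F (which is fixed-form). The program name is illustrative.

```fortran
program nthreads_sentinel
   implicit none
   ! Declared only when OpenMP is enabled; otherwise this line is a comment.
!$ integer, external :: omp_get_num_threads
   integer :: nthreads

   nthreads = 1          ! default for builds without OpenMP
!$omp parallel
!$omp single
   ! Conditionally compiled assignment: note "!$ ", not "!$omp".
!$ nthreads = omp_get_num_threads()
!$omp end single
!$omp end parallel

   print *, 'nthreads =', nthreads
end program nthreads_sentinel
```

Compiled without OpenMP, every `!$` line is ignored and the program reports 1. Compiled with OpenMP, it reports the team size of the parallel region, with no `#if` block and no need for a preprocessed .F file.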

weslleyspereira commented 3 years ago

In the current master branch (1827da0da50dd1ab9aa38f5adb1c138b74332362), I could reproduce similar failures:

$ cat TESTING/testing_results.txt | grep -E "(XERBLA|failed)"
 *** XERBLA was called from SSYEVD_2STAGE with INFO =      8 instead of 10 ***
 *** SST routines failed the tests of the error exits ***
 *** XERBLA was called from SSYEVD_2STAGE with INFO =      8 instead of 10 ***
 *** SST routines failed the tests of the error exits ***
 *** XERBLA was called from DSYEVD_2STAGE with INFO =      8 instead of 10 ***
 *** DST routines failed the tests of the error exits ***
 *** XERBLA was called from DSYEVD_2STAGE with INFO =      8 instead of 10 ***
 *** DST routines failed the tests of the error exits ***
 *** XERBLA was called from CHEEVD_2STAGE with INFO =      8 instead of 10 ***
 *** CST routines failed the tests of the error exits ***
 *** XERBLA was called from CHEEVD_2STAGE with INFO =      8 instead of 10 ***
 *** CST routines failed the tests of the error exits ***
 *** XERBLA was called from CHBEVD_2STAGE with INFO =     11 instead of 13 ***
 *** XERBLA was called from CHBEVD_2STAGE with INFO =     11 instead of 15 ***
 *** CHB routines failed the tests of the error exits ***
 *** XERBLA was called from ZHEEVD_2STAGE with INFO =      8 instead of 10 ***
 *** ZST routines failed the tests of the error exits ***
 *** XERBLA was called from ZHEEVD_2STAGE with INFO =      8 instead of 10 ***
 *** ZST routines failed the tests of the error exits ***
 *** XERBLA was called from ZHBEVD_2STAGE with INFO =     11 instead of 13 ***
 *** XERBLA was called from ZHBEVD_2STAGE with INFO =     11 instead of 15 ***
 *** ZHB routines failed the tests of the error exits ***

I am using:

cmake -GNinja -DBUILD_TESTING=ON -DCMAKE_BUILD_TYPE=Release \
-DCMAKE_Fortran_FLAGS="-frecursive -fimplicit-none -fopenmp" \
-DCBLAS=ON -DLAPACKE=ON -DLAPACKE_WITH_TMG=ON ..

weslleyspereira commented 1 year ago

I believe this issue was solved at some point.

  1. I cannot reproduce the issue anymore.
  2. Moreover, we have added a script that builds and tests LAPACK with OpenMP. See:

    https://github.com/Reference-LAPACK/lapack/blob/dfad0d5639d669736afc71d57e24d95001279577/.github/workflows/cmake.yml#L49-L127

I will close this issue. Please let me know if we should reopen it.