GlobalArrays / ga

Partitioned Global Address Space (PGAS) library for distributed arrays
http://hpc.pnl.gov/globalarrays/
Other
100 stars 38 forks source link

BLAS misidentified as lp64 (it's ilp64) #206

Closed Jellby closed 4 months ago

Jellby commented 3 years ago

I would like to use a compiled Reference BLAS (and LAPACK), as part of another package. The libraries should be ilp64 and I add it as a required linalg component. However, I get, with CMake:

-- Could NOT find BLAS (missing: ilp64)

It seems https://github.com/GlobalArrays/ga/blob/develop/cmake/linalg-modules/util/ilp64_checker.c returns with:

 ** On entry to DGEMM parameter number  8 had an illegal value

but with return code of 0, and the result checksum is never done.

In https://github.com/Reference-LAPACK/lapack/blob/master/BLAS/SRC/xerbla.f, changing the STOP to STOP 1, causes at least the return code to be 1 and the BLAS library being identified as ilp64.

ajaypanyala commented 3 years ago

@Jellby Could you please post the full CMake configure line you are using ? Also when you build/use an existing Reference LAPACK library, do you know if it was built with the ilp64 option -DBUILD_INDEX64=ON ? Could you also confirm if the libs libblas64.a and liblapack64.a are available in the existing Ref. BLAS/LAPACK installation.

Jellby commented 3 years ago

It's a bit of an unorthodox build, I'm afraid... There's a file blas.F containing just lines like:

#include "caxpy.f"
#include "cgemm.f"
#include "dasum.f"
#include "daxpy.f"
#include "dcabs1.f"
...

which is compiled with:

$ gfortran -fdefault-integer-8 -I/path/to/lapack/BLAS/SRC -c blas.F
$ ar cr libblas.a blas.o

Then, for GA:

$ cmake -D ENABLE_FORTRAN=ON -D ENABLE_CXX=OFF -D ENABLE_TESTS=OFF -D ENABLE_I8=ON -D LINALG_REQUIRED_COMPONENTS=ilp64 -D ENABLE_BLAS=ON -D LINALG_PREFIX="/" -D BLAS_LIBRARIES=/path/to/libblas.a /path/to/ga
...
-- void size: 8, USE_I8: ON, ENABLE_I8: ON
-- Performing Test BLAS_LOWER_UNDERSCORE
-- Performing Test BLAS_LOWER_UNDERSCORE -- found
-- Could NOT find BLAS (missing: ilp64) 
CMake Error at cmake/ga-linalg.cmake:142 (message):
  ENABLE_BLAS=ON, but a LAPACK library was not found
Call Stack (most recent call first):
  CMakeLists.txt:154 (include)

-- Configuring incomplete, errors occurred!

And manually compiling the diagnostic file shows this:

$ gfortran -DDGEMM_NAME=dgemm_ /path/to/ga/cmake/util/ilp64_checker.c libblas.a
$ ./a.out ; echo $?
 ** On entry to DGEMM parameter number  8 had an illegal value
0

while compiling blas.F with fno-default-integer-8 gives just 0, without the ** On entry line (so a true lp64 dgemm is working correctly).

ajaypanyala commented 3 years ago

We use the CMake LinAlg modules provided by @wavefunction91 and the modules only support the recommended Ref. BLAS/LAPACK/ScaLAPACK builds. In this case, the solution might just be to cp libblas.a libblas64.a and try again. Also, ENABLE_I8 does not have to be specified since it's set automatically.

$ gfortran -DDGEMMNAME=dgemm /path/to/ga/cmake/util/ilp64_checker.c libblas.a $ ./a.out ; echo $? ** On entry to DGEMM parameter number 8 had an illegal value 0

I think the way it's setup is if this test fails, the CMake BLAS discovery module determines it to be ILP64, otherwise LP64. See https://github.com/wavefunction91/linalg-cmake-modules/blob/main/util/BLASUtilities.cmake#L93-L108 https://github.com/wavefunction91/linalg-cmake-modules/blob/main/util/ilp64_checker.c#L25-L28

wavefunction91 commented 3 years ago

If you want to use ILP64 reference ScaLAPACK, you'll have to specify it manually (e.g. specify ScaLAPACK_LIBRARIES=..., ScaLAPACK_IS_LP64=FALSE), we assume NETLIB ScaLAPACK is LP64, and would need to run an MPI program to validate this at configure (not generally possible on e.g. SLURM systems)

Jellby commented 3 years ago

I think the way it's setup is if this test fails, the CMake BLAS discovery module determines it to be ILP64, otherwise LP64.

The point is the test does not "fail", but it should. The dgemm call aborts, but the program still returns 0, which makes it look successful. Perhaps the test should check that the return code is 0 and that stderr/stdout is empty?

ajaypanyala commented 3 years ago

@Jellby Does this happen even after renaming libblas.a -> libblas64.a ? I think that is because ReferenceBLAS_ilp64_FOUND is set to FALSE because it could not find libblas64.a with -DLINALG_REQUIRED_COMPONENTS=ilp64 @wavefunction91 Is that right?

wavefunction91 commented 3 years ago

@ajaypanyala That shouldn't matter, he's specifying it manually so it should just go through the validation.

@Jellby Checking output would not be robust in general, for example, it's possible to trigger call-tree print in MKL with an environment variable while still giving the correct result.

This is strange, if your BLAS library is ILP64, I can't see how it would give the correct results here https://github.com/wavefunction91/linalg-cmake-modules/blob/main/util/ilp64_checker.c#L37

Jellby commented 3 years ago

@wavefunction91 It's not giving the correct result, the dgemm call (https://github.com/wavefunction91/linalg-cmake-modules/blob/main/util/ilp64_checker.c#L29-L32) aborts and the program returns with 0, the check after is never executed.

wavefunction91 commented 3 years ago

@Jellby Program aborts signal 0? That's not canonical at all, SIGABRT should be a return of 6, no? EDIT: Nevermind, guess I should have read the entire issue - the issue is in XERBLA, OK. That's tedious, I'll have to give it some thought as to how to best handle that robustly. I guess we could change the exe return to be the checksum, that way 0 is a failure - 8 is a success?

Jellby commented 3 years ago

Yeah, it's a Fortran STOP, I don't know if the standard says anything about the return code of that, but apparently it's a "normal termination" and a return code of 0 is recommended (https://stackoverflow.com/a/49545586/). Maybe, instead of the return code, something more specific (e.g. "dgemm call completed sucessfully") could be written to stdout and that could be tested.

wavefunction91 commented 3 years ago

That's a good idea, I'll add that today and @ajaypanyala can update it here when its checked in

ajaypanyala commented 3 years ago

@Jellby Could you please try issue206 branch and see if that fixes this issue. This branch has @wavefunction91's latest fixes.

Jellby commented 3 years ago

It seems to work, but I'll have to change the infrastructure somewhat to be able to do a real test.

ajaypanyala commented 3 years ago

@Jellby I have merged the latest fixes from @wavefunction91 to develop.

Jellby commented 3 years ago

Somehow related to this (tell me if I should open a new issue): Now detection of LAPACK fails, apparently because:

a) The order of libraries matters. This for a case where I pass BLAS_LIBRARIES="foo.a;bar.a", and one has BLAS and the other LAPACK, and I don't necessarily know which is which. Could the finder figure out whether LAPACK needs the (already found) BLAS libraries and add them to the test (as it does with gfortran, etc.)?

b) libm is needed, but not identified as a requirement. Apparently this is because it just searches for logf in the error messages, but in my case (probably because I have only a subset of LAPACK), the missing symbol is cabs.

Jellby commented 3 years ago

About b), actually the error I get is something like https://stackoverflow.com/questions/30013845, where a single function is mentioned, and that might be arbitrary, but at least there's an explicit libm.so in the error message. Perhaps https://github.com/GlobalArrays/ga/blob/da7a53ffca297b6856962aace33e3b44c978b0a7/cmake/linalg-modules/util/CommonFunctions.cmake#L180 could say something like this instead:

if( ${__compile_output} MATCHES "logf" OR ${__compile_output} MATCHES "/libm.so" )

And another suggestion. LAPACK is identified by searching for dpstrf, but this function doesn't seem to be used anywhere in GA. Wouldn't it make more sense to search for something that's actually used (like dsyev, which is already used for identifying the interface)?

ajaypanyala commented 4 months ago

Closing due to inactivity. Please feel free to re-open if needed.