Closed Jellby closed 4 months ago
@Jellby Could you please post the full CMake configure line you are using ? Also when you build/use an existing Reference LAPACK library, do you know if it was built with the ilp64 option -DBUILD_INDEX64=ON
? Could you also confirm if the libs libblas64.a
and liblapack64.a
are available in the existing Ref. BLAS/LAPACK installation.
It's a bit of an unorthodox build, I'm afraid... There's a file blas.F
containing just lines like:
#include "caxpy.f"
#include "cgemm.f"
#include "dasum.f"
#include "daxpy.f"
#include "dcabs1.f"
...
which is compiled with:
$ gfortran -fdefault-integer-8 -I/path/to/lapack/BLAS/SRC -c blas.F
$ ar cr libblas.a blas.o
Then, for GA:
$ cmake -D ENABLE_FORTRAN=ON -D ENABLE_CXX=OFF -D ENABLE_TESTS=OFF -D ENABLE_I8=ON -D LINALG_REQUIRED_COMPONENTS=ilp64 -D ENABLE_BLAS=ON -D LINALG_PREFIX="/" -D BLAS_LIBRARIES=/path/to/libblas.a /path/to/ga
...
-- void size: 8, USE_I8: ON, ENABLE_I8: ON
-- Performing Test BLAS_LOWER_UNDERSCORE
-- Performing Test BLAS_LOWER_UNDERSCORE -- found
-- Could NOT find BLAS (missing: ilp64)
CMake Error at cmake/ga-linalg.cmake:142 (message):
ENABLE_BLAS=ON, but a LAPACK library was not found
Call Stack (most recent call first):
CMakeLists.txt:154 (include)
-- Configuring incomplete, errors occurred!
And manually compiling the diagnostic file shows this:
$ gfortran -DDGEMM_NAME=dgemm_ /path/to/ga/cmake/util/ilp64_checker.c libblas.a
$ ./a.out ; echo $?
** On entry to DGEMM parameter number 8 had an illegal value
0
while compiling blas.F
with fno-default-integer-8
gives just 0
, without the ** On entry
line (so a true lp64 dgemm is working correctly).
We use the CMake LinAlg modules provided by @wavefunction91 and the modules only support the recommended Ref. BLAS/LAPACK/ScaLAPACK
builds. In this case, the solution might just be to cp libblas.a libblas64.a
and try again. Also, ENABLE_I8
does not have to be specified since it's set automatically.
$ gfortran -DDGEMMNAME=dgemm /path/to/ga/cmake/util/ilp64_checker.c libblas.a $ ./a.out ; echo $? ** On entry to DGEMM parameter number 8 had an illegal value 0
I think the way it's setup is if this test fails, the CMake BLAS discovery module determines it to be ILP64, otherwise LP64. See https://github.com/wavefunction91/linalg-cmake-modules/blob/main/util/BLASUtilities.cmake#L93-L108 https://github.com/wavefunction91/linalg-cmake-modules/blob/main/util/ilp64_checker.c#L25-L28
If you want to use ILP64 reference ScaLAPACK, you'll have to specify it manually (e.g. specify ScaLAPACK_LIBRARIES=...
, ScaLAPACK_IS_LP64=FALSE
), we assume NETLIB ScaLAPACK is LP64, and would need to run an MPI program to validate this at configure (not generally possible on e.g. SLURM systems)
I think the way it's setup is if this test fails, the CMake BLAS discovery module determines it to be ILP64, otherwise LP64.
The point is the test does not "fail", but it should. The dgemm
call aborts, but the program still returns 0
, which makes it look successful. Perhaps the test should check that the return code is 0
and that stderr/stdout is empty?
@Jellby Does this happen even after renaming libblas.a -> libblas64.a
? I think that is because ReferenceBLAS_ilp64_FOUND is set to FALSE because it could not find libblas64.a
with -DLINALG_REQUIRED_COMPONENTS=ilp64
@wavefunction91 Is that right?
@ajaypanyala That shouldn't matter, he's specifying it manually so it should just go through the validation.
@Jellby Checking output would not be robust in general, for example, it's possible to trigger call-tree print in MKL with an environment variable while still giving the correct result.
This is strange, if your BLAS library is ILP64, I can't see how it would give the correct results here https://github.com/wavefunction91/linalg-cmake-modules/blob/main/util/ilp64_checker.c#L37
@wavefunction91 It's not giving the correct result, the dgemm
call (https://github.com/wavefunction91/linalg-cmake-modules/blob/main/util/ilp64_checker.c#L29-L32) aborts and the program returns with 0
, the check after is never executed.
@Jellby Program aborts signal 0? That's not canonical at all, SIGABRT
should be a return of 6, no?
EDIT: Nevermind, guess I should have read the entire issue - the issue is in XERBLA, OK. That's tedious, I'll have to give it some thought as to how to best handle that robustly. I guess we could change the exe return to be the checksum, that way 0 is a failure - 8 is a success?
Yeah, it's a Fortran STOP
, I don't know if the standard says anything about the return code of that, but apparently it's a "normal termination" and a return code of 0
is recommended (https://stackoverflow.com/a/49545586/). Maybe, instead of the return code, something more specific (e.g. "dgemm call completed sucessfully") could be written to stdout and that could be tested.
That's a good idea, I'll add that today and @ajaypanyala can update it here when its checked in
@Jellby Could you please try issue206
branch and see if that fixes this issue. This branch has @wavefunction91's latest fixes.
It seems to work, but I'll have to change the infrastructure somewhat to be able to do a real test.
@Jellby I have merged the latest fixes from @wavefunction91 to develop
.
Somehow related to this (tell me if I should open a new issue): Now detection of LAPACK fails, apparently because:
a) The order of libraries matters. This for a case where I pass BLAS_LIBRARIES="foo.a;bar.a"
, and one has BLAS and the other LAPACK, and I don't necessarily know which is which. Could the finder figure out whether LAPACK needs the (already found) BLAS libraries and add them to the test (as it does with gfortran, etc.)?
b) libm
is needed, but not identified as a requirement. Apparently this is because it just searches for logf
in the error messages, but in my case (probably because I have only a subset of LAPACK), the missing symbol is cabs
.
About b), actually the error I get is something like https://stackoverflow.com/questions/30013845, where a single function is mentioned, and that might be arbitrary, but at least there's an explicit libm.so
in the error message. Perhaps https://github.com/GlobalArrays/ga/blob/da7a53ffca297b6856962aace33e3b44c978b0a7/cmake/linalg-modules/util/CommonFunctions.cmake#L180 could say something like this instead:
if( ${__compile_output} MATCHES "logf" OR ${__compile_output} MATCHES "/libm.so" )
And another suggestion. LAPACK is identified by searching for dpstrf
, but this function doesn't seem to be used anywhere in GA. Wouldn't it make more sense to search for something that's actually used (like dsyev
, which is already used for identifying the interface)?
Closing due to inactivity. Please feel free to re-open if needed.
I would like to use a compiled Reference BLAS (and LAPACK), as part of another package. The libraries should be ilp64 and I add it as a required linalg component. However, I get, with CMake:
It seems https://github.com/GlobalArrays/ga/blob/develop/cmake/linalg-modules/util/ilp64_checker.c returns with:
but with return code of
0
, and the result checksum is never done.In https://github.com/Reference-LAPACK/lapack/blob/master/BLAS/SRC/xerbla.f, changing the
STOP
toSTOP 1
, causes at least the return code to be1
and the BLAS library being identified as ilp64.