GraphBLAS / LAGraph

This is a library plus a test harness for collecting algorithms that use the GraphBLAS. For test coverage reports, see https://graphblas.org/LAGraph/ . Documentation: https://lagraph.readthedocs.org

Build issues: problems finding the correct SuiteSparse library #140

Closed mcmillan03 closed 1 year ago

mcmillan03 commented 2 years ago

The first issue is a documentation issue:

The second issue is that when I follow the instructions in the top-level CMakeLists.txt I get the following error:

smcmillan@ubuntu ~/github/LAGraph/build (stable_markings) $ GRAPHBLAS_ROOT=/home/smcmillan/SuiteSparse/GraphBLAS cmake ..
-- The C compiler identification is GNU 11.1.0
-- The CXX compiler identification is GNU 11.1.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- GraphBLAS root:              
CMake Error at /usr/share/cmake-3.22/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
  Could NOT find GraphBLAS: Found unsuitable version "6.2.0", but required is
  at least "7.0.1" (found
  /home/smcmillan/SuiteSparse/GraphBLAS-6.2.0/build/libgraphblas.so.6.2.0)
Call Stack (most recent call first):
  /usr/share/cmake-3.22/Modules/FindPackageHandleStandardArgs.cmake:592 (_FPHSA_FAILURE_MESSAGE)
  cmake_modules/FindGraphBLAS.cmake:84 (find_package_handle_standard_args)
  CMakeLists.txt:120 (find_package)

When I export the location as an environment variable (again referencing the softlink) I get the same error.

When I use specifically the 7.2.0 version of the directory I still get the same error.

The only way that seems to work is to remove all other versions of SuiteSparse from that directory (I put them in a subdirectory called ‘old’).

I point out a third minor issue where the GraphBLAS root (red message below) is never printed out during cmake even when it is set as an environment variable:

$ cmake ..
-- The C compiler identification is GNU 11.1.0
-- The CXX compiler identification is GNU 11.1.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- GraphBLAS root:              
-- Found GraphBLAS: /home/smcmillan/SuiteSparse/GraphBLAS-7.2.0/build/libgraphblas.so.7.2.0 (found suitable version "7.2.0", minimum required is "7.0.1") 
-- GraphBLAS include dir: /home/smcmillan/SuiteSparse/GraphBLAS-7.2.0/Include
-- GraphBLAS library:     /home/smcmillan/SuiteSparse/GraphBLAS-7.2.0/build/libgraphblas.so.7.2.0
-- GraphBLAS version:     7.2.0
-- Found OpenMP_C: -fopenmp (found version "4.5") 
-- Found OpenMP_CXX: -fopenmp (found version "4.5") 
-- Found OpenMP: TRUE (found version "4.5")  
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE  
-- CMAKE build type:          Release
-- CMAKE source directory:    /home/smcmillan/github/LAGraph
-- CMAKE build directory:     /home/smcmillan/github/LAGraph/build
-- CMAKE C Flags release:     -O3 -DNDEBUG
-- CMAKE compiler ID:         GNU
-- CMAKE thread library:      -lpthread
-- CMAKE have pthreads:       1
-- CMAKE have Win32 pthreads: 
-- CMAKE have OpenMP:         TRUE
-- GxB build: relying on SuiteSparse GxB extensions
-- CMAKE C flags:  -fopenmp  -DLGDIR=/home/smcmillan/github/LAGraph -std=c11 -lm -Wno-pragmas  -O3 -DNDEBUG
-- Installation in: /usr/local
-- Configuring done
-- Generating done
-- Build files have been written to: /home/smcmillan/github/LAGraph/build

DrTimothyAldenDavis commented 2 years ago

I've seen another issue where "FindGraphBLAS" properly locates the right libgraphblas.so and the right GraphBLAS.h include file, but when I looked at the built demo (tc_gpu_demo in this case, in the cudanew branch), the program had been linked against a different copy of libgraphblas.so (one that was actually on my LD_LIBRARY_PATH, but it wasn't the one I wanted). The FindGraphBLAS.cmake differs slightly on this branch, but perhaps it's a symptom of the same problem. When I deleted the other libgraphblas.so, the correct libgraphblas.so was found automatically.

mcmillan03 commented 1 year ago

When I define GRAPHBLAS_ROOT it does not work. But when I define GraphBLAS_ROOT it does. I think we should be using all caps but I am not sure why the latter works.

When I build with the latter and then define LD_LIBRARY_PATH to point to the wrong library, it does NOT cause the tests to fail.

Also, this aspect of the build configuration options should be documented in the README.md file.

wohlbier commented 1 year ago

I modified the FindGraphBLAS.cmake script to add the following two hints before the other hints:

HINTS ${GRAPHBLAS_ROOT}
HINTS $ENV{GRAPHBLAS_ROOT}

to both find_path and find_library calls and it worked for me. The former handles this invocation

cmake -DGRAPHBLAS_ROOT=/path/to/gb ../

and the latter handles this invocation

GRAPHBLAS_ROOT=/path/to/gb cmake ../

DrTimothyAldenDavis commented 1 year ago

This should now be fixed on the fix_findGraphBLAS branch, which I just pushed to the dev branch.

DrTimothyAldenDavis commented 1 year ago

Caveat: the only thing that doesn't seem to work is the LD_LIBRARY_PATH override. I have a personal folder, ~/bin, that I add to my LD_LIBRARY_PATH. If libgraphblas.so appears there, it gets linked in instead of the one listed in the GRAPHBLAS_ROOT. This happens even though cmake reports that the GraphBLAS library specified in GRAPHBLAS_ROOT is found. The same thing happens with both GraphBLAS_ROOT and GRAPHBLAS_ROOT.

Otherwise, the GRAPHBLAS_ROOT setting now takes precedence over any other library, say in LAGraph/../GraphBLAS/build for example.

For this example, both GraphBLAS libraries are v7.3.3, but the dates differ. I have a GxB method to query the date, and this tells me which library gets linked in. The LAGraph test ./build/src/test_Init thus fails. I can also see the wrong library (from ~/bin) when I do ldd ./build/liblagraph.so.

This is on a Linux machine, with Ubuntu 20.04, gcc 9.4.0, and cmake 3.24.2.

DrTimothyAldenDavis commented 1 year ago

output of cmake (now printing both GraphBLAS_ROOT and GRAPHBLAS_ROOT, on the fix_GraphBLAS branch). So far so good:

...
-- GraphBLAS_ROOT:  /home/faculty/d/davis/dev2/SuiteSparse/GraphBLAS/
-- GRAPHBLAS_ROOT:  
-- Found GraphBLAS: /home/faculty/d/davis/dev2/SuiteSparse/GraphBLAS/build/libgraphblas.so.7.3.3 (found suitable version "7.3.3", minimum required is "7.0.1") 
-- GraphBLAS version: 7.3.3
-- GraphBLAS include: /home/faculty/d/davis/dev2/SuiteSparse/GraphBLAS/Include
-- GraphBLAS library: /home/faculty/d/davis/dev2/SuiteSparse/GraphBLAS/build/libgraphblas.so.7.3.3
-- GraphBLAS static:  /home/faculty/d/davis/dev2/SuiteSparse/GraphBLAS/build/libgraphblas.so
...

output of ./build/src/test_Init

Test Init...                                    
library: SuiteSparse:GraphBLAS 7.3.3 (Dec 1, 2022)
include: SuiteSparse:GraphBLAS 7.3.3 (Dec 9, 2022)
[ FAILED ]
  test_Init.c:51: Check strcmp (date, "Dec 9, 2022") == 0... failed
GraphBLAS compiled with: GNU gcc 9.4.0 v9.4.0
LAGraph version 1.0.1 (Oct 28, 2022) from LAGraph.h
LAGraph version 1.0.1 (Oct 28, 2022) from LAGraph_Version
FAILED: 1 of 1 unit tests has failed.

output of ldd build/liblagraph.so:

    linux-vdso.so.1 (0x00007ffd7a55d000)
    libgraphblas.so.7 => /home/faculty/d/davis/bin/libgraphblas.so.7 (0x00007fce0781c000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fce076af000)
    libgomp.so.1 => /lib/x86_64-linux-gnu/libgomp.so.1 (0x00007fce0766b000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fce07479000)
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fce07456000)
    /lib64/ld-linux-x86-64.so.2 (0x00007fce11cc6000)
    libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fce07450000)

Note that my /home/faculty/d/davis/bin folder is at the end of my LD_LIBRARY_PATH, and this folder contains the Dec 1st version of GraphBLAS v7.3.3.
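
The ldd check above can be scripted. This is a minimal sketch, assuming a POSIX shell and awk. The helper names resolved_path and linked_from are hypothetical and are not part of LAGraph or GraphBLAS:

```shell
# resolved_path LIBNAME
# Read `ldd` output on stdin and print the path LIBNAME resolved to
# (the field after "=>"). Prints nothing for "not found" entries.
resolved_path() {
    awk -v lib="$1" '$1 == lib && $2 == "=>" && $3 != "not" { print $3; exit }'
}

# linked_from ROOT LIBNAME
# Read `ldd` output on stdin; succeed only if LIBNAME resolved to a
# path under ROOT (a trailing slash on ROOT is tolerated).
linked_from() {
    root=${1%/}
    path=$(resolved_path "$2")
    case $path in
        "$root"/*) return 0 ;;
        *)         return 1 ;;
    esac
}

# Usage (paths as in this thread):
#   ldd build/liblagraph.so | resolved_path libgraphblas.so.7
#   ldd build/liblagraph.so | linked_from "$GraphBLAS_ROOT" libgraphblas.so.7 \
#       || echo "libgraphblas.so.7 was not loaded from GraphBLAS_ROOT"
```

Such a check reports what the dynamic loader resolves in the current environment, so it should be run with the same LD_LIBRARY_PATH that the tests will see.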

According to the cmake description of how the environment variables are used, the PackageName_ROOT variable should take precedence over everything, including LD_LIBRARY_PATH. So I'm puzzled.

DrTimothyAldenDavis commented 1 year ago

Regarding the case sensitive nature of the variable: the cmake documentation says that the order is:

(1) PackageName_ROOT (variable or env. variable). This would be GraphBLAS_ROOT. (2) ... (3) ... (4) the HINTS, with GRAPHBLAS_ROOT.

DrTimothyAldenDavis commented 1 year ago

Here's what readelf reports, which has the correct runpath. But if /home/faculty/d/davis/bin/libgraphblas.so exists, it gets linked in instead of the copy in the dev2/SuiteSparse/GraphBLAS/build folder (from GraphBLAS_ROOT).

hypersparse $ readelf -d build/liblagraph.so

Dynamic section at offset 0x376d0 contains 29 entries:
  Tag        Type                         Name/Value
 0x0000000000000001 (NEEDED)             Shared library: [libgraphblas.so.7]
 0x0000000000000001 (NEEDED)             Shared library: [libm.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [libgomp.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
 0x000000000000000e (SONAME)             Library soname: [liblagraph.so.1]
 0x000000000000001d (RUNPATH)            Library runpath: [/home/faculty/d/davis/v733/LAGraph/build:/home/faculty/d/davis/dev2/SuiteSparse/GraphBLAS/build:]

DrTimothyAldenDavis commented 1 year ago

From "man ld.so" it looks like LD_LIBRARY_PATH takes precedence over the RUNPATH in the *.so file, unless you are running in "secure mode". That mode may be set at the system level. So perhaps you're not seeing the effect of LD_LIBRARY_PATH because your system is running in secure mode. From the man page:

...
       o  Using the environment variable LD_LIBRARY_PATH, unless the
          executable is being run in secure-execution mode (see below),
          in which case this variable is ignored.

       o  Using the directories specified in the DT_RUNPATH dynamic section
          attribute of the binary if present.
...

If I use LD_DEBUG=libs ; export LD_DEBUG and then run ./build/src/test_Init, I see the libgraphblas.so.7 being loaded from the path listed in the LD_LIBRARY_PATH. For other libraries, I see LD_LIBRARY_PATH being searched before the RUNPATH. The "secure mode" ignores the LD_LIBRARY_PATH entirely.
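
The LD_DEBUG check described above can be narrowed to a single library. This is a rough sketch, assuming a glibc-based Linux system (LD_DEBUG is a glibc loader feature) and a POSIX shell. The helper name trace_lib is hypothetical:

```shell
# trace_lib LIBNAME COMMAND [ARGS...]
# Run COMMAND with LD_DEBUG=libs set for that command only, discard its
# normal output, and keep only the loader's diagnostic lines (written to
# stderr) that mention LIBNAME.
trace_lib() {
    lib=$1
    shift
    LD_DEBUG=libs "$@" 2>&1 >/dev/null | grep -F -- "$lib"
}

# Usage (test binary as in this thread):
#   trace_lib libgraphblas.so.7 ./build/src/test_Init
```

Setting LD_DEBUG only for the one command avoids exporting it into the rest of the shell session.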

So it seems to me that LD_LIBRARY_PATH is supposed to override any RUNPATH in the liblagraph.so. You must not be seeing it because you're running in 'secure mode'.

I think this means the current FindGraphBLAS.cmake works fine. On my side, I use LD_LIBRARY_PATH to define libraries for MATLAB to find (I use this for libgraphblas_matlab.so). However, I can do that in a script that sets LD_LIBRARY_PATH just for MATLAB, so it doesn't interfere with the rest of my shell commands.

DrTimothyAldenDavis commented 1 year ago

Scott and I just figured out what's going on. The behavior of ld has changed. There are two kinds of paths in an ELF binary: the RPATH and the RUNPATH. See https://akkadia.org/drepper/dsohowto.pdf .

The search order is: (1) RPATH (in the binary), (2) LD_LIBRARY_PATH, (3) RUNPATH (in the binary), (among other things).
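
The order above can be illustrated with a toy model. This is a sketch of the three-step order as stated in this comment, not a reimplementation of the real loader, which consults more locations (the "among other things"). The function name first_match is hypothetical, and a POSIX shell is assumed:

```shell
# first_match LIB RPATH LD_LIBRARY_PATH RUNPATH
# Each path argument is a colon-separated directory list and may be empty.
# Prints the first existing DIR/LIB, searching RPATH first, then
# LD_LIBRARY_PATH, then RUNPATH. Returns 1 if nothing matches.
first_match() {
    lib=$1
    for list in "$2" "$3" "$4"; do
        old_ifs=$IFS
        IFS=:
        # Unquoted on purpose: split the list on ':'.
        for dir in $list; do
            if [ -n "$dir" ] && [ -e "$dir/$lib" ]; then
                IFS=$old_ifs
                printf '%s\n' "$dir/$lib"
                return 0
            fi
        done
        IFS=$old_ifs
    done
    return 1
}

# Usage: a binary carrying only a RUNPATH (empty RPATH argument) loses to
# LD_LIBRARY_PATH, while one carrying an RPATH wins over it:
#   first_match libgraphblas.so.7 "" "$HOME/bin" "$GraphBLAS_ROOT/build"
#   first_match libgraphblas.so.7 "$GraphBLAS_ROOT/build" "$HOME/bin" ""
```

A binary normally relies on one tag or the other, which is why the usage lines leave one of the two arguments empty.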

The new behavior of ld is to set the RUNPATH, not RPATH, in the binary. My ld is set to the new behavior, and so for me, my LD_LIBRARY_PATH overrides my RUNPATH. As a result, CMake says it finds a specific libgraphblas.so, but when a test is run, it links against the version pointed to by LD_LIBRARY_PATH, which is different.

Scott's ld is using the old behavior, so it sets RPATH in the ELF binary, not RUNPATH. As a result, his ld finds the version of libgraphblas.so found by CMake, and set in the RPATH. It ignores the LD_LIBRARY_PATH.
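
To see which of the two behaviors a given build produced, the readelf output can be filtered for the tag name. This is a minimal sketch, assuming a POSIX shell and awk. The helper name dtag_kind is hypothetical:

```shell
# dtag_kind
# Read `readelf -d` output on stdin and report which search-path tag the
# binary carries: RUNPATH (new behavior), RPATH (old behavior), or none.
dtag_kind() {
    awk '
        /\(RUNPATH\)/ { runpath = 1 }
        /\(RPATH\)/   { rpath = 1 }
        END {
            if (runpath)    print "RUNPATH"
            else if (rpath) print "RPATH"
            else            print "none"
        }'
}

# Usage (library path as in this thread):
#   readelf -d build/liblagraph.so | dtag_kind
```

For the readelf output quoted earlier in this thread, which contains a (RUNPATH) entry, this prints RUNPATH.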

To force the old behavior (setting RPATH not RUNPATH), this can be added to the LAGraph CMakeLists.txt, but this is Linux specific:

SET(CMAKE_EXE_LINKER_FLAGS "-Wl,--disable-new-dtags")

To force the new behavior of ld (setting RUNPATH not RPATH), which is also Linux-specific:

SET(CMAKE_EXE_LINKER_FLAGS "-Wl,--enable-new-dtags")

I have no idea how the Mac handles this.

The problem with this is that it's possible to link against, say, GraphBLAS v7.3.2 in LAGraph, but the #include file found was v7.3.3 (say), as found by FindGraphBLAS.cmake. Normally that's not a problem, since any v7.x should be compatible. However, as a test to ensure the right GraphBLAS library is found, the LAGraph/src/test/test_Init asserts that the GraphBLAS version found in the #include "GraphBLAS.h" is identical to the version found in the library itself, at run time (via GxB_Global_Option_get (GxB_LIBRARY_VERSION, ver)).

With the new behavior (using RUNPATH, where LD_LIBRARY_PATH takes precedence), these versions don't match, so the test fails.

DrTimothyAldenDavis commented 1 year ago

See also https://stackoverflow.com/questions/52018092/how-to-set-rpath-and-runpath-with-gcc-ld

DrTimothyAldenDavis commented 1 year ago

We won't try to fix this: the ld behavior will act as 'new' or 'old' depending on the user's system.