DrTimothyAldenDavis / SuiteSparse

The official SuiteSparse library: a suite of sparse matrix algorithms authored or co-authored by Tim Davis, Texas A&M University.
https://people.engr.tamu.edu/davis/suitesparse.html
Other
1.19k stars 265 forks source link

Cross compilation broke in 7.2 #428

Closed ViralBShah closed 1 year ago

ViralBShah commented 1 year ago

With the SuiteSparse 7.2 release, it seems that some compile time tests are being carried out, which were not an issue earlier. If possible, it would be great to avoid try_run() in Cmake so that SuiteSparse can be cross-compiled easily. In Julia's BinaryBuilder, we cross compile binaries for a large number of architectures on linux.

https://github.com/JuliaPackaging/Yggdrasil/pull/7479

2023-10-03 21:06:13 UTC | [21:06:14] CMake Error: try_run() invoked in cross-compiling mode, please set the following cache variables appropriately:
-- | --
  | 2023-10-03 21:06:14 UTC | [21:06:14]    HAVE_KEYWORD__THREAD_EXITCODE (advanced)
  | 2023-10-03 21:06:14 UTC | [21:06:14]    HAVE_KEYWORD__THREAD_EXITCODE__TRYRUN_OUTPUT (advanced)
  | 2023-10-03 21:06:14 UTC | [21:06:14] For details see /workspace/srcdir/SuiteSparse/CHOLMOD/build/TryRunResults.cmake
imciner2 commented 1 year ago

Caused by the tests for the thread attributes here: https://github.com/DrTimothyAldenDavis/SuiteSparse/blob/75cb8b4be44227880388454dba022bc7442de61e/SuiteSparse_config/cmake_modules/SuiteSparse__thread.cmake#L46 The CMake command check_c_source_runs will both compile and run the code, so that is what is failing here.

There is an alternate check_c_source_compiles() (https://cmake.org/cmake/help/latest/module/CheckCSourceCompiles.html) CMake macro that will just check to see if the code can compile, and not run it (so that only tests the compiler), and based on the short snippets being used in the test, I think it would be safe to do the compile-only test instead of the full run test.

DrTimothyAldenDavis commented 1 year ago

I have several of the check_c_source_runs: three in the SuiteSparse__thread.cmake file, and another here:

https://github.com/DrTimothyAldenDavis/SuiteSparse/blob/75cb8b4be44227880388454dba022bc7442de61e/GraphBLAS/cmake_modules/SuiteSparse_getenv.cmake#L26

That one for getenv does require "runs" not "compiles" since I must test not only the existence of getenv, but that it also recognizes the "HOME" string. I cannot just test that the code compiles. This may be a problem for cross-compilation, however.

I'd like to keep the check_c_source_runs in both files. Is there a way in cmake to determine if cross-compilation is in use?

Is there a way to check if getenv("HOME") will both compile and run on a cross-compilation scenario? Perhaps I could add a cmake option that skips these tests entirely and just asserts their results. Then you could use that to avoid any check_c_source_* entirely.

ViralBShah commented 1 year ago

Pretty sure that nothing can run in a cross compilation scenario, but I am quite sure that we have getenv on all our platforms (including windows).

If you can set up avoiding these checks on a branch of 7.2, we can test that particular commit and report back.

DrTimothyAldenDavis commented 1 year ago

Getenv works on windows but I think "HOME" doesn't work. I use "LOCALAPPDATA" (or something like that) on windows.

imciner2 commented 1 year ago

I'd like to keep the check_c_source_runs in both files. Is there a way in cmake to determine if cross-compilation is in use?

CMake has the variable CMAKE_CROSSCOMPILING that will be set when CMake detects it is cross compiling the code, so that can be used to detect if the executables can be run. Right now, I think only the thread checks are blocking stuff in Yggdrasil for Julia version updates. Is the getenv check only used in the GraphBlas source right now?

I ran some experiments on our compilation system, and the compilation passes on our platform by replacing the check_c_source_runs with check_c_source_compiles in the thread checks (with one other minor patch I'll make a PR for), although we apparently do see the bug described in https://sourceforge.net/p/mingw-w64/bugs/327/, where the mingw build for Windows actually reports that both the __thread and __declspec(thread) syntax are supported in the configure check when it actually doesn't support the declspec version because mingw doesn't error on the latter and only gives a warning it isn't supported. But luckily, the __thread version is a higher priority in the code, so that doesn't actually break anything.

If you don't want to change to compile-only checks, another option that might work for us is a flag to force the use of the C11 standard and not do any checks. This would guarantee the compiler would have the _Thread_local attribute and remove the need for the configure-time check for us. GCC 4.9+ and Clang 3.3+ have _Thread_local, so we can use versions after those on all our platforms to get the C11 support.

The flag to force C11 would need to do the following:

set( CMAKE_C_STANDARD 11 )
set( CMAKE_C_STANDARD_REQUIRED ON )

to force C11 mode on all targets that included the threads module. If this method is more palatable, I can put together a patch for it and do a test in Yggdrasil before making a PR for it.

imciner2 commented 1 year ago

GCC 4.9+ and Clang 3.3+ have _Thread_local

Except apparently Apple Clang doesn't have _Thread_local... so that won't work for us actually. it does have __thread, so that's at least something. I'm running a test to see if we could just force __thread for all our platforms instead.

imciner2 commented 1 year ago

Ok, my compilation tests completed, and all platforms compiled with no errors when I force HAVE_KEYWORD__THREAD to be on in the CMake config. So from the Julia/Yggdrasil perspective, the ability to force the use of the __thread keyword will allow us to cross compile.

mmuetzel commented 1 year ago

See, e.g., here for how that is handled for cross-building SuiteSparse for Octave for Windows: https://hg.octave.org/mxe-octave/file/tip/src/suitesparse.mk#l36

Essentially, add -DHAVE_GETENV_HOME=ON to the configuration flags.

mmuetzel commented 1 year ago

Regarding the __thread test: IIUC, that is only run if OpenMP wasn't found or was deactivated. Afaict, it is highly recommended to build GraphBLAS with OpenMP.

DrTimothyAldenDavis commented 1 year ago

I can certainly give a try for the threads issue in cholmod, and replace try-run with try-compile. I put that in myself, not because of the changes in the cmake build system.

Still unsure about try-run vs try-compile for the getenv issue in GraphBLAS, because it is runtime behavior that I need to test. An option to set it on/off/auto (where auto means try run) would seem to be the best solution.

I'm still mostly away from my keyboard though, recovering ok just slowly. As a result I haven't even been able to review PRs yet.

mmuetzel commented 1 year ago

Still unsure about try-run vs try-compile for the getenv issue in GraphBLAS, because it is runtime behavior that I need to test. An option to set it on/off/auto (where auto means try run) would seem to be the best solution.

If this depends so much on the system that GraphBLAS runs on, shouldn't that be a runtime check instead of a build time check?

mmuetzel commented 1 year ago

It looks like the only consequential thing that test does is adding the preprocessor definition NGETENV_HOME if that test didn't execute correctly. But I can find nowhere where that definition would be used. Can that test just be removed?

Edit: There is a runtime check with fallback here: https://github.com/DrTimothyAldenDavis/SuiteSparse/blob/75cb8b4be44227880388454dba022bc7442de61e/GraphBLAS/Source/GB_jitifyer.c#L257-L264 But not here: https://github.com/DrTimothyAldenDavis/SuiteSparse/blob/75cb8b4be44227880388454dba022bc7442de61e/GraphBLAS/CUDA/GB_jit_cache.cu#L84

Maybe the latter should also use the fallback (for Windows)?

mmuetzel commented 1 year ago

See #430 which should avoid the issue with HAVE_GETENV_HOME when cross-compiling (by removing that test).

DrTimothyAldenDavis commented 1 year ago

Thanks for all the feedback, everyone. Slowly getting up to speed here ...

Regarding GraphBLAS:

The GraphBLAS/CUDA code is not yet in production, and it will need to use the same JIT cache as the latest CPU JIT. The CUDA kernels will not work on Windows anyway (ever) because I need to use the Rapids Memory Manager.

GraphBLAS assumes ANSI C11, and even requires it. getenv is in ANSI C11 but the input strings are implementation defined. So I must use a run time test to see if getenv("HOME") can be used. However, it does look like the cmake test for getenv("HOME") is not actually required since I also do the run-time test in GraphBLAS/GB_jitifyer.c. I guess I don't need the cmake test at compile time after all. I think I added the test in cmake, then after getting the JIT to work without the cmake test, I never went back to delete the cmake test. I think for a while I thought I needed to know the result from getenv("HOME") during compilation but it turns out that I don't. Sorry for the confusion. This was just code draft/rewrite/get-it-to-work/then-oops-I-had-a-leftover-bit-I-didn't-need. I guess

So it looks like I can entirely delete the GraphBLAS/cmake_modules/SuiteSparse_getenv.cmake file (and its use) entirely.

That's what #430 does, from @mmuetzel , except the CUDA/GB_jit_cache.cu file does not need to be fixed right now. The correct fix will be to merge the latest CPU JIT with the older CUDA JIT, so the CUDA JIT gets its cache folder from the GraphBLAS/GB_jitifyer.c functions, rather then getting the cache folder on its own.

Regarding CHOLMOD:

I can probably safely go with a compile time check. I added the check for __threads to fix a METIS bug, where parallel calls to CHOLMOD (then calling METIS) was hammering the system rand() function and getting horrible performance. There's a closed issue in SuiteSparse about that. See https://github.com/DrTimothyAldenDavis/SuiteSparse/issues/375 and https://github.com/DrTimothyAldenDavis/SuiteSparse/commit/f3ac3eafa6531569f03768af7def36d487c494db . That fix was done on Sept 4, just recently, so @Wimmerer wouldn't have caught it.

... as you can tell I back at the keyboard, but laying flat so my code updates / reviews will be slow.

DrTimothyAldenDavis commented 1 year ago

I will try switching the check for __thread from "try run" to "try compile". However, I saw this document that @imciner2 referred to when fixing the ordering of the keywords: https://gcc.gnu.org/onlinedocs/gcc/Thread-Local.html

6.64 Thread-Local Storage Thread-local storage (TLS) is a mechanism by which variables are allocated such that there is one instance of the variable per extant thread. The runtime model GCC uses to implement this originates in the IA-64 processor-specific ABI, but has since been migrated to other processors as well. It requires significant support from the linker (ld), dynamic linker (ld.so), and system libraries (libc.so and libpthread.so), so it is not available everywhere.

The last sentence has me worried just a bit. Could a "try compile" for __thread succeed, but then fail at run time because the libc.so or libpthread.so doesn't support it? If so, then "try run" would catch that and thus not use __thread, but "try compile" would not work.

The safest thing, I think, would be to use "try compile" only when cross compiling, and stick with "try run" otherwise.

Regarding the use of the ANSI C11 _Thread_local: this is not a good option to place before __thread, unfortunately. I've never gotten _Thread_local to actually work, even when forcing ANSI C11 compliance. This is why I place it last in the list of methods for creating thread local storage. It's safe there because the "try run" in the SuiteSparse__thread.cmake script will safely detect if it's available or not. It just never gets used. From what I can tell, no compiler supports it.

mmuetzel commented 1 year ago

try_compile in CMake is a bit of a misnomer because it not only compiles the source. But it also tries to link an executable (or a static library). The linker should be failing if there are undefined symbols when trying to link an executable. So, try_compile should be fine for the purpose here.

Fwiw, this is also true for check_c_source_compiles: https://cmake.org/cmake/help/latest/module/CheckCSourceCompiles.html

DrTimothyAldenDavis commented 1 year ago

OK. I'll go with check_c_source_compiles throughout. I just pushed a change to the dev2 branch, with this change.

Can you try the dev2 branch to see if this fixes the cross compile issue?

ViralBShah commented 1 year ago

Yes, the dev2 branch fixes the cross compile issue. Thank you for the quick fix. If it is possible to have a new release with these updates in the near future, it will be nice to ship Julia with an official released version of SuiteSparse.

DrTimothyAldenDavis commented 1 year ago

I'm in the process of posting a v7.2.1.beta2 (see https://github.com/DrTimothyAldenDavis/SuiteSparse/pull/433).

DrTimothyAldenDavis commented 1 year ago

v7.2.1.beta2 is now posted: https://github.com/DrTimothyAldenDavis/SuiteSparse/releases/tag/v7.2.1.beta2 .

Once I get some feedback from it from distro maintainers, I'll post it as the stable v7.2.1 release.

DrTimothyAldenDavis commented 1 year ago

On 2nd thought, I should leave this issue open until v7.2.1 is posted as a stable release, in case I've missed something.

DrTimothyAldenDavis commented 1 year ago

https://github.com/DrTimothyAldenDavis/SuiteSparse/releases/tag/v7.2.1 now posted as a stable release.