eth-cscs / DLA-Future

DLA-Future
https://eth-cscs.github.io/DLA-Future/master/
BSD 3-Clause "New" or "Revised" License
61 stars 13 forks source link

Remove most special handling of MKL in CMake configuration #1149

Closed msimberg closed 3 months ago

msimberg commented 3 months ago

This is currently untested, but early feedback more than welcome and the general structure is in place!

The first version of the MKL auto-detection simply uses try_compile with LAPACK_LIBRARY and a call to mkl_set_num_threads_local. It uses the result of this try_compile as the default value of DLAF_WITH_MKL. It's important to note that currently this will only be tested on the first configuration, so if the first configuration is wrong, and one then changes LAPACK_LIBRARY to have the correct paths, DLAF_WITH_MKL will still be off.

On the other hand, the above method means that one can simply set DLAF_WITH_MKL manually to on if one expects MKL to be there, and compilation will then fail if it's not there (instead of quietly disabling it).

The logic could be changed/improved to have different behaviour in the above case.

An alternative method is to simply use __has_include in single_threaded_blas.cpp. This doesn't allow overriding and is silent in either case (MKL found or not found).

msimberg commented 3 months ago

cscs-ci run

msimberg commented 3 months ago

cscs-ci run

msimberg commented 3 months ago

cscs-ci run

msimberg commented 3 months ago

cscs-ci run

msimberg commented 3 months ago

cscs-ci run

msimberg commented 3 months ago

cscs-ci run

msimberg commented 3 months ago

cscs-ci run

msimberg commented 3 months ago

cscs-ci run

msimberg commented 3 months ago

I've probably messed something up in the last pushes, but in theory this is close to how I imagine the final state to be. I've made separable compilation with CUDA conditional on the umpire version, with a comment: https://github.com/eth-cscs/DLA-Future/blob/460d24f4e41a20bb34481dae8a45664fdd755a3d/src/CMakeLists.txt#L61-L69.

Known open issues are:

The rest is obviously open to comments as well, but I've attempted to clean it up as far as I can.

msimberg commented 3 months ago

I've changed back the CUDA CI to use -Werror=all-warnings since we now don't get linker warnings with the newest version of Umpire. We will still get the warnings on builds with older versions of Umpire, but outside of CI those warnings should not be fatal. I opened https://github.com/eth-cscs/DLA-Future/issues/1154 as a reminder to potentially look at the other warnings.

albestro commented 3 months ago

if and how to cache the try_compile result for detecting MKL

Not sure... maybe @albestro has an idea?

I would say that it might be nicer to not cache, so that any change to LAPACK_LIBRARY/LAPACK_INCLUDE_DIR will be reflected also in the MKL_TRY_COMPILE result.

Otherwise, let's suppose we do this sequence

with the cached result for MKL we might end up getting BLAS/LAPACK error, because of the check_function_exists in the FindLAPACK that explicitly clear the cache, but at the same time we get that MKL is found correctly (because it is checked just at the first time).

So, I would suggest to do the same we do for LAPACK in FindLAPACK, i.e. unset(DLAF_WITH_MKL_TRY_COMPILE CACHE). And, trivially, in the future start using NO_CACHE where possible (apparently up to 3.29 it is possible for try_compile but not for check_function_exists).

msimberg commented 3 months ago

if and how to cache the try_compile result for detecting MKL

Not sure... maybe @albestro has an idea?

I would say that it might be nicer to not cache, so that any change to LAPACK_LIBRARY/LAPACK_INCLUDE_DIR will be reflected also in the MKL_TRY_COMPILE result.

Otherwise, let's suppose we do this sequence

* `cmake -DLAPACK_LIBRARY=<working-config> ...`

* `cmake -DLAPACK_LIBRARY=<non-working-config> ...`

with the cached result for MKL we might end up getting BLAS/LAPACK error, because of the check_function_exists in the FindLAPACK that explicitly clear the cache, but at the same time we get that MKL is found correctly (because it is checked just at the first time).

So, I would suggest to do the same we do for LAPACK in FindLAPACK, i.e. unset(DLAF_WITH_MKL_TRY_COMPILE CACHE). And, trivially, in the future start using NO_CACHE where possible (apparently up to 3.29 it is possible for try_compile but not for check_function_exists).

Currently on this branch, the try_compile result is used as the default value for DLAF_WITH_MKL. This is done with the idea that DLAF_WITH_MKL can be set explicitly by a user to signal that they expect MKL to be used or not used (leading to compilation failures if MKL isn't actually set up correctly). In this case it doesn't make much difference if DLAF_WITH_MKL_TRY_COMPILE is a cache variable or not, and if it's reset or not because it'll only be used once to set the default.

I think if we reset and redetect MKL every time then DLAF_WITH_MKL_TRY_COMPILE should just be DLAF_WITH_MKL. I haven't tried it yet, but in this case I think we can simply let try_compile create the variable. If the user has already set it the try_compile should be skipped (but as I said, not verified). Then a user can -UDLAF_WITH_MKL to redetect it if they don't want to set it explicitly.

As a last option, we try_compile into DLAF_WITH_MKL and don't cache the variable. This, however, means that the user can't override the option (as far as I can think of...).

rasolca commented 3 months ago

Not sure what to say... just realized that auto-detection might have false positive if the mkl include directory is present (e.g. if using mkl fftw). So the possibility for the user should be available. I think it should work like this:

DLAF_WITH_MKL: is a cached value set by user

DLAF_WITH_MKL try_compile -DDLAF_WITH_MKL
ON not run added
OFF not run skipped
Not defined OK added
Not defined FAIL skipped

If a variable is also needed in the rest of the cmake scripts I would use a different name.

msimberg commented 3 months ago

cscs-ci run

msimberg commented 3 months ago

I think it should work like this:

DLAF_WITH_MKL: is a cached value set by user DLAF_WITH_MKL try_compile -DDLAF_WITH_MKL ON not run added OFF not run skipped Not defined OK added Not defined FAIL skipped

If a variable is also needed in the rest of the cmake scripts I would use a different name.

This sounds pretty reasonable. I made an attempt at implementing this logic in https://github.com/eth-cscs/DLA-Future/pull/1149/commits/d5a7ae4624baf4fc66ae63b02fcc461d38ea3acf.

jst realized that auto-detection might have false positive if the mkl include directory is present (e.g. if using mkl fftw).

Yeah, I was worried about something like this as well. Without intending to I think this now covers that quite reasonably: https://github.com/eth-cscs/DLA-Future/blob/d5a7ae4624baf4fc66ae63b02fcc461d38ea3acf/CMakeLists.txt#L81-L84. When DLAF_WITH_MKL is explicitly enabled we can safely disable OpenMP. If DLAF_WITH_MKL is undefined that first branch is always false so we'll keep OpenMP enabled (if it wasn't changed by the user). In the worst case we end up calling both omp_set_num_threads and mkl_set_num_threads, but we won't end up in the situation where we think we're using MKL, but we're not setting omp_set_num_threads.

In the spack package we set DLAF_WITH_MKL explicitly so there we only end up with one or the other, never both, enabled.

At least this is how I think it's working now and I think it seems reasonable, but not 100% sure.

I think warning/status messages can still be tweaked, but I'll leave that up to you to comment about.

msimberg commented 3 months ago

cscs-ci run

msimberg commented 3 months ago

cscs-ci run