LLNL / sundials

Official development repository for SUNDIALS - a SUite of Nonlinear and DIfferential/ALgebraic equation Solvers. Pull requests are welcome for bug fixes and minor changes.
https://computing.llnl.gov/projects/sundials
BSD 3-Clause "New" or "Revised" License
523 stars 131 forks source link

Use `int64_t` for `KLU_INDEXTYPE` if `SUNDIALS_INT64_T` is true. #477

Closed mmuetzel closed 6 months ago

mmuetzel commented 6 months ago

long int is 32 bits wide on systems that use an LLP64 data model (e.g., Windows). Use a type for which the C standard guarantees that it is 64-bit instead (i.e., int64_t).

See the error in CI for PR #432.

drreynolds commented 6 months ago

This looks good to me, so I'm temporarily approving and initiating the CI runs. Before this is merged the fix should be mentioned in the CHANGELOG.md and doc/shared/RecentChanges.rst files.

mmuetzel commented 6 months ago

Thank you for the review.

I added notes about that change to the two files you pointed out. Is the wording ok?

balos1 commented 6 months ago

Thank you for the review.

I added notes about that change to the two files you pointed out. Is the wording ok?

Yes, the wording is OK. Thanks!

mmuetzel commented 6 months ago

Looks like one of the tests failed for the MSVC runner in CI:

      Start 80: test_profiling
80/80 Test #80: test_profiling ....................................***Failed    1.02 sec

Is there any way to get more detailed information from the runner to see what specifically failed?

balos1 commented 6 months ago

@mmuetzel That test randomly fails some times if the VM is slow. I reran it, and it passed.

gardner48 commented 6 months ago

Could this cause problems with older versions of KLU (before 6.0.0) on Windows? I think SuiteSparse_long was defined as long before v6 unless _WIN64 was defined.

balos1 commented 6 months ago

You are correct that prior to 6.0.0 SuiteSparse defined SuiteSparse_long as long unless it was on Windows (see https://github.com/DrTimothyAldenDavis/SuiteSparse/blob/v5.13.0/SuiteSparse_config/SuiteSparse_config.h). I think this could only cause a problem for compatibility with older SuiteSparse versions on platforms (other than Windows) where long is a 32-bit type.

gardner48 commented 6 months ago

I think even on most Windows systems it would use long because it depends on _WIN64 which it looks like is only defined for 64-bit ARM or x64 (see this page).

balos1 commented 6 months ago

I interpret "64-bit ARM or x64" to include "x86-64" so most Windows systems would define _WIN64. Perhaps this interpretation is incorrect?

balos1 commented 6 months ago

If we want to ensure that other LLP64 and ILP32 systems can use SuiteSparse < 6 and 64-bit SUNDIALS_INDEX_SIZE together, then we can add a check of the SuiteSparse version in a preprocessor macro. I actually don't think it would be possible for an ILP32 system to run into the issue because they would probably not be using SUNDIALS_INDEX_SIZE=64, and I don't know of any non Windows LLP64 systems.

gardner48 commented 6 months ago

I interpret "64-bit ARM or x64" to include "x86-64" so most Windows systems would define _WIN64. Perhaps this interpretation is incorrect?

I'm not sure, the _WIN32 lists x86 and x64 so it wasn't clear the me what the right interpretation is.

If we want to ensure that other LLP64 and ILP32 systems can use SuiteSparse < 6 and 64-bit SUNDIALS_INDEX_SIZE together, then we can add a check of the SuiteSparse version in a preprocessor macro. I actually don't think it would be possible for an ILP32 system to run into the issue because they would probably not be using SUNDIALS_INDEX_SIZE=64, and I don't know of any non Windows LLP64 systems.

I think setting SUNDIALS_INDEX_SIZE correctly might be the most common config/build error users run into. If we can, it would be good to give a warning that setup might be problematic.

mmuetzel commented 6 months ago

You are right. Before SuiteSparse version 6.0.0, SuiteSparse_long was long apart from Windows 64-bit where it was __int64 (i.e., long long). That means it was 64-bit wide on (most relevant) platforms with 64-bit pointers and 32-bit wide on (most relevant) platforms with 32-bit pointers. In newer versions of SuiteSparse SuiteSparse_long is int64_t on any platform.

SuiteSparse_long is defined in old and in new versions of SuiteSparse. Maybe, KLU_INDEXTYPE should be defined as SuiteSparse_long? However, converting between a pointer to a 32-bit integer array and a 64-bit integer array cannot be done (meaningfully) with a simple cast. I don't know what SUNSparseMatrix_IndexPointers or SUNSparseMatrix_IndexValues return. But if it is pointers to integer arrays of different types, it would probably be necessary to cast each element of the array to the different integer type instead of the entire array.

Maybe, those pointer casts should be removed entirely to allow the compiler to correctly detect if the pointers that are returned by these function don't match what sun_klu_refactor, sun_klu_factor or the other functions expect.

For easier reference, the code in question is, e.g., here: https://github.com/LLNL/sundials/blob/b685654c6958699821fc68a822b66076fec8cae7/src/sunlinsol/klu/sunlinsol_klu.c#L282-L284 https://github.com/LLNL/sundials/blob/b685654c6958699821fc68a822b66076fec8cae7/src/sunlinsol/klu/sunlinsol_klu.c#L296-L299

balos1 commented 6 months ago

@gardner48 said:

I'm not sure, the _WIN32 lists x86 and x64 so it wasn't clear the me what the right interpretation is.

_WIN32 is defined in both cases. _WIN64 is defined only with 64-bit Windows (which is most systems these days).

@mmuetzel said:

Before SuiteSparse version 6.0.0, SuiteSparse_long was long apart from Windows 64-bit where it was __int64 (i.e., long long). That means it was 64-bit wide on (most relevant) platforms with 64-bit pointers and 32-bit wide on (most relevant) platforms with 32-bit pointers.

This is correct.

@gardner48 said:

I think setting SUNDIALS_INDEX_SIZE correctly might be the most common config/build error users run into. If we can, it would be good to give a warning that setup might be problematic.

I think that the combination of a 32-bit system, latest and greatest SUNDIALS, but a SuiteSparse that is 2 versions old (or more) is very unlikely, but for other TPLs we check that the index sizes used by the TPL and SUNDIALS match in CMake. We could do that and it would also catch the case where users have just misconfigured SUNDIALS or SuiteSparse.

@mmuetzel Would you be able to add a check in the file cmake/tpl/SundialsKLU.cmake like we have for SuperLU_DIST and create a follow on PR to this one?