DrTimothyAldenDavis / SuiteSparse

The official SuiteSparse library: a suite of sparse matrix algorithms authored or co-authored by Tim Davis, Texas A&M University.
https://people.engr.tamu.edu/davis/suitesparse.html
Other
1.14k stars 258 forks source link

ABI break in CHOLMOD 5.1.0 without SOVERSION bump #741

Closed svillemot closed 7 months ago

svillemot commented 7 months ago

In commit 0b8b9a7185b26ae7428623f234632cc0fbf13ff7, the field blas_dump was added to the structure cholmod_common. The size of the structure has thus been modified.

This is an ABI break. As a consequence, the SOVERSION of libcholmod should have been bumped (i.e. libcholmod.so.5 should have been renamed libcholmod.so.6).

This unhandled ABI break is causing crashes in various Debian packages, see http://bugs.debian.org/1061049

As a reminder, the SOVERSION should be increased whenever there is a backward-incompatible change in the application binary interface (ABI), i.e. whenever a program compiled against the old version of the library needs to be recompiled in order to run against the new version of the library. Also note that even though in most SuiteSparse libraries the major version number and the SOVERSION are identical, the two are fundamentally different concepts. For example, one has to increase the SOVERSION when a minor and obsolete exported function is removed, but this is arguably not a change significant enough to warrant a major version bump; conversely, adding many new functions can justify a major version bump, but this should not lead to a SOVERSION bump if nothing is broken in the existing ABI.

DrTimothyAldenDavis commented 7 months ago

Oops. My mistake.

What's the least disruptive way to resolve this, in your opinion? I could simply call CHOLMOD v6. It has undergone a lot of changes with the addition of single precision support (but that was what v5 was supposed to indicate).

I could #ifdef out the blas_dump variable, since it's only meant for diagnosing performance issues in the BLAS. It could change to

#ifdef BLAS_DUMP
FILE *blas_dump ; 
#endif

and then BLAS_DUMP would not be used in production, just development. If this is the only ABI-breaking change, then I could call this CHOLMOD 5.1.2.

Which option is least disruptive, do you think?

dkogan commented 7 months ago

Hi. If the ABI change could be un-done (with an #ifdef for instance), that would be the least-churn option. I would have several questions/comments in that case:

DrTimothyAldenDavis commented 7 months ago

yes, turning it on would break the ABI, but the case is not meant to be used in production, just in development. There is no cmake way to set BLAS_DUMP, for example.

Yes, I'd release a 7.5.2 perhaps, with this bug fix. This could be considered a bug introduced in 7.4.x and fixed in 7.5.2, and thus versions 7.4.x through 7.5.1 would still have the bug. If anyone is using 7.4.x to 7.5.1, and they notice the ABI break, they could just fix that bug by upgrading to 7.5.2.

I should have an ABI-checker myself, I suppose. I try to be careful but sometimes I slip up.

DrTimothyAldenDavis commented 7 months ago

turning it on would break the ABI ...

I mean adding the #idef BLAS_DUMP, as the least-churn option.

dkogan commented 7 months ago

Hi. I just ran the automated tool to compare libcholmod in 7.3.1 vs 7.5.1.

I downloaded libcholmod5 and libcholmod5-dbgsym from https://snapshot.debian.org. Then I extracted the contents, and compared the ABIs:

V0=7.3.1+dfsg-2
V1=7.5.1+dfsg-1

mkdir $V0 && for f (*$V0*.deb) { dpkg -x $f $V0 }
mkdir $V1 && for f (*$V1*.deb) { dpkg -x $f $V1 }

abi-dumper $V0/usr/lib/*/*.so.? --search-debuginfo=$V0 -lver $V0 -o $V0.dump
abi-dumper $V1/usr/lib/*/*.so.? --search-debuginfo=$V1 -lver $V1 -o $V1.dump

abi-compliance-checker -l cholmod5 -old $V0.dump -new $V1.dump

The results: https://notes.secretsauce.net/cholmod5-7.3.1+dfsg-2_to_7.5.1+dfsg-1-compat_report.html

So it flagged the issue we're talking about (albeit with a "Low" severity, which I disagree with), and found no other problems. So adding the #ifdef and releasing a 7.5.2 is probably good. Sébastien: does that sound good to you?

svillemot commented 7 months ago

What's the least disruptive way to resolve this, in your opinion? I could simply call CHOLMOD v6. It has undergone a lot of changes with the addition of single precision support (but that was what v5 was supposed to indicate).

Note that my last point was precisely to stress that there is no fundamental reason why bumping the SOVERSION to 6 should imply bumping the major version number of CHOLMOD to 6. Many libraries disconnect the two, because they are distinct concepts. To say it otherwise, one possibility would be to have CHOLMOD 5.1.2 ship libcholmod.so.6.

I could #ifdef out the blas_dump variable, since it's only meant for diagnosing performance issues in the BLAS. It could change to

#ifdef BLAS_DUMP
FILE *blas_dump ; 
#endif

and then BLAS_DUMP would not be used in production, just development. If this is the only ABI-breaking change, then I could call this CHOLMOD 5.1.2.

Which option is least disruptive, do you think?

This option is fine with me, as long as you can ensure that there will never be a reason to define BLAS_DUMP in the Debian package (because, as @dkogan has mentioned, this would reintroduce an ABI breakage without SOVERSION bump). Actually, this solution is the one that is the easiest for me to implement as Debian packager.

DrTimothyAldenDavis commented 7 months ago

This option is fine with me, as long as you can ensure that there will never be a reason to define BLAS_DUMP in the Debian package (because, as @dkogan has mentioned, this would reintroduce an ABI breakage without SOVERSION bump). Actually, this solution is the one that is the easiest for me to implement as Debian packager.

Yes -- that's exactly what I can do. BLAS_DUMP would be for development only, not production. I didn't use an #ifdef because I thought it would be better to have a fixed size struct, but that's not very important.

DrTimothyAldenDavis commented 7 months ago

Thanks for that report. The changes of int mode to int values is upward-compatible with 7.3.1. Some of the functions in 7.3.1 only handled real matrices (not complex), and values=0 was pattern-only, and nonzero was "do the values". If the matrix was complex in 7.3.1, an error code was returned (complex not supported). Now, mode=0 means pattern only (as before), mode=1 means "do the values but not complex conjugate", and mode=2 means "do the values, and use the complex conjugage if complex".

Other changes of mode -> values were just to make this parameter consistent in its name. Usage is unchanged.

The restrict keyword was removed because it causes problems with an older version MS VIsual Studio, if I recall. The behavior of that method is not modified. So this is safe.

The only change is blas_dump, and I can #ifdef that away.

This commit should fix the problem: https://github.com/DrTimothyAldenDavis/SuiteSparse/commit/53f0dc47ad0f3b0b9d665c066672cd6b9f9c3782

I also will need to revise the top-level ChangeLog for Suitesparse 7.5.2.

svillemot commented 7 months ago

Thanks for fixing the issue. I have already uploaded the patch to Debian, without waiting for the new release, because it was somewhat urgent (many packages broken).

DrTimothyAldenDavis commented 7 months ago

Does SuiteSparse 7.6.0.beta1 look OK? I'd like to post it soon since this ABI break is an important thing to fix, but it would be best to get your feedback first.

DrTimothyAldenDavis commented 7 months ago

Fixed in SuiteSparse 7.6.0. Thanks again for catching this.