3-manifolds / Sage_macOS

SageMath as a macOS application bundle.
152 stars 15 forks source link

Segfault for char polys of matrices over cyclotomic fields #43

Closed NathanDunfield closed 2 years ago

NathanDunfield commented 2 years ago

With Sage 9.6 on macOS 11.6.5, I get:

sage: K = CyclotomicField(5)
sage: A = matrix(K, [[1, 2], [3, 4]])
sage: A.charpoly()
[...]
Unhandled SIGSEGV: A segmentation fault occurred.

This problem seems specific to CyclotomicFields and does not occur for QuadraticFields or even NumberFields that are obviously isomorphic to aCyclotomicField:

sage: L = NumberField(x^4 + x^3 + x^2 + x + 1, 'a')
sage: A = matrix(L, [[1, 2], [3, 4]])
sage: A.charpoly()
x^2 - 5*x - 2

This problem does not occur with Sage 9.6 on Linux, but does manifest in Sage 9.5 on macOS.

culler commented 2 years ago

Well this is definitely a bug, but maybe not a Sage_macOS bug.

I verified that the same segfault occurs when doing this computation in Sage 9.6, as compiled directly from the source distribution. (To build the Sage_macOS package, Sage is first compiled with the standard build procedure. So I happen to have a copy of that build lying around.) As such, I think it is a Sage bug, independent of anything we do to package the binary distribution of Sage. So I think this report probably belongs on the Sage TRAC site.

Another bug which is demonstrated here is that the CySignals package does not work correctly on macOS. Instead of opening Sage in gdb, as it does on linux, it simply exits after trapping the SIGSEGV signal, making it extremely difficult to get a stack trace when trying to debug a crash. I don't know if there is a TRAC ticket open for that.

NathanDunfield commented 2 years ago

I have reported this on the main Sage bug site. Should we close this ticket or keep it open?

culler commented 2 years ago

I think we could leave it open and see what happens "upstream".

On Tue, Jul 5, 2022 at 3:51 PM Nathan Dunfield @.***> wrote:

I have reported this https://trac.sagemath.org/ticket/34117 on the main Sage bug site. Should we close this ticket or keep it open?

— Reply to this email directly, view it on GitHub https://github.com/3-manifolds/Sage_macOS/issues/43#issuecomment-1175485994, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAJ6CP42MKLFCUG5KMIZNL3VSSN6TANCNFSM52XE6N5Q . You are receiving this because you commented.Message ID: @.***>

culler commented 2 years ago

Bad news. I built a SageMath_9_7.app using 9.7.beta3 and I see the same crash. But Dave Witte-Morris reported on the Sage TRAC that this computation worked correctly for him with his build of 9.7.beta3. Of course, no two builds of Sage are the same. Who knows which Homebrew or other external libraries were used in Dave's build. Our build carefully avoids using any external libraries (on Intel - on arm64 we are forced to copy the Homebrew gfortran library into our app bundle and fix load paths since the Sage gfortran build fails). We could try to find out from Dave what his build contains. But I am thinking that the next step might be to patch CySignals so we can get a crash log when a segfault occurs. Frankly, I see no reason why sage should catch the SIG_SEGV signal on macOS, given that the handler is unable to do anything useful and instead all it does is to prevent normal crash logs from being generated. I think that might be a relatively simple patch to CySignals.

culler commented 2 years ago

I have found the problem, and it is a Sage_macOS bug after all. For some reason, probably having to do with the build process for primecount, the structure of the load paths in the primecount libraries and executable do not follow the same pattern as most of the Sage libraries, and the script I use to rewrite load paths in order to make everything relocatable does not handle the primecount files correctly.

Why this gets reported as a segfault instead of a load error is yet another Apple mystery.

I am working on this now.

culler commented 2 years ago

Interestingly, this load path problem is reported on arm64 with a SIGILL instead of SIGSEGV.

culler commented 2 years ago

This is more subtle than I thought. The rpaths were definitely wrong in the 9.6 release, but correcting them was not the whole story. I have a version of the primecount library working, but I have seen the segfault in versions that had the identical rpath structure to the working one.

culler commented 2 years ago

I am recording some details about this bug.

With exactly two exceptions, the executables in sage/local/bin and the shared libraries in sage/local/lib set their mach-O LC_LOAD_DYLIB commands to an absolute path to a library. The two exceptions are bin/primecount and lib/libprimecount.X.Y.dylib. In those two cases the LC_LOAD_DYLIB paths are set to @rpath/libXXX and there is an LC_RPATH load command setting the rpath to the absolute path to the lib directory. Also, as is typical for libraries built by sage, there are multiple copies of that LC_RPATH load command.

Our relocation script removes all rpaths, makes all LC_LOAD_DYLIB commands use @rpath, and adds an LC_RPATH command which sets the rpath to a relative path which begins with @loader_path or @executable_path as appropriate. This script has now been updated so it checks for @rpath in the LC_LOAD_DYLIB commands and handles it correctly. The previous version did not do that, and hence it screwed up for exactly those two primecount files.

culler commented 2 years ago

This is fixed in release 1.4.2. Also, with the updated build scripts and the patches to update Sage's primecount spkg to a newer version, I was able to build 9.7.beta3 with no problems, and its primecount worked correctly.

So, I am closing this ticket.