3-manifolds / Sage_macOS

SageMath as a macOS application bundle.

matplotlib.pyplot hanging in commandline #55

Closed hirani closed 1 year ago

hirani commented 1 year ago

On the commandline in sage, any command in matplotlib.pyplot appears to hang. The minimal test case is:

sage: import matplotlib.pyplot as plt
sage: plt.plot([0,1],[0,1])

Very quickly the CPU usage maxes out and I have to kill the terminal and force quit the process. This is on an x86 Mac running Ventura 13.3. I have seen a colleague try on his x86 Mac running Ventura and he seems to have no problem. I had no problem using SageMath 9.7 from your project. Any suggestions for debugging the problem? Other packages I have tested seem to be working fine.

Anil

NathanDunfield commented 1 year ago

Correction: I'm the colleague in question, and my machine runs Monterey (12.6.3), not Ventura. So of the four combinations of (Monterey/Ventura) and (SageMath 9.7/9.8), the only one that manifests this issue is Ventura and 9.8. FWIW, matplotlib was updated from 3.5.x to 3.6.x in Sage 9.8.

culler commented 1 year ago

I think the minimal way to create this hang is to run %matplotlib, which should print out which backend is being used.

My tests show the same behavior with SageMath 9.8 in both Monterey and Ventura. Namely, the magic command above hangs. Sampling the hung process indicates that it is blocked in PyThread_acquire_lock_timed. I believe that Apple uses a busy wait for a lock which is expected to be released quickly, to avoid context switches. That might explain why the CPU usage goes to 100%.

I have not yet been able to figure out what actual python code gets run by that magic command.

Also, in plain ipython the magic command returns as expected.
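
One possible way to find out which Python code a hung call is running is the standard-library faulthandler module, which can write every thread's Python traceback to a file after a timeout. The following is only a sketch: the helper name call_with_stack_dump and the 5 second default are illustrative, and it shows Python frames only, not native frames such as PyThread_acquire_lock_timed.

```python
import faulthandler
import sys


def call_with_stack_dump(func, timeout=5.0, file=None):
    """Call func(); if it is still running after `timeout` seconds,
    write the Python traceback of every thread to `file`.

    `file` must be a real file object (it needs a file descriptor);
    it defaults to sys.stderr. The hung call itself is not interrupted.
    """
    target = file if file is not None else sys.stderr
    faulthandler.dump_traceback_later(timeout, exit=False, file=target)
    try:
        return func()
    finally:
        # func() returned or raised in time, so cancel the pending dump.
        faulthandler.cancel_dump_traceback_later()
```

In a session one might wrap the suspect call, for example call_with_stack_dump(lambda: get_backend()), and read the traceback printed once the timeout expires.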

NathanDunfield commented 1 year ago

Ok, this is weird: I have two Monterey 12.6.3 systems, both Intel, both with SageMath 9.8 on them. Matplotlib works just fine on the slightly older iMac Pro (Xeon processor, 2018) but not on the MacBook Pro (16-inch, i9, 2019).

culler commented 1 year ago

That is mostly consistent. My Monterey tests were on a MacBook Air from 2019, although it is just a Core i5.

So, what could possibly explain this? The Sage build targets a very old Intel processor.

culler commented 1 year ago

Here is a clue. The following hangs in SageMath 9.8:

sage: from matplotlib import get_backend
sage: get_backend()

culler commented 1 year ago

Running gui tk first seems to avoid the hang, but does not produce any graphics:

sage: gui tk
sage: import matplotlib
sage: matplotlib.use('TkAgg')
sage: from matplotlib import get_backend
sage: get_backend()
'TkAgg'
sage: import matplotlib.pyplot as plt
sage: plt.plot([0,1],[0,1])
[<matplotlib.lines.Line2D object at 0x15dd38790>]

For a standard matplotlib install the default backend is osx = macOS. But Sage does not include that backend.

culler commented 1 year ago

Actually, the gui tk is not needed to avoid the hang. So it does not look like a missing event loop is the cause.

NathanDunfield commented 1 year ago

I see the macOS backend in both my installs:

/private/var/tmp/sage-9.8-current/local/var/lib/sage/venv-python3.11.1/lib/python3.11/site-packages/matplotlib/backends/_macosx.cpython-311-darwin.so

and both Anil and I routinely use that backend (I'm pretty sure).
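
A way to check for the backend extension from Python without importing it (and so, presumably, without triggering the hang) might look like the sketch below. The helper name module_is_installed is made up for illustration. Note that find_spec does import the parent packages of a dotted name, here matplotlib and matplotlib.backends.

```python
import importlib.util


def module_is_installed(name):
    """Return True if the module `name` can be located.

    The module itself is not imported, but for a dotted name its
    parent packages are.
    """
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # A parent package of the dotted name is missing.
        return False


# The extension module that backs matplotlib's MacOSX backend.
print(module_is_installed("matplotlib.backends._macosx"))
```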

culler commented 1 year ago

This is why I thought it was not provided:

sage: import matplotlib
sage: matplotlib.use('macOS')

[ ... ]

ValueError: 'macos' is not a valid value for backend; supported values are ['GTK3Agg', 'GTK3Cairo', 
'GTK4Agg',  'GTK4Cairo', 'MacOSX', 'nbAgg', 'QtAgg', 'QtCairo', 'Qt5Agg', 'Qt5Cairo', 'TkAgg',
 'TkCairo', 'WebAgg', 'WX',  'WXAgg', 'WXCairo', 'agg', 'cairo', 'pdf', 'pgf', 'ps', 'svg', 'template']

(Similarly with 'osx'.)

culler commented 1 year ago

Oops. So I was supposed to call it 'MacOSX'. When I try that, though, it hangs without showing a plot.

So maybe the problem is with loading that module. (I don't know why the TkAgg does not even try to open a window.)

culler commented 1 year ago

I am able to produce a plot with SageMath 9.8 like this:

$ sage -python
Python 3.11.1 (main, Jan 30 2023, 15:05:49) [Clang 13.0.0 (clang-1300.0.29.30)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import matplotlib.pyplot as plt
>>> plt.plot([0,1],[0,1])
[<matplotlib.lines.Line2D object at 0x1038af810>]
>>> plt.show()

But doing that in sage, or in sage -ipython, hangs as reported.

The following also works:

$ sage -python
Python 3.11.1 (main, Jan 30 2023, 15:05:49) [Clang 13.0.0 (clang-1300.0.29.30)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import matplotlib
>>> matplotlib.use('TkAgg')
>>> import matplotlib.pyplot as plt
>>> plt.plot([0,1],[0,1])
[<matplotlib.lines.Line2D object at 0x106409d10>]
>>> plt.show()

So I think this is an interaction between matplotlib and IPython. Sage is probably not involved. SageMath 9.7 uses IPython 8.4.0 and matplotlib 3.5.2, while SageMath 9.8 uses IPython 8.6.0 and matplotlib 3.6.2. Or possibly it is purely an IPython issue. I can try downgrading IPython to 8.4.0 in my build of Sage 9.8 and see what happens.
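
When comparing environments like this, the installed versions can be read from package metadata without importing the packages themselves. A minimal sketch, with the helper name installed_versions chosen only for illustration:

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


print(installed_versions(["ipython", "matplotlib"]))
```

Running the same snippet under sage -python for 9.7 and 9.8 would give a quick side-by-side of the two stacks.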

culler commented 1 year ago

On the other hand, with IPython 8.6.0 and matplotlib 3.6.2 installed for Python 3.11 on Ventura, the hang does not occur and the plots do appear. So Sage is involved somehow.

culler commented 1 year ago

I have found a workaround for this. The hang seems to occur when matplotlib attempts to import the MacOSX backend. In 9.7 there was no MacOSX backend, and the TkAgg backend was used as the default. If I remove _macosx.cpython-311-darwin.so and backend_macosx.py from the matplotlib backends directory then it does not try to import the MacOSX backend, so there is no hang, and it makes the TkAgg backend be the default. That backend works, although if you want to be able to close the plot window you need to run gui tk.
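
A reversible variant of this workaround could rename the two files instead of deleting them. The sketch below is untested against a real Sage install. The helper name and the .disabled suffix are illustrative, backends_dir is assumed to be the matplotlib/backends directory inside the Sage venv, and changing files inside a signed application bundle may affect its signature.

```python
from pathlib import Path

# The two files named in the workaround above.
MACOSX_BACKEND_FILES = ("_macosx.cpython-311-darwin.so", "backend_macosx.py")


def disable_macosx_backend(backends_dir, suffix=".disabled"):
    """Rename the MacOSX backend files so they can no longer be imported.

    Returns the list of new paths; files that are absent are skipped.
    """
    renamed = []
    for name in MACOSX_BACKEND_FILES:
        path = Path(backends_dir) / name
        if path.exists():
            target = path.with_name(path.name + suffix)
            path.rename(target)
            renamed.append(target)
    return renamed
```

Renaming the files back restores the original state.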

culler commented 1 year ago

With the release of SageMath 10.0 just around the corner, I am not sure whether I want to do another release of SageMath 9.8 which includes the workaround above. (And I would much prefer knowing why the hang occurs with the MacOSX backend in Sage, but not in plain ipython). So I would like to record here how to make plotting work without making any changes to SageMath 9.8. Do it like this:

sage: gui tk
sage: import matplotlib
sage: matplotlib.use('tkagg')
sage: from matplotlib import pyplot as plt
sage: plt.ion()
<contextlib.ExitStack object at 0x1659ed610>
sage: plt.plot([0,1],[0,1])
[<matplotlib.lines.Line2D object at 0x165bc0550>]

A plot will immediately appear (because of the call to plt.ion()) and the window will be closable in the usual way (because of running gui tk).

hirani commented 1 year ago

I tested this on the machine on which I had stumbled across the problem. Now with your suggested solution plotting works as you say. Thanks for looking into this and finding a way to make it work.

culler commented 1 year ago

I am reopening this so it will be visible. I have both a better workaround and a partial explanation. First the workaround. Just run %pip install --upgrade matplotlib at the sage prompt. That will install matplotlib 3.7 in your .sage directory and matplotlib will work.

Now the explanation. This has nothing to do with matplotlib 3.7 vs 3.6. In fact, it has nothing to do with anything. The problem is that Apple has a bug which can lead to a deadlock (i.e. two threads, each waiting for the other to release a lock) when using dlopen to read symbols from a shared library (as matplotlib does with its backends). Apple has created a race condition which may or may not lead to a deadlock. We saw this in Nathan's "Ok, this is weird" report and also in the fact that matplotlib works with sage -python but not with sage -ipython. Random changes in the environment can cause the deadlock to appear or disappear. Such is the nature of race conditions. Either horse may win.
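
For readers unfamiliar with the pattern, here is a generic illustration of a two-lock deadlock in Python. It is not a reproduction of the dlopen bug: the lock names are arbitrary, the barriers force the unlucky interleaving that a real race only hits sometimes, and the timeouts exist so that the demo terminates instead of hanging.

```python
import threading


def lock_order_deadlock_demo(timeout=0.2):
    """Two threads take two locks in opposite order.

    Each thread holds its first lock while trying to take the other one.
    Returns, per thread, whether it managed to take its second lock.
    """
    lock_a, lock_b = threading.Lock(), threading.Lock()
    both_hold_first = threading.Barrier(2)
    both_done_trying = threading.Barrier(2)
    got_second = {}

    def worker(name, first, second):
        with first:
            both_hold_first.wait()  # now each thread holds one lock
            acquired = second.acquire(timeout=timeout)
            got_second[name] = acquired
            if acquired:
                second.release()
            # Keep holding `first` until the other thread has given up too.
            both_done_trying.wait()

    threads = [
        threading.Thread(target=worker, args=("t1", lock_a, lock_b)),
        threading.Thread(target=worker, args=("t2", lock_b, lock_a)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return got_second


print(lock_order_deadlock_demo())
```

Neither thread obtains its second lock. Without the timeouts, both would wait forever.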

Here is a report on the Apple Developer site in which Apple acknowledges the bug (amazingly). Here is a report of a variant of the problem. Incidentally, the high CPU usage is another sign that there is a deadlock. Apple uses a spinlock in these situations, to avoid context switches during what is expected to be a very short wait. When a spinlock blocks it consumes 100% of a CPU.
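
The difference between spinning and blocking can be seen in a small Python sketch that compares the CPU time used by the two kinds of wait. It says nothing about how Apple's locks are actually implemented, and the function names and the 0.2 second duration are illustrative.

```python
import threading
import time


def cpu_seconds(wait):
    """Return the CPU time this process used while running wait()."""
    start = time.process_time()
    wait()
    return time.process_time() - start


def busy_wait(seconds=0.2):
    """Spin until the deadline, as a spinlock does while it is contended."""
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        pass


def blocking_wait(seconds=0.2):
    """Sleep in the kernel until the timeout, using almost no CPU."""
    threading.Event().wait(timeout=seconds)


print("busy:", cpu_seconds(busy_wait))
print("blocking:", cpu_seconds(blocking_wait))
```

The busy wait typically uses CPU time close to the full wall-clock interval, while the blocking wait uses almost none.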

culler commented 1 year ago

I have learned some more about this. The hang only gets triggered if the extension module _macosx.cpython-311-darwin.so has been signed. I did the following experiment:

Note that when matplotlib is installed with %pip it will not be signed, which explains why it works in that case. In fact there is no issue with loading unsigned extension modules. The gatekeeper is fine with that. Apple even states that shared libraries opened with dlopen do not need to be signed. They are treated as data files. But Apple's notarizer refuses to notarize an app containing an unsigned .so file.
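
Whether a particular extension module carries a valid signature can be checked with the macOS codesign tool. The wrapper below is a sketch, not the experiment referred to above. The function name is illustrative, and it returns None when codesign is not available.

```python
import shutil
import subprocess


def has_valid_signature(path):
    """Return True if `codesign --verify` accepts `path`, False if it
    rejects it (unsigned or invalid), or None if codesign is not on PATH."""
    codesign = shutil.which("codesign")
    if codesign is None:
        return None  # not macOS, or the tool is not installed
    result = subprocess.run(
        [codesign, "--verify", str(path)], capture_output=True
    )
    return result.returncode == 0
```

Applied to the _macosx.cpython-311-darwin.so inside the app bundle and to the copy installed by %pip, it should distinguish the signed file from the unsigned one.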

culler commented 1 year ago

Drilling down deeper has revealed that this problem has nothing to do with matplotlib. Here is another way to create this hang:

sage: gui osx

This is supposed to activate the "osx" IPython event loop, and matplotlib does the equivalent thing in its get_backend() function when the default backend is "macosx". The osx event loop is provided primarily in order to support the macosx backend for matplotlib.

So the bottom line here seems to be that the "osx" event loop cannot be started from a signed extension module. Maybe the docstring of ipykernel/_eventloop_macos.py provides some insight:

"""Eventloop hook for OS X

Calls NSApp / CoreFoundation APIs via ctypes.
"""

culler commented 1 year ago

It turns out that a similar hang occurs in plain ipython when you run %gui osx. I have reported that as an ipython issue: ipython #14072.

culler commented 1 year ago

SageMath-10.0 version v.2.0.1 fixes both of these issues. The most important takeaway is that a signed application must be signed with the com.apple.security.cs.allow-unsigned-executable-memory entitlement if it uses the Python ctypes module to call C functions.
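
Entitlements are supplied to the signing step as a property list. A minimal sketch of generating such a file with Python's plistlib follows. The helper name is illustrative, and how the Sage_macOS build actually provides the entitlement is not shown in this thread.

```python
import plistlib

ENTITLEMENT = "com.apple.security.cs.allow-unsigned-executable-memory"


def entitlements_plist(keys=(ENTITLEMENT,)):
    """Return XML plist bytes that set each entitlement key to true."""
    return plistlib.dumps({key: True for key in keys})


print(entitlements_plist().decode("utf-8"))
```

The resulting file would typically be passed to codesign through its --entitlements option when the application is signed.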

I think the issue can now be closed.