enthought / mayavi

3D visualization of scientific data in Python
http://docs.enthought.com/mayavi/mayavi/
Other
1.3k stars 284 forks source link

Test suite hangs on macOS #1096

Closed mdickinson closed 2 years ago

mdickinson commented 2 years ago

I'm unable to run the test suite (under nose) on macOS, with the latest master of Mayavi in a venv: the test suite hangs with no output.

Steps to reproduce: here are the commands I'm issuing, in a zsh shell on macOS 10.15.7 (Intel MacBook Pro). Here python3.9 is Python 3.9.7, installed via MacPorts.

python3.9 -m venv --clear ~/.venvs/mayavi
source ~/.venvs/mayavi/bin/activate
python -m pip install -e .
python -m pip install nose
python -m nose.core -v mayavi

Note: the python -m pip install -e . instruction also hangs intermittently (perhaps 1 run out of every 3), but succeeds often enough to allow me to get to the next step.

When the installation succeeds, the test run using nose hangs with no output other than a RuntimeWarning about nose.core being found in sys.modules prior to execution of nose.core. The hang can't be interrupted with a Ctrl-C, and top shows no CPU usage or memory growth in the associated process - IOW the hang resembles a deadlock of some kind. The Python process shows 7 OS-level threads; I'm not sure what's causing the extra threads in something that I'd expect to be single-threaded.

There's at least one obvious problem with my setup, which is that I know that the test suite expects some other packages to be present, and will fail without them. However, this still shouldn't be responsible for hanging the test run - we should have a better failure mode here.

I consider this sort of issue to be high priority to diagnose and fix: it makes development work much harder than it needs to be.

mdickinson commented 2 years ago

Here's a complete transcript of a shell session on my machine:

``` mdickinson@mirzakhani mayavi % git clean -ffxd mdickinson@mirzakhani mayavi % git show commit 7b007d1ee9e75a3ac64439748db7270cbaf4012a (HEAD -> master, origin/master) Merge: 9cc6e257 346b1f03 Author: Mark Dickinson Date: Wed Oct 13 13:47:23 2021 +0100 Merge pull request #964 from tkoyama010/patch-2 ✏️ Add "Getting Started as a Translator" mdickinson@mirzakhani mayavi % git status On branch master Your branch is up to date with 'origin/master'. nothing to commit, working tree clean mdickinson@mirzakhani mayavi % python3.9 -VV Python 3.9.7 (default, Sep 1 2021, 04:00:27) [Clang 11.0.3 (clang-1103.0.32.62)] mdickinson@mirzakhani mayavi % python3.9 -m venv --clear ~/.venvs/mayavi mdickinson@mirzakhani mayavi % source ~/.venvs/mayavi/bin/activate (mayavi) mdickinson@mirzakhani mayavi % python -m pip install -e . Obtaining file:///Users/mdickinson/Enthought/ETS/mayavi Collecting apptools Using cached apptools-5.1.0-py3-none-any.whl Collecting envisage Using cached envisage-6.0.1-py3-none-any.whl Collecting numpy Using cached numpy-1.21.2-cp39-cp39-macosx_10_9_x86_64.whl (17.0 MB) Collecting pyface>=6.1.1 Using cached pyface-7.3.0-py3-none-any.whl Collecting pygments Using cached Pygments-2.10.0-py3-none-any.whl (1.0 MB) Collecting traits>=6.0.0 Using cached traits-6.3.1-cp39-cp39-macosx_10_9_x86_64.whl (5.0 MB) Collecting traitsui>=7.0.0 Using cached traitsui-7.2.1-py3-none-any.whl Collecting vtk Using cached vtk-9.0.3-cp39-cp39-macosx_10_10_x86_64.whl (47.9 MB) Collecting importlib-resources>=1.1.0 Using cached importlib_resources-5.2.2-py3-none-any.whl (27 kB) Collecting importlib-metadata Using cached importlib_metadata-4.8.1-py3-none-any.whl (17 kB) Collecting zipp>=3.1.0 Using cached zipp-3.6.0-py3-none-any.whl (5.3 kB) Collecting configobj Using cached configobj-5.0.6-py3-none-any.whl Collecting six Using cached six-1.16.0-py2.py3-none-any.whl (11 kB) Requirement already satisfied: setuptools in /Users/mdickinson/.venvs/mayavi/lib/python3.9/site-packages (from envisage->mayavi==4.7.4.dev0) (57.4.0) Collecting autobahn>=17.7.1 Using cached autobahn-21.3.1-py2.py3-none-any.whl (495 kB) Collecting wslink>=0.1.3 Using cached wslink-1.0.7-py3-none-any.whl (20 kB) Collecting matplotlib>=2.0.0 Using cached matplotlib-3.4.3-cp39-cp39-macosx_10_9_x86_64.whl (7.2 MB) Collecting Twisted>=17.5.0 Using cached Twisted-21.7.0-py3-none-any.whl (3.1 MB) Collecting cryptography>=3.4.6 Using cached cryptography-35.0.0-cp36-abi3-macosx_10_10_x86_64.whl (2.5 MB) Collecting txaio>=21.2.1 Using cached txaio-21.2.1-py2.py3-none-any.whl (30 kB) Collecting hyperlink>=21.0.0 Using cached hyperlink-21.0.0-py2.py3-none-any.whl (74 kB) Collecting cffi>=1.12 Using cached cffi-1.14.6-cp39-cp39-macosx_10_9_x86_64.whl (176 kB) Collecting pycparser Using cached pycparser-2.20-py2.py3-none-any.whl (112 kB) Collecting idna>=2.5 Using cached idna-3.3-py3-none-any.whl (61 kB) Collecting kiwisolver>=1.0.1 Using cached kiwisolver-1.3.2-cp39-cp39-macosx_10_9_x86_64.whl (61 kB) Collecting pillow>=6.2.0 Using cached Pillow-8.3.2-cp39-cp39-macosx_10_10_x86_64.whl (3.0 MB) Collecting cycler>=0.10 Using cached cycler-0.10.0-py2.py3-none-any.whl (6.5 kB) Collecting python-dateutil>=2.7 Using cached python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB) Collecting pyparsing>=2.2.1 Using cached pyparsing-2.4.7-py2.py3-none-any.whl (67 kB) Collecting typing-extensions>=3.6.5 Using cached typing_extensions-3.10.0.2-py3-none-any.whl (26 kB) Collecting Automat>=0.8.0 Using cached Automat-20.2.0-py2.py3-none-any.whl (31 kB) Collecting constantly>=15.1 Using cached constantly-15.1.0-py2.py3-none-any.whl (7.9 kB) Collecting incremental>=21.3.0 Using cached incremental-21.3.0-py2.py3-none-any.whl (15 kB) Collecting attrs>=19.2.0 Using cached attrs-21.2.0-py2.py3-none-any.whl (53 kB) Collecting zope.interface>=4.4.2 Using cached zope.interface-5.4.0-cp39-cp39-macosx_10_14_x86_64.whl (208 kB) Collecting aiohttp Using cached aiohttp-3.7.4.post0-cp39-cp39-macosx_10_14_x86_64.whl (649 kB) Collecting multidict<7.0,>=4.5 Using cached multidict-5.2.0-cp39-cp39-macosx_10_9_x86_64.whl (45 kB) Collecting yarl<2.0,>=1.0 Using cached yarl-1.7.0-cp39-cp39-macosx_10_9_x86_64.whl (121 kB) Collecting async-timeout<4.0,>=3.0 Using cached async_timeout-3.0.1-py3-none-any.whl (8.2 kB) Collecting chardet<5.0,>=2.0 Using cached chardet-4.0.0-py2.py3-none-any.whl (178 kB) Installing collected packages: zipp, traits, pycparser, multidict, importlib-resources, importlib-metadata, idna, yarl, typing-extensions, six, pyface, chardet, cffi, attrs, async-timeout, zope.interface, txaio, traitsui, python-dateutil, pyparsing, pillow, numpy, kiwisolver, incremental, hyperlink, cycler, cryptography, constantly, configobj, Automat, aiohttp, wslink, Twisted, matplotlib, autobahn, apptools, vtk, pygments, envisage, mayavi Running setup.py develop for mayavi Successfully installed Automat-20.2.0 Twisted-21.7.0 aiohttp-3.7.4.post0 apptools-5.1.0 async-timeout-3.0.1 attrs-21.2.0 autobahn-21.3.1 cffi-1.14.6 chardet-4.0.0 configobj-5.0.6 constantly-15.1.0 cryptography-35.0.0 cycler-0.10.0 envisage-6.0.1 hyperlink-21.0.0 idna-3.3 importlib-metadata-4.8.1 importlib-resources-5.2.2 incremental-21.3.0 kiwisolver-1.3.2 matplotlib-3.4.3 mayavi-4.7.4.dev0 multidict-5.2.0 numpy-1.21.2 pillow-8.3.2 pycparser-2.20 pyface-7.3.0 pygments-2.10.0 pyparsing-2.4.7 python-dateutil-2.8.2 six-1.16.0 traits-6.3.1 traitsui-7.2.1 txaio-21.2.1 typing-extensions-3.10.0.2 vtk-9.0.3 wslink-1.0.7 yarl-1.7.0 zipp-3.6.0 zope.interface-5.4.0 WARNING: You are using pip version 21.2.3; however, version 21.2.4 is available. You should consider upgrading via the '/Users/mdickinson/.venvs/mayavi/bin/python -m pip install --upgrade pip' command. (mayavi) mdickinson@mirzakhani mayavi % python -m pip install nose Collecting nose Using cached nose-1.3.7-py3-none-any.whl (154 kB) Installing collected packages: nose Successfully installed nose-1.3.7 WARNING: You are using pip version 21.2.3; however, version 21.2.4 is available. You should consider upgrading via the '/Users/mdickinson/.venvs/mayavi/bin/python -m pip install --upgrade pip' command. (mayavi) mdickinson@mirzakhani mayavi % python -m nose.core -v mayavi /opt/local/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/runpy.py:127: RuntimeWarning: 'nose.core' found in sys.modules after import of package 'nose', but prior to execution of 'nose.core'; this may result in unpredictable behaviour warn(RuntimeWarning(msg)) ```

The last line of output above was produced at 16:39 local time on my machine. It's now 16:49, and there's been no further output in the last 10 minutes.

Interestingly, the hang in the test suite also seems to be intermittent: just now I was able to get a complete test run. This again would be consistent with some kind of deadlock, and may have the same underlying cause as the hangs we've been seeing at installation time. (e.g., #1049)

rahulporuri commented 2 years ago

I'm not even sure how the installation worked without hanging in the first place - because we know that there's an issue with vtk 9.0.3 that we haven't resolved yet on the default master branch. Is this not an issue on MacOS?

mdickinson commented 2 years ago

Is this not an issue on MacOS?

Sometimes the installation hangs, sometimes it doesn't. On my machine it seems to succeed more often than not.

rahulporuri commented 2 years ago

Sometimes the installation hangs, sometimes it doesn't. On my machine it seems to succeed more often than not.

I didn't know that. On my Windows machine, it has always hanged with VTK 9.0.3 - but by always, I'm talking about 10s of attempts and not 100s so I am not a 100% sure this isn't an intermittent failure on Windows.

rahulporuri commented 2 years ago

Looking at the transcript, it looks like the testsuite hangs before any of the tests are even run - so it's hanging as soon as the first test setup is happening. Is that a correct assumption?

mdickinson commented 2 years ago

I suspect that the hang is happening at test discovery time (so likely at the point of importing something from tvtk / vtk); I need to find time to get more information. I also get hangs if running through python -m unittest discover rather than through nose.

mdickinson commented 2 years ago

I suspect that the hang is happening at test discovery time

Confirmed: I'm using unittest2 (mostly so that I can freely add debugging information without messing with the "system" Python 3.9 install) and frequently (but not always) seeing a hang when trying to load tests from mayavi.tests.test_composite_data.

(mayavi) mdickinson@mirzakhani mayavi % unit2
Doing discovery
Checkpoint 1
Branch 1
Finding tests
_get_module_from_name ... : mayavi.tests.test_array_source
_get_module_from_name done: mayavi.tests.test_array_source
_get_module_from_name ... : mayavi.tests.test_builtin_image
_get_module_from_name done: mayavi.tests.test_builtin_image
_get_module_from_name ... : mayavi.tests.test_builtin_surface
_get_module_from_name done: mayavi.tests.test_builtin_surface
_get_module_from_name ... : mayavi.tests.test_composite_data
mdickinson commented 2 years ago

And simply importing that test module at a prompt hangs (at least, it did the first time I tried it; like everything else here it may well be intermittent):

(mayavi) mdickinson@mirzakhani mayavi % python
Python 3.9.7 (default, Sep  1 2021, 04:00:27) 
[Clang 11.0.3 (clang-1103.0.32.62)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import mayavi.tests.test_composite_data

(No output following the import statement, for several minutes.)

rahulporuri commented 2 years ago

I wonder if it's the TestMLabNullEngine that's causing the problems. Can you try running or importing one of the other tests that use it? See https://github.com/enthought/mayavi/search?q=TestMlabNullEngine

mdickinson commented 2 years ago

I've narrowed it down to from mayavi import mlab.

mdickinson commented 2 years ago

Further narrowed to python -c "import tvtk.tools.tvtk_doc", and in particular the execution of get_tvtk_class_names() in that module.

mdickinson commented 2 years ago

And at this point we're getting into Heisenbug territory: the hang is occurring during the main for name in dir(vtk) loop in get_tvtk_class_names, but if I start adding debug print statements to find out exactly which name it's hanging on, the hang is no longer easy to reproduce, but does still occur occasionally.

I've seen two hangs on vtkDataObject so far.

rahulporuri commented 2 years ago

I was hoping nothing apart from vtkDataEncoder, vtkWebApplication were causing the crashes but alas.

mdickinson commented 2 years ago

Here's an LLDB backtrace from one case of the hang (looks like it was vtkDataEncoder this time). It's hanging on a thread join operation.

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x00007fff6d05f55e libsystem_kernel.dylib`__ulock_wait + 10
    frame #1: 0x00007fff6d1225c2 libsystem_pthread.dylib`_pthread_join + 347
    frame #2: 0x000000010270b8a2 libvtkCommonCore-9.0.dylib`vtkMultiThreader::TerminateThread(int) + 226
    frame #3: 0x0000000102a9d5cc libvtkWebCore-9.0.dylib`vtkDataEncoder::vtkInternals::TerminateAllWorkers() + 124
    frame #4: 0x0000000102a9d67d libvtkWebCore-9.0.dylib`vtkDataEncoder::~vtkDataEncoder() + 29
    frame #5: 0x000000010249ce3d libvtkWrappingPythonCore-9.0.dylib`vtkPythonUtil::RemoveObjectFromMap(_object*) + 157
    frame #6: 0x0000000102481215 libvtkWrappingPythonCore-9.0.dylib`PyVTKObject_Delete + 37
    frame #7: 0x00000001019b0b57 Python`_PyEval_EvalFrameDefault + 2263
    frame #8: 0x0000000101910e1a Python`function_code_fastcall + 97
    frame #9: 0x00000001019b9086 Python`call_function + 403
    frame #10: 0x00000001019b6653 Python`_PyEval_EvalFrameDefault + 25555
    frame #11: 0x00000001019b9be2 Python`_PyEval_EvalCode + 2008
    frame #12: 0x00000001019b01c8 Python`PyEval_EvalCode + 57
    frame #13: 0x00000001019ad501 Python`builtin_exec + 581
    frame #14: 0x0000000101940ae7 Python`cfunction_vectorcall_FASTCALL + 203
    frame #15: 0x00000001019b69b9 Python`_PyEval_EvalFrameDefault + 26425
    frame #16: 0x00000001019b9be2 Python`_PyEval_EvalCode + 2008
    frame #17: 0x0000000101910daa Python`_PyFunction_Vectorcall + 249
    frame #18: 0x00000001019b9086 Python`call_function + 403
    frame #19: 0x00000001019b65b5 Python`_PyEval_EvalFrameDefault + 25397
    frame #20: 0x0000000101910e1a Python`function_code_fastcall + 97
    frame #21: 0x00000001019b9086 Python`call_function + 403
    frame #22: 0x00000001019b6599 Python`_PyEval_EvalFrameDefault + 25369
    frame #23: 0x0000000101910e1a Python`function_code_fastcall + 97
    frame #24: 0x00000001019b9086 Python`call_function + 403
    frame #25: 0x00000001019b6653 Python`_PyEval_EvalFrameDefault + 25555
    frame #26: 0x0000000101910e1a Python`function_code_fastcall + 97
    frame #27: 0x00000001019b9086 Python`call_function + 403
    frame #28: 0x00000001019b6653 Python`_PyEval_EvalFrameDefault + 25555
    frame #29: 0x0000000101910e1a Python`function_code_fastcall + 97
    frame #30: 0x0000000101911e58 Python`object_vacall + 385
    frame #31: 0x0000000101911fbc Python`_PyObject_CallMethodIdObjArgs + 248
    frame #32: 0x00000001019d4ef8 Python`PyImport_ImportModuleLevelObject + 1152
    frame #33: 0x00000001019b4f35 Python`_PyEval_EvalFrameDefault + 19637
    frame #34: 0x00000001019b9be2 Python`_PyEval_EvalCode + 2008
    frame #35: 0x00000001019b01c8 Python`PyEval_EvalCode + 57
    frame #36: 0x00000001019eb059 Python`run_eval_code_obj + 110
    frame #37: 0x00000001019ea230 Python`run_mod + 103
    frame #38: 0x00000001019e9256 Python`PyRun_StringFlags + 182
    frame #39: 0x00000001019e9163 Python`PyRun_SimpleStringFlags + 69
    frame #40: 0x0000000101a00241 Python`Py_RunMain + 425
    frame #41: 0x0000000101a00b36 Python`pymain_main + 360
    frame #42: 0x0000000101a00b8b Python`Py_BytesMain + 42
    frame #43: 0x00007fff6cf1ccc9 libdyld.dylib`start + 1
    frame #44: 0x00007fff6cf1ccc9 libdyld.dylib`start + 1
mdickinson commented 2 years ago

I was hoping nothing apart from vtkDataEncoder ...

That would make sense: it looks as though it's the destructor for vtkDataEncoder that's triggering the hang, and that destructor would be triggered in the Python code when rebinding the c name to the next object in the for list. So I think the apparent hang on vtkDataObject could be explained by problems with vtkDataEncoder.

rahulporuri commented 2 years ago

I guess a temporary fix is to ignore those classes in get_tvtk_class_names - and probably other places in mayavi or tvtk which are dynamically trying to infer and instantiate vtk classes.

mdickinson commented 2 years ago

Yes, I guess so. Seems we need to get one of the existing PRs for hang fixing in.

As a better long-term solution, we might stop to ask why we're doing introspection of (and instantiation of) all the VTK classes at import time. Can we find a better way to do this? For this particular piece of code, it appears that the only reason we're instantiating each class is to try to find out the number of inputs and outputs. Can we get this information in some other way? Do we need to get this information at all? E.g., specifically for Mayavi, would it make sense to explicitly specify the VTK classes we want to use for filters and the like? That would mean having to maintain that list when VTK got updated, but on the other hand it would also mean that what's available in Mayavi would be consistent independent of the VTK version used (with some minimum VTK version required to make sure that everything we need is there).

mdickinson commented 2 years ago

Upstream issue: https://gitlab.kitware.com/vtk/vtk/-/issues/18344

prabhuramachandran commented 2 years ago

@mdickinson -- thanks for all the details, this is much easier for me to follow in terms of reproducing the issue. I will go over this issue carefully and report back.

prabhuramachandran commented 2 years ago

Thank you for this, I can reproduce the lockup. I will work on a fix over the next week. I think in the meanwhile I think can merge #1050. It is long overdue and I can fix the other issues in due course? What do you folks think?

larsoner commented 2 years ago

I think can merge #1050. It is long overdue and I can fix the other issues in due course? What do you folks think?

Sounds good to me. I think #1050 just allows mayavi to install but things still don't quite work (MNE-Python's plotting routines freeze with 9.1.0-pre wheels IIRC), so it's just a first step anyway. I might be able to look more specifically at our failures next week, too. (I have access to Linux and macOS machines so I can do some local testing.)

prabhuramachandran commented 2 years ago

@rahulporuri @mdickinson -- I do not seem to have admin privileges on this repo anymore (at least I cannot see the settings tab anymore) and am unable to merge #1050, could you please help?

mdickinson commented 2 years ago

The upstream issue is reported as fixed, and it looks as though the fix will be in 9.1 (but I'm not sure whether I'm reading the GitLab issue log correctly).