emcconville / wand

The ctypes-based simple ImageMagick binding for Python
http://docs.wand-py.org/

Resource life-related crash with thread-based joblib.Parallel #649

Closed WojciechMigda closed 8 months ago

WojciechMigda commented 9 months ago

Hi,

Given Ubuntu 18.04 with Python 3.7.5 or FreeBSD 13.2 with Python 3.10.11, Wand 0.6.13, and the simple code below:

#!/usr/bin/env python3

import time
from joblib import Parallel, delayed
from wand.color import Color

def print_color(ix: int, color: Color):
    time.sleep(1)
    print(f"[{ix}] {str(color)}")

def worker(ix: int, color: Color):
    print_color(ix, color)
    time.sleep(1)

def main():

    color = Color('#112233')

    Parallel(
        n_jobs=4, prefer="threads")(
            delayed(print_color)(ix, color)
            for ix in range(16)
        )

    print(str(color))

    pass

if __name__ == '__main__':
    main()

When you run it, you will randomly encounter a SIGABRT.

Example error outputs are:

python3: ../../wand/pixel-wand.c:709: PixelGetColorAsString: Assertion `wand->signature == WandSignature' failed.
Aborted (core dumped)
joblib.externals.loky.process_executor._RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/<redacted>/.venv/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 273, in _wrap_func_call
    return func()
  File "/<redacted>/.venv/lib/python3.7/site-packages/joblib/parallel.py", line 590, in __call__
    for func, args, kwargs in self.items]
  File "/<redacted>/.venv/lib/python3.7/site-packages/joblib/parallel.py", line 590, in <listcomp>
    for func, args, kwargs in self.items]
  File "./crash.py", line 10, in print_color
    print(f"[{ix}] {str(color)}")
  File "/<redacted>/.venv/lib/python3.7/site-packages/wand/color.py", line 164, in __str__
    return self.string
  File "/<redacted>/.venv/lib/python3.7/site-packages/wand/color.py", line 672, in string
    with self:
  File "/<redacted>/.venv/lib/python3.7/site-packages/wand/color.py", line 134, in __enter__
    library.PixelSetMagickColor(self.resource, self.raw)
  File "/<redacted>/.venv/lib/python3.7/site-packages/wand/resource.py", line 150, in resource
    raise DestroyedResourceError(repr(self) + ' is destroyed already')
wand.resource.DestroyedResourceError: wand.color.Color('srgb(17,34,51)') is destroyed already
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "./crash.py", line 33, in <module>
    main()
  File "./crash.py", line 25, in main
    for ix in range(16)
  File "/<redacted>/.venv/lib/python3.7/site-packages/joblib/parallel.py", line 1952, in __call__
    return output if self.return_generator else list(output)
  File "/<redacted>/.venv/lib/python3.7/site-packages/joblib/parallel.py", line 1595, in _get_outputs
    yield from self._retrieve()
  File "/<redacted>/.venv/lib/python3.7/site-packages/joblib/parallel.py", line 1699, in _retrieve
    self._raise_error_fast()
  File "/<redacted>/.venv/lib/python3.7/site-packages/joblib/parallel.py", line 1734, in _raise_error_fast
    error_job.get_result(self.timeout)
  File "/<redacted>/.venv/lib/python3.7/site-packages/joblib/parallel.py", line 736, in get_result
    return self._return_or_raise()
  File "/<redacted>/.venv/lib/python3.7/site-packages/joblib/parallel.py", line 754, in _return_or_raise
    raise self._result
wand.resource.DestroyedResourceError: wand.color.Color('srgb(17,34,51)') is destroyed already

With faulthandler instrumentation (python3 -X faulthandler crash.py) the output is as follows:

python3: ../../wand/pixel-wand.c:709: PixelGetColorAsString: Assertion `wand->signature == WandSignature' failed.
Fatal Python error: Aborted

[11] srgb(17,34,51)
Thread 0x00007f63867fc700 (most recent call first):
  File "/usr/lib/python3.7/multiprocessing/pool.py", line 470 in _handle_results
  File "/usr/lib/python3.7/threading.py", line 870 in run
  File "/usr/lib/python3.7/threading.py", line 926 in _bootstrap_inner
  File "/usr/lib/python3.7/threading.py", line 890 in _bootstrap

Thread 0x00007f6386ffd700 (most recent call first):
  File "/usr/lib/python3.7/multiprocessing/pool.py", line 422 in _handle_tasks
  File "/usr/lib/python3.7/threading.py", line 870 in run
  File "/usr/lib/python3.7/threading.py", line 926 in _bootstrap_inner
  File "/usr/lib/python3.7/threading.py", line 890 in _bootstrap

Thread 0x00007f63877fe700 (most recent call first):
  File "/usr/lib/python3.7/multiprocessing/pool.py", line 413 in _handle_workers
  File "/usr/lib/python3.7/threading.py", line 870 in run
  File "/usr/lib/python3.7/threading.py", line 926 in _bootstrap_inner
  File "/usr/lib/python3.7/threading.py", line 890 in _bootstrap

Thread 0x00007f6387fff700 (most recent call first):
  File "crash.py", line 9 in print_color
  File "/<redacted>/.venv/lib/python3.7/site-packages/joblib/parallel.py", line 590 in <listcomp>
  File "/<redacted>/.venv/lib/python3.7/site-packages/joblib/parallel.py", line 590 in __call__
  File "/<redacted>/.venv/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 273 in _wrap_func_call
  File "/usr/lib/python3.7/multiprocessing/pool.py", line 121 in worker
  File "/usr/lib/python3.7/threading.py", line 870 in run
  File "/usr/lib/python3.7/threading.py", line 926 in _bootstrap_inner
  File "/usr/lib/python3.7/threading.py", line 890 in _bootstrap

Thread 0x00007f6394d97700 (most recent call first):
  File "crash.py", line 9 in print_color
  File "/<redacted>/.venv/lib/python3.7/site-packages/joblib/parallel.py", line 590 in <listcomp>
  File "/<redacted>/.venv/lib/python3.7/site-packages/joblib/parallel.py", line 590 in __call__
  File "/<redacted>/.venv/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 273 in _wrap_func_call
  File "/usr/lib/python3.7/multiprocessing/pool.py", line 121 in worker
  File "/usr/lib/python3.7/threading.py", line 870 in run
  File "/usr/lib/python3.7/threading.py", line 926 in _bootstrap_inner
  File "/usr/lib/python3.7/threading.py", line 890 in _bootstrap

Thread 0x00007f6395598700 (most recent call first):
  File "crash.py", line 9 in print_color
  File "/<redacted>/.venv/lib/python3.7/site-packages/joblib/parallel.py", line 590 in <listcomp>
  File "/<redacted>/.venv/lib/python3.7/site-packages/joblib/parallel.py", line 590 in __call__
  File "/<redacted>/.venv/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 273 in _wrap_func_call
  File "/usr/lib/python3.7/multiprocessing/pool.py", line 121 in worker
  File "/usr/lib/python3.7/threading.py", line 870 in run
  File "/usr/lib/python3.7/threading.py", line 926 in _bootstrap_inner
  File "/usr/lib/python3.7/threading.py", line 890 in _bootstrap

Current thread 0x00007f6395d99700 (most recent call first):
  File "/<redacted>/.venv/lib/python3.7/site-packages/wand/color.py", line 674 in string
  File "/<redacted>/.venv/lib/python3.7/site-packages/wand/color.py", line 164 in __str__
  File "crash.py", line 10 in print_color
  File "/<redacted>/.venv/lib/python3.7/site-packages/joblib/parallel.py", line 590 in <listcomp>
  File "/<redacted>/.venv/lib/python3.7/site-packages/joblib/parallel.py", line 590 in __call__
  File "/<redacted>/.venv/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 273 in _wrap_func_call
  File "/usr/lib/python3.7/multiprocessing/pool.py", line 121 in worker
  File "/usr/lib/python3.7/threading.py", line 870 in run
  File "/usr/lib/python3.7/threading.py", line 926 in _bootstrap_inner
  File "/usr/lib/python3.7/threading.py", line 890 in _bootstrap

Thread 0x00007f639c0f3740 (most recent call first):
  File "/<redacted>/.venv/lib/python3.7/site-packages/joblib/parallel.py", line 1707 in _retrieve
  File "/<redacted>/.venv/lib/python3.7/site-packages/joblib/parallel.py", line 1595 in _get_outputs
  File "/<redacted>/.venv/lib/python3.7/site-packages/joblib/parallel.py", line 1952 in __call__
  File "crash.py", line 25 in main
  File "crash.py", line 33 in <module>
Aborted (core dumped)

Disclaimer: maybe this is not a problem with Wand itself but with the way I am using managed resources with Parallel - I will leave the judgement to Wand maintainers. Anyway, you may want to look into this.

emcconville commented 9 months ago

Wand is mostly a contextlib for MagickWand's C-API. The example script crashes because the threads are only given a memory address to allocate and deallocate, so a SIGABRT results when one thread frees an address that is being accessed by another thread.

Try the following...


def main():
  with Color('#112233') as color:
    Parallel(n_jobs=4, prefer='threads')(
      delayed(print_color)(ix, color)
      for ix in range(16))
    print(str(color))

This should work, as the with context will prevent any deallocation until after all the parallel threads join.
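To illustrate why an outer with block helps, here is a toy model of a resource that allocates on the first entry and frees on the last exit. ToyResource and all of its attributes are hypothetical stand-ins written for this sketch; this is not Wand's actual code, and Wand's real bookkeeping may differ.

```python
class ToyResource:
    """Hypothetical stand-in for a context-managed resource (not Wand code)."""

    def __init__(self):
        self.depth = 0      # number of currently open `with` blocks
        self.handle = None  # stands in for the C pointer
        self.frees = 0      # how many times the handle was released

    def __enter__(self):
        if self.depth == 0:
            self.handle = object()  # "allocate" on first entry
        self.depth += 1
        return self

    def __exit__(self, *exc):
        self.depth -= 1
        if self.depth == 0:
            self.handle = None      # "deallocate" on last exit
            self.frees += 1
        return False

    def read(self):
        # Every read opens its own short-lived context, like str(color) does
        # in the traceback above (`with self:` inside the `string` property).
        with self:
            return self.handle is not None


# Without an outer context, every read allocates and frees again.
res = ToyResource()
res.read()
res.read()
frees_without_outer = res.frees

# With an outer context, the inner exits never reach depth 0.
res2 = ToyResource()
with res2:
    res2.read()
    res2.read()
    frees_inside = res2.frees
frees_after = res2.frees
```

In this model, the window in which one thread could free the handle while another reads it only exists when no outer context is held. The counter itself is not thread-safe here, so the sketch shows the lifetime idea only, not a synchronization fix.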

Also... I'm not familiar with joblib, but there should be a way to define a mutex lock. I would encourage a mutex around any Wand methods, as ImageMagick may already be multi-threaded via OpenMP, and multi-threading a multi-threaded solution is bad news.
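A minimal sketch of the mutex idea, using only the standard library so it runs without Wand or joblib. The names wand_lock, describe, and run are hypothetical, and a plain string stands in for the shared Color object. Because thread workers share the same process memory, a single threading.Lock created at module level is visible to all of them; the same should hold for joblib's thread-based backend, but that is an assumption I have not verified here.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical name: one lock shared by every worker thread.
wand_lock = threading.Lock()


def describe(ix, color):
    # Only one thread at a time may touch the shared object.
    with wand_lock:
        text = str(color)
    # Formatting happens outside the lock; it no longer needs the resource.
    return f"[{ix}] {text}"


def run(color, jobs=4, count=16):
    # ThreadPoolExecutor stands in for Parallel(n_jobs=4, prefer="threads").
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(lambda ix: describe(ix, color), range(count)))


# A plain string stands in for wand.color.Color so the sketch is self-contained.
results = run("srgb(17,34,51)")
```

Serializing the calls gives up parallelism for the locked section, so this only pays off when the guarded call is a small part of each task.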