JuliaPy / PythonCall.jl

Python and Julia in harmony.
https://juliapy.github.io/PythonCall.jl/stable/
MIT License

Threaded test succeeds on `pytest` and fails on `tox` #539

Open kshyatt-aws opened 2 months ago

kshyatt-aws commented 2 months ago

Affects: JuliaCall

Describe the bug

When I run the following MWE, the code succeeds when run under pytest but hangs forever under tox, presumably because pytest is launching the test from the main thread? I have JULIA_NUM_THREADS set to auto.

from juliacall import Main as jl
from concurrent.futures import ThreadPoolExecutor, as_completed

def test_sample():
    jl.seval("""
    function my_sqr()
        a = rand(20, 20) 
        Threads.@threads for ii in 1:size(a, 1)
            for jj in 1:size(a, 2)
                a[ii,jj] = a[ii,jj]^2
            end
        end
        return
    end
    """)

    pool = ThreadPoolExecutor(2)

    fs = {pool.submit(jl.my_sqr._jl_call_nogil): ix for ix in range(10)}
    for future in as_completed(fs):
        rank = fs[future]
        results = future.result()
        print(results)

If I instead run

bs = [jl.my_sqr() for ix in range(10)]

in the test, everything works fine under both tox and pytest.

Your system

This is on PythonCall/juliacall 0.9.22 and Julia 1.10.4, Python 3.10.14, Mac M2.


kshyatt-aws commented 2 months ago

Could this possibly be https://github.com/tox-dev/tox/issues/3254 ?

dpinol commented 2 months ago

Did you try moving the 'import juliacall' inside the test method? This has solved a lot of problems for me with multiprocessing.
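For example (a sketch):

def test_sample():
    from juliacall import Main as jl  # import moved inside the test body
    ...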

kshyatt-aws commented 2 months ago

I'll give that a spin, thanks for the tip!

kshyatt-aws commented 2 months ago

No joy 😭

lassepe commented 2 months ago

Here's a smaller reproducer that involves neither tox nor pytest, and no Threads.@threads on the Julia side. Any printing causes the program to get stuck; it looks like a deadlock to me.

from concurrent.futures import ThreadPoolExecutor, as_completed
from juliacall import Main as jl

jl.seval(
    """
function my_sqr()
    println("Julia get's stuck at this print statement issued from thread $(Threads.threadid())")
    a = rand(10)
    for ii in 1:size(a, 1)
        a[ii] = a[ii]^2
    end
    return sum(a)
end
"""
)

pool = ThreadPoolExecutor(2)

fs = {pool.submit(jl.my_sqr): ix for ix in range(10)}
for future in as_completed(fs):
    print("running")
    rank = fs[future]
    results = future.result()
    print("done")
    print(results)

kshyatt-aws commented 2 months ago

Just for reproducibility, here is my tox.ini:

[tox]
envlist = unit-tests

[testenv:unit-tests]
basepython = python3
allowlist_externals =
    pytest
commands =
    pytest {posargs}
extras = test

and my pyproject.toml deps list:

[project.optional-dependencies]
test = [
    "black",
    "flake8",
    "flake8-rst-docstrings",
    "isort",
    "pre-commit",
    "pylint",
    "pytest==7.1.2",
    "pytest-benchmark",
    "pytest-cov",
    "pytest-rerunfailures",
    "pytest-timeout",
    "pytest-xdist",
    "sphinx",
    "sphinx-rtd-theme",
    "sphinxcontrib-apidoc",
    "tox"
]
[tool.setuptools.dynamic]                                                                                  
dependencies = {file = "requirements.txt"}

and requirements.txt:

juliacall==0.9.22
numpy

ericphanson commented 2 months ago

TLDR: one workaround might be to just call jl.seval("ccall(:jl_enter_threaded_region, Cvoid, ())") before any threaded code (and maybe call the exit version, jl.seval("ccall(:jl_exit_threaded_region, Cvoid, ())"), afterwards to re-"prioritize IO over threading").


I put the code from https://github.com/JuliaPy/PythonCall.jl/issues/539#issuecomment-2293753477 into script.py and ran it under lldb as:

lldb python script.py
(lldb) target create "python"
Current executable set to '/Users/eph/.pyenv/versions/3.9.1/bin/python' (arm64).
(lldb) settings set -- target.run-args  "script.py"
(lldb) r
Process 91437 launched: '/Users/eph/.pyenv/versions/3.9.1/bin/python' (arm64)
warning: (arm64) /Users/eph/.julia/compiled/v1.10/UnsafePointers/FMCLb_VVXwr.dylib empty dSYM file detected, dSYM was created with an executable with no debug info.
warning: (arm64) /Users/eph/.julia/compiled/v1.10/Pidfile/wlmRx_VVXwr.dylib empty dSYM file detected, dSYM was created with an executable with no debug info.
warning: (arm64) /Users/eph/.julia/compiled/v1.10/Scratch/ICI1U_elg9D.dylib empty dSYM file detected, dSYM was created with an executable with no debug info.
warning: (arm64) /Users/eph/.julia/juliaup/julia-1.10.4+0.aarch64.apple.darwin14/share/julia/compiled/v1.10/LazyArtifacts/MRP8l_RLQSU.dylib empty dSYM file detected, dSYM was created with an executable with no debug info.
warning: (arm64) /Users/eph/.julia/compiled/v1.10/DataValueInterfaces/9Lpkp_6xEyZ.dylib empty dSYM file detected, dSYM was created with an executable with no debug info.
warning: (arm64) /Users/eph/.julia/compiled/v1.10/DataAPI/3a8mN_6xEyZ.dylib empty dSYM file detected, dSYM was created with an executable with no debug info.
warning: (arm64) /Users/eph/.julia/compiled/v1.10/IteratorInterfaceExtensions/N0h8q_6xEyZ.dylib empty dSYM file detected, dSYM was created with an executable with no debug info.
warning: (arm64) /Users/eph/.julia/compiled/v1.10/TableTraits/I6SaN_6xEyZ.dylib empty dSYM file detected, dSYM was created with an executable with no debug info.
/Users/eph/PythonCall.jl/pysrc/juliacall/__init__.py:247: UserWarning: Julia was started with multiple threads but multithreading support is experimental in JuliaCall. It is recommended to restart Python with the environment variable PYTHON_JULIACALL_HANDLE_SIGNALS=yes set, otherwise you may experience segfaults or other crashes. Note however that this interferes with Python's own signal handling, so for example Ctrl-C will not raise KeyboardInterrupt. See https://juliapy.github.io/PythonCall.jl/stable/faq/#Is-PythonCall/JuliaCall-thread-safe? for further information. You can suppress this warning by setting PYTHON_JULIACALL_HANDLE_SIGNALS=no.
  warnings.warn(
Julia get's stuck at this print statement issued from thread 6
Process 91437 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
    frame #0: 0x000000018db599ec libsystem_kernel.dylib`__psynch_cvwait + 8
libsystem_kernel.dylib`:
->  0x18db599ec <+8>:  b.lo   0x18db59a0c               ; <+40>
    0x18db599f0 <+12>: pacibsp 
    0x18db599f4 <+16>: stp    x29, x30, [sp, #-0x10]!
    0x18db599f8 <+20>: mov    x29, sp
Target 0: (python) stopped.
(lldb) bt all
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x000000018db599ec libsystem_kernel.dylib`__psynch_cvwait + 8
    frame #1: 0x000000018db9755c libsystem_pthread.dylib`_pthread_cond_wait + 1228
    frame #2: 0x000000010010577c python`take_gil + 176
    frame #3: 0x0000000100105e50 python`PyEval_RestoreThread + 24
    frame #4: 0x00000001001a6fc0 python`acquire_timed + 132
    frame #5: 0x00000001001a6c94 python`lock_PyThread_acquire_lock + 56
    frame #6: 0x00000001000451b4 python`method_vectorcall_VARARGS_KEYWORDS + 248
    frame #7: 0x000000010010e834 python`call_function + 416
    frame #8: 0x000000010010c188 python`_PyEval_EvalFrameDefault + 23288
    frame #9: 0x000000010010f784 python`_PyEval_EvalCode + 3032
    frame #10: 0x000000010003cf38 python`_PyFunction_Vectorcall + 256
    frame #11: 0x000000010010e834 python`call_function + 416
    frame #12: 0x000000010010c188 python`_PyEval_EvalFrameDefault + 23288
    frame #13: 0x000000010010f784 python`_PyEval_EvalCode + 3032
    frame #14: 0x000000010003cf38 python`_PyFunction_Vectorcall + 256
    frame #15: 0x000000010010e834 python`call_function + 416
    frame #16: 0x000000010010c188 python`_PyEval_EvalFrameDefault + 23288
    frame #17: 0x000000010003cfb4 python`function_code_fastcall + 112
    frame #18: 0x000000010010e834 python`call_function + 416
    frame #19: 0x000000010010c188 python`_PyEval_EvalFrameDefault + 23288
    frame #20: 0x000000010003cfb4 python`function_code_fastcall + 112
    frame #21: 0x000000010010e834 python`call_function + 416
    frame #22: 0x000000010010c188 python`_PyEval_EvalFrameDefault + 23288
    frame #23: 0x000000010010f784 python`_PyEval_EvalCode + 3032
    frame #24: 0x000000010003cf38 python`_PyFunction_Vectorcall + 256
    frame #25: 0x000000010010e834 python`call_function + 416
    frame #26: 0x000000010010c188 python`_PyEval_EvalFrameDefault + 23288
    frame #27: 0x000000010003cfb4 python`function_code_fastcall + 112
    frame #28: 0x000000010010e834 python`call_function + 416
    frame #29: 0x000000010010c228 python`_PyEval_EvalFrameDefault + 23448
    frame #30: 0x000000010010f784 python`_PyEval_EvalCode + 3032
    frame #31: 0x00000001001065c8 python`PyEval_EvalCode + 80
    frame #32: 0x000000010014c618 python`PyRun_FileExFlags + 316
    frame #33: 0x000000010014b93c python`PyRun_SimpleFileExFlags + 248
    frame #34: 0x0000000100168ae8 python`Py_RunMain + 1708
    frame #35: 0x0000000100168fb0 python`pymain_main + 340
    frame #36: 0x000000010016902c python`Py_BytesMain + 40
    frame #37: 0x000000018d80e0e0 dyld`start + 2360
  thread #2
    frame #0: 0x000000018db599ec libsystem_kernel.dylib`__psynch_cvwait + 8
    frame #1: 0x000000018db9755c libsystem_pthread.dylib`_pthread_cond_wait + 1228
    frame #2: 0x00000001016aa3b0 libjulia-internal.1.10.4.dylib`uv_cond_wait(cond=0x0000000155051fa0, mutex=0x0000000155051f60) at thread.c:883:7
    frame #3: 0x0000000101622f38 libjulia-internal.1.10.4.dylib`ijl_task_get_next(trypoptask=0x000000011cdb63d0, q=0x000000010be70030, checkempty=0x000000011c269250) at partr.c:517:17 [opt]
    frame #4: 0x0000000119ffbda4 sys.dylib`julia_poptask_75478.3 at task.jl:985
    frame #5: 0x000000011932a840 sys.dylib`julia_wait_74759.3 at task.jl:994
    frame #6: 0x000000011a1f92d8 sys.dylib`julia_YY.waitYY.645_74778.4 at condition.jl:130
    frame #7: 0x00000001042b8148
    frame #8: 0x00000001015fab08 libjulia-internal.1.10.4.dylib`start_task [inlined] jl_apply(args=<unavailable>, nargs=1) at julia.h:1982:12 [opt]
    frame #9: 0x00000001015faafc libjulia-internal.1.10.4.dylib`start_task at task.c:1238:19 [opt]
  thread #3
    frame #0: 0x000000018db599ec libsystem_kernel.dylib`__psynch_cvwait + 8
    frame #1: 0x000000018db9755c libsystem_pthread.dylib`_pthread_cond_wait + 1228
    frame #2: 0x00000001016aa3b0 libjulia-internal.1.10.4.dylib`uv_cond_wait(cond=0x000000012600b5a0, mutex=0x000000012600b560) at thread.c:883:7
    frame #3: 0x0000000101622f38 libjulia-internal.1.10.4.dylib`ijl_task_get_next(trypoptask=0x000000011cdb63d0, q=0x000000010be78030, checkempty=0x000000011c269250) at partr.c:517:17 [opt]
    frame #4: 0x00000001197e7a28 sys.dylib`jlplt_ijl_task_get_next_75485.3 + 104
    frame #5: 0x0000000119ffbda4 sys.dylib`julia_poptask_75478.3 at task.jl:985
    frame #6: 0x000000011932a840 sys.dylib`julia_wait_74759.3 at task.jl:994
    frame #7: 0x000000011aa25100 sys.dylib`julia_task_done_hook_75391.3 at task.jl:675
    frame #8: 0x0000000119eb0d04 sys.dylib`jfptr_task_done_hook_75392.3 + 56
    frame #9: 0x00000001015dbd7c libjulia-internal.1.10.4.dylib`ijl_apply_generic [inlined] _jl_invoke(F=0x000000011cdb3180, args=0x0000000170e0af58, nargs=1, mfunc=0x000000011cdb3020, world=<unavailable>) at gf.c:0 [opt]
    frame #10: 0x00000001015dbd10 libjulia-internal.1.10.4.dylib`ijl_apply_generic(F=0x000000011cdb3180, args=0x0000000170e0af58, nargs=<unavailable>) at gf.c:3077:12 [opt]
    frame #11: 0x00000001015f9a08 libjulia-internal.1.10.4.dylib`jl_finish_task [inlined] jl_apply(args=<unavailable>, nargs=2) at julia.h:1982:12 [opt]
    frame #12: 0x00000001015f9a00 libjulia-internal.1.10.4.dylib`jl_finish_task(t=0x000000010be80010) at task.c:320:13 [opt]
    frame #13: 0x00000001016228e8 libjulia-internal.1.10.4.dylib`jl_threadfun(arg=0x00000001546369e0) at partr.c:199:5 [opt]
    frame #14: 0x000000018db96f94 libsystem_pthread.dylib`_pthread_start + 136
  thread #4
    frame #0: 0x000000018db599ec libsystem_kernel.dylib`__psynch_cvwait + 8
    frame #1: 0x000000018db9755c libsystem_pthread.dylib`_pthread_cond_wait + 1228
    frame #2: 0x00000001016aa3b0 libjulia-internal.1.10.4.dylib`uv_cond_wait(cond=0x000000014481bda0, mutex=0x000000014481bd60) at thread.c:883:7
    frame #3: 0x0000000101622f38 libjulia-internal.1.10.4.dylib`ijl_task_get_next(trypoptask=0x000000011cdb63d0, q=0x000000010be68030, checkempty=0x000000011c269250) at partr.c:517:17 [opt]
    frame #4: 0x0000000119ffbda4 sys.dylib`julia_poptask_75478.3 at task.jl:985
    frame #5: 0x000000011932a840 sys.dylib`julia_wait_74759.3 at task.jl:994
    frame #6: 0x000000011aa25100 sys.dylib`julia_task_done_hook_75391.3 at task.jl:675
    frame #7: 0x0000000119eb0d04 sys.dylib`jfptr_task_done_hook_75392.3 + 56
    frame #8: 0x00000001015dbd7c libjulia-internal.1.10.4.dylib`ijl_apply_generic [inlined] _jl_invoke(F=0x000000011cdb3180, args=0x0000000171612f58, nargs=1, mfunc=0x000000011cdb3020, world=<unavailable>) at gf.c:0 [opt]
    frame #9: 0x00000001015dbd10 libjulia-internal.1.10.4.dylib`ijl_apply_generic(F=0x000000011cdb3180, args=0x0000000171612f58, nargs=<unavailable>) at gf.c:3077:12 [opt]
    frame #10: 0x00000001015f9a08 libjulia-internal.1.10.4.dylib`jl_finish_task [inlined] jl_apply(args=<unavailable>, nargs=2) at julia.h:1982:12 [opt]
    frame #11: 0x00000001015f9a00 libjulia-internal.1.10.4.dylib`jl_finish_task(t=0x000000010be8c010) at task.c:320:13 [opt]
    frame #12: 0x00000001016228e8 libjulia-internal.1.10.4.dylib`jl_threadfun(arg=0x0000000154636a00) at partr.c:199:5 [opt]
    frame #13: 0x000000018db96f94 libsystem_pthread.dylib`_pthread_start + 136
  thread #5
    frame #0: 0x000000018db599ec libsystem_kernel.dylib`__psynch_cvwait + 8
    frame #1: 0x000000018db9755c libsystem_pthread.dylib`_pthread_cond_wait + 1228
    frame #2: 0x00000001016aa3b0 libjulia-internal.1.10.4.dylib`uv_cond_wait(cond=0x00000001019561d0, mutex=0x0000000101956190) at thread.c:883:7
    frame #3: 0x000000010162279c libjulia-internal.1.10.4.dylib`jl_gc_mark_threadfun(arg=<unavailable>) at partr.c:139:13 [opt]
    frame #4: 0x000000018db96f94 libsystem_pthread.dylib`_pthread_start + 136
  thread #6
    frame #0: 0x000000018db599ec libsystem_kernel.dylib`__psynch_cvwait + 8
    frame #1: 0x000000018db9755c libsystem_pthread.dylib`_pthread_cond_wait + 1228
    frame #2: 0x000000010d67f804 libopenblas64_.dylib`blas_thread_server + 388
    frame #3: 0x000000018db96f94 libsystem_pthread.dylib`_pthread_start + 136
  thread #7
    frame #0: 0x000000018db599ec libsystem_kernel.dylib`__psynch_cvwait + 8
    frame #1: 0x000000018db9755c libsystem_pthread.dylib`_pthread_cond_wait + 1228
    frame #2: 0x000000010d67f804 libopenblas64_.dylib`blas_thread_server + 388
    frame #3: 0x000000018db96f94 libsystem_pthread.dylib`_pthread_start + 136
  thread #8
    frame #0: 0x000000018db599ec libsystem_kernel.dylib`__psynch_cvwait + 8
    frame #1: 0x000000018db9755c libsystem_pthread.dylib`_pthread_cond_wait + 1228
    frame #2: 0x000000010d67f804 libopenblas64_.dylib`blas_thread_server + 388
    frame #3: 0x000000018db96f94 libsystem_pthread.dylib`_pthread_start + 136
  thread #9
    frame #0: 0x000000018db57ea4 libsystem_kernel.dylib`__workq_kernreturn + 8
  thread #11
    frame #0: 0x000000018db599ec libsystem_kernel.dylib`__psynch_cvwait + 8
    frame #1: 0x000000018db9755c libsystem_pthread.dylib`_pthread_cond_wait + 1228
    frame #2: 0x00000001016aa3b0 libjulia-internal.1.10.4.dylib`uv_cond_wait(cond=0x000000014497eda0, mutex=0x000000014497ed60) at thread.c:883:7
    frame #3: 0x0000000101622f38 libjulia-internal.1.10.4.dylib`ijl_task_get_next(trypoptask=0x000000011cdb63d0, q=0x0000000108690e10, checkempty=0x000000011c269250) at partr.c:517:17 [opt]
    frame #4: 0x0000000119ffbda4 sys.dylib`julia_poptask_75478.3 at task.jl:985
    frame #5: 0x000000011932a840 sys.dylib`julia_wait_74759.3 at task.jl:994
    frame #6: 0x000000011b200a3c sys.dylib`julia_uv_write_76288.4 at stream.jl:1048
    frame #7: 0x000000011b201a34 sys.dylib`julia_unsafe_write_50945.4 at stream.jl:1120
    frame #8: 0x0000000118f2ad58 sys.dylib`japi1_print_49291.4 at io.jl:248
    frame #9: 0x00000001015dbd7c libjulia-internal.1.10.4.dylib`ijl_apply_generic [inlined] _jl_invoke(F=0x000000011e14e750, args=0x00000001730e16e8, nargs=3, mfunc=0x000000011f01f910, world=<unavailable>) at gf.c:0 [opt]
    frame #10: 0x00000001015dbd10 libjulia-internal.1.10.4.dylib`ijl_apply_generic(F=0x000000011e14e750, args=0x00000001730e16e8, nargs=<unavailable>) at gf.c:3077:12 [opt]
    frame #11: 0x00000001015e9ae4 libjulia-internal.1.10.4.dylib`do_apply [inlined] jl_apply(args=<unavailable>, nargs=4) at julia.h:1982:12 [opt]
    frame #12: 0x00000001015e9ad4 libjulia-internal.1.10.4.dylib`do_apply(args=0x00000001730e1810, nargs=<unavailable>, iterate=0x000000011e09da00) at builtins.c:768:26 [opt]
    frame #13: 0x0000000119e45764 sys.dylib`japi1_println_50066.3 at io.jl:75
    frame #14: 0x00000001015dbd7c libjulia-internal.1.10.4.dylib`ijl_apply_generic [inlined] _jl_invoke(F=0x000000011c0c6f90, args=0x00000001730e1950, nargs=2, mfunc=0x000000011c0c7560, world=<unavailable>) at gf.c:0 [opt]
    frame #15: 0x00000001015dbd10 libjulia-internal.1.10.4.dylib`ijl_apply_generic(F=0x000000011c0c6f90, args=0x00000001730e1950, nargs=<unavailable>) at gf.c:3077:12 [opt]
    frame #16: 0x000000011ab24660 sys.dylib`julia_println_50057.3 at coreio.jl:4
    frame #17: 0x000000010c5ac094
    frame #18: 0x000000011543996c WdXsa_xycUF.dylib`julia__pyjl_callmethod_8988 at base.jl:73
    frame #19: 0x000000011543a0a4 WdXsa_xycUF.dylib`julia__pyjl_callmethod_8981 at C.jl:63
    frame #20: 0x000000011543a1b0 WdXsa_xycUF.dylib`jfptr__pyjl_callmethod_8982 + 100
    frame #21: 0x00000001015dbd7c libjulia-internal.1.10.4.dylib`ijl_apply_generic [inlined] _jl_invoke(F=0x0000000115613270, args=0x00000001730e1ec0, nargs=2, mfunc=0x00000001157368d0, world=<unavailable>) at gf.c:0 [opt]
    frame #22: 0x00000001015dbd10 libjulia-internal.1.10.4.dylib`ijl_apply_generic(F=0x0000000115613270, args=0x00000001730e1ec0, nargs=<unavailable>) at gf.c:3077:12 [opt]
    frame #23: 0x000000011548e594 WdXsa_xycUF.dylib`jlcapi__pyjl_callmethod_9066 + 228
    frame #24: 0x000000010007dcdc python`cfunction_call + 168
    frame #25: 0x000000010003c7bc python`_PyObject_MakeTpCall + 360
    frame #26: 0x000000010010e894 python`call_function + 512
    frame #27: 0x000000010010c1ac python`_PyEval_EvalFrameDefault + 23324
    frame #28: 0x000000010010f784 python`_PyEval_EvalCode + 3032
    frame #29: 0x000000010003cf38 python`_PyFunction_Vectorcall + 256
    frame #30: 0x000000010003c5e4 python`_PyObject_FastCallDictTstate + 272
    frame #31: 0x000000010003d2bc python`_PyObject_Call_Prepend + 148
    frame #32: 0x000000010009c744 python`slot_tp_call + 224
    frame #33: 0x000000010003cd64 python`_PyObject_Call + 172
    frame #34: 0x000000010010c4a4 python`_PyEval_EvalFrameDefault + 24084
    frame #35: 0x000000010003cfb4 python`function_code_fastcall + 112
    frame #36: 0x000000010010e834 python`call_function + 416
    frame #37: 0x000000010010c188 python`_PyEval_EvalFrameDefault + 23288
    frame #38: 0x000000010003cfb4 python`function_code_fastcall + 112
    frame #39: 0x000000010010c4a4 python`_PyEval_EvalFrameDefault + 24084
    frame #40: 0x000000010003cfb4 python`function_code_fastcall + 112
    frame #41: 0x000000010010e834 python`call_function + 416
    frame #42: 0x000000010010c188 python`_PyEval_EvalFrameDefault + 23288
    frame #43: 0x000000010003cfb4 python`function_code_fastcall + 112
    frame #44: 0x000000010010e834 python`call_function + 416
    frame #45: 0x000000010010c188 python`_PyEval_EvalFrameDefault + 23288
    frame #46: 0x000000010003cfb4 python`function_code_fastcall + 112
    frame #47: 0x000000010003efb0 python`method_vectorcall + 284
    frame #48: 0x00000001001a7b34 python`t_bootstrap + 72
    frame #49: 0x0000000100159288 python`pythread_wrapper + 28
    frame #50: 0x000000018db96f94 libsystem_pthread.dylib`_pthread_start + 136

The last thread, thread #11 here, looks like the one doing the printing. The last line in the Julia code seems to be this line, where it does a ccall whose source is here, where there is this comment. I don't understand it entirely, since I'm not quite sure what a _threadedregion is, but maybe some invariant the Julia scheduler expects is not being maintained with the Python threads.

Looking here, it says

  • jl_...() API functions may only be called from the thread in which jl_init() was called, or from threads started by the Julia runtime. Calling Julia API functions from user-started threads is not supported, and may lead to undefined behaviour and crashes.

That seems to be the case here: we are calling into the Julia runtime from user-started threads (started by Python). However, I wonder if that documentation is out of date, since here foreign threads are supposed to be able to call Julia code via jl_adopt_thread (I guess the missing docs are tracked in Test (and update documentation) for foreign threads functionality · Issue #47627 · JuliaLang/julia). PythonCall does use @ccallable, which is supposed to do the jl_adopt_thread stuff automatically (looking at Extreme Multi-Threading: C++ and Julia 1.9 Integration). That blog post also mentions:

  • jl_enter_threaded_region sets Julia to multi-threading mode, I believe. This function is also used for example by the Julia @threads macro, but lacks any documentation.

which seems like the same threaded region stuff mentioned in the comment near the hang. If I change the last lines of the script to

jl.seval("ccall(:jl_enter_threaded_region, Cvoid, ())")
fs = {pool.submit(jl.my_sqr): ix for ix in range(10)}
for future in as_completed(fs):
    print("running")
    rank = fs[future]
    results = future.result()
    print("done")
    print(results)
jl.seval("ccall(:jl_exit_threaded_region, Cvoid, ())")

where I’ve just added the two ccalls, the code does not hang!

In the blog post they seem to call jl_enter_threaded_region during initialization and never exit; I wonder if something like that is feasible for PythonCall. The source code comment on the _threadedregion field itself just says

_Atomic(unsigned) _threadedregion; // keep track of whether to prioritize IO or threading

so I wonder what the consequences of prioritizing threading over IO are. Anyway it seems like this might be useful as a quick workaround, and maybe PythonCall itself should be calling jl_enter_threaded_region?
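For anyone applying the workaround, it could be packaged as a small context manager (just a sketch, assuming jl is juliacall.Main and pool/as_completed from the script above):

from contextlib import contextmanager

@contextmanager
def julia_threaded_region():
    # tell the Julia runtime to prioritize threading over IO
    jl.seval("ccall(:jl_enter_threaded_region, Cvoid, ())")
    try:
        yield
    finally:
        # restore the default behaviour, prioritizing IO again
        jl.seval("ccall(:jl_exit_threaded_region, Cvoid, ())")

with julia_threaded_region():
    fs = {pool.submit(jl.my_sqr): ix for ix in range(10)}
    for future in as_completed(fs):
        print(future.result())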

kshyatt-aws commented 2 months ago

@ericphanson does this resolve the MWE I showed at the top as well? Or "only" the printing one (which is still awesome)?

lassepe commented 2 months ago

It fixes the printing example but not yours. Let me split my comment out into a separate issue.

ericphanson commented 2 months ago

Ah sorry, I didn't try yours since I didn't know how to run tox, so I went for the easier one. I'll try it later, but I've had issues getting lldb to run under pytest before, so I'm not sure I'll get it working with tox.

lassepe commented 2 months ago

I just stumbled across this section of the docs: https://juliapy.github.io/PythonCall.jl/dev/juliacall/#Caveat:-Julia's-task-scheduler

TL;DR: currently, if a task ever yields to the Julia scheduler, it will not be resumed automatically; one has to yield to the Julia scheduler explicitly from Python to resume those tasks.

In fact, the following version of the original tox example does not hang:

from juliacall import Main as jl
from concurrent.futures import ThreadPoolExecutor, as_completed

jl_yield = getattr(jl, "yield")  # "yield" is a reserved word in Python, hence getattr

def test_sample():
    jl.seval(
        """
    function my_sqr()
        a = rand(20, 20)
        Threads.@threads for ii in 1:size(a, 1)
            for jj in 1:size(a, 2)
                a[ii,jj] = a[ii,jj]^2
            end
        end
        return
    end
    """
    )

    pool = ThreadPoolExecutor(2)

    fs = {pool.submit(jl.my_sqr._jl_call_nogil): ix for ix in range(10)}
    jl_yield() # yielding once so that we start iterating below
    for future in as_completed(fs):
        jl_yield() # yielding again in case there are any waiting Julia tasks
        rank = fs[future]
        results = future.result()
        print(results)

kshyatt-aws commented 2 months ago

I think you can set the commands = part of tox to lldb, similar to what you did for pytest, then run tox -e unit-tests to invoke just that environment.
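Something like this might work (an untested sketch; lldb's -o flag queues a command to run after launch):

[testenv:unit-tests]
basepython = python3
allowlist_externals =
    lldb
commands =
    lldb -o run -- python -m pytest {posargs}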

kshyatt-aws commented 2 months ago

OK, I was able to repro the tox success above! I'm even able to have it working if I do:

def py_sqr():
    jl.my_sqr._jl_call_nogil()
    return

pool = ThreadPoolExecutor(10)
fs = {pool.submit(py_sqr): ix for ix in range(10)}

But not if I don't control the Python code which is submitting the tasks and collecting them (e.g. if I'm using some external Python package whose do_stuff() function submits to the pool and collects the results). Is there any workaround in that situation?

lassepe commented 2 months ago

I wonder if one could have a Python task that just wakes up from time to time to yield to the Julia scheduler. I'm not sufficiently familiar with the exact mechanics of multithreading in Python and Julia to make a clear recommendation here, though.
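Something like this, perhaps (an untested sketch; calling into Julia from an extra Python thread may have its own caveats, per the discussion above):

import threading

stop = threading.Event()

def periodic_jl_yield(interval=0.05):
    # wake up periodically and give waiting Julia tasks a chance to run
    while not stop.wait(interval):
        jl_yield()  # jl_yield = getattr(jl, "yield"), as in the example above

threading.Thread(target=periodic_jl_yield, daemon=True).start()
# ... run the code that submits Julia work from threads we don't control ...
stop.set()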

kshyatt-aws commented 2 months ago

It may be the recommendation is similar to when you go to the doctor and say "it hurts when I do this" -- "well then, don't do that" :)

kshyatt-aws commented 2 months ago

Just a note: it looks like supplying jl_yield above as an initializer argument to ThreadPoolExecutor also unsticks things. Not in tox, though 😭
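For reference, that's just (sketch):

pool = ThreadPoolExecutor(2, initializer=jl_yield)  # each worker thread yields to Julia once at startup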

kshyatt-aws commented 2 months ago

Managed to mostly resolve this by using a ProcessPoolExecutor, though I'm unsure about the effect on the memory footprint.
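A minimal sketch of that approach (names here are illustrative), importing juliacall inside the worker per the tip above so each process gets its own Julia runtime:

from concurrent.futures import ProcessPoolExecutor

def work(ix):
    from juliacall import Main as jl  # a fresh Julia runtime per worker process
    jl.seval("my_sqr_sum(n) = sum(abs2, rand(n, n))")
    return jl.my_sqr_sum(20)

if __name__ == "__main__":
    with ProcessPoolExecutor(2) as pool:
        print(list(pool.map(work, range(10))))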

cjdoris commented 2 months ago

This is all really useful and interesting experimentation, thanks. The interaction between the Julia task scheduler and Python threads is not totally clear to me right now - hopefully sprinkling some jl_adopt_thread and/or jl_enter_threaded_region in select places (e.g. GIL.@lock) will help reduce the need for workarounds such as explicitly yielding back to Julia.

kshyatt-aws commented 2 months ago

Yeah, it's a big pain when your Python code is being run by a ThreadPoolExecutor you do not "own". Do you think the jl_adopt_thread approach might work in that case? I ask because having to use Python processes sucks if you have to pass big arrays back and forth.