kshyatt-aws opened this issue 2 months ago
Could this possibly be https://github.com/tox-dev/tox/issues/3254 ?
Did you try moving the 'import juliacall' within the test method? This has solved lots of problems for me with multiprocess
I'll give that a spin, thanks for the tip!
No joy 😭
Here's a smaller reproducer that involves neither tox nor pytest, and no Threads.@threads on the Julia side. Any printing causes the program to get stuck; it looks like a deadlock to me.
from concurrent.futures import ThreadPoolExecutor, as_completed

from juliacall import Main as jl

jl.seval(
    """
    function my_sqr()
        println("Julia get's stuck at this print statement issued from thread $(Threads.threadid())")
        a = rand(10)
        for ii in 1:size(a, 1)
            a[ii] = a[ii]^2
        end
        return sum(a)
    end
    """
)

pool = ThreadPoolExecutor(2)
fs = {pool.submit(jl.my_sqr): ix for ix in range(10)}
for future in as_completed(fs):
    print("running")
    rank = fs[future]
    results = future.result()
    print("done")
    print(results)
Just for reproducibility, here is my tox.ini:
[tox]
envlist = unit-tests

[testenv:unit-tests]
basepython = python3
allowlist_externals =
    pytest
commands =
    pytest {posargs}
extras = test
and the dependency list from my pyproject.toml:
[project.optional-dependencies]
test = [
    "black",
    "flake8",
    "flake8-rst-docstrings",
    "isort",
    "pre-commit",
    "pylint",
    "pytest==7.1.2",
    "pytest-benchmark",
    "pytest-cov",
    "pytest-rerunfailures",
    "pytest-timeout",
    "pytest-xdist",
    "sphinx",
    "sphinx-rtd-theme",
    "sphinxcontrib-apidoc",
    "tox"
]
[tool.setuptools.dynamic]
dependencies = {file = "requirements.txt"}
and requirements.txt:
juliacall==0.9.22
numpy
TL;DR: one workaround might be to just call jl.seval("ccall(:jl_enter_threaded_region, Cvoid, ())") before any threaded code (and maybe call the exit version, jl.seval("ccall(:jl_exit_threaded_region, Cvoid, ())"), afterwards to re-"prioritize IO over threading").
I put the code from https://github.com/JuliaPy/PythonCall.jl/issues/539#issuecomment-2293753477 into script.py
and ran it under lldb as:
lldb python script.py
(lldb) target create "python"
Current executable set to '/Users/eph/.pyenv/versions/3.9.1/bin/python' (arm64).
(lldb) settings set -- target.run-args "script.py"
(lldb) r
Process 91437 launched: '/Users/eph/.pyenv/versions/3.9.1/bin/python' (arm64)
warning: (arm64) /Users/eph/.julia/compiled/v1.10/UnsafePointers/FMCLb_VVXwr.dylib empty dSYM file detected, dSYM was created with an executable with no debug info.
warning: (arm64) /Users/eph/.julia/compiled/v1.10/Pidfile/wlmRx_VVXwr.dylib empty dSYM file detected, dSYM was created with an executable with no debug info.
warning: (arm64) /Users/eph/.julia/compiled/v1.10/Scratch/ICI1U_elg9D.dylib empty dSYM file detected, dSYM was created with an executable with no debug info.
warning: (arm64) /Users/eph/.julia/juliaup/julia-1.10.4+0.aarch64.apple.darwin14/share/julia/compiled/v1.10/LazyArtifacts/MRP8l_RLQSU.dylib empty dSYM file detected, dSYM was created with an executable with no debug info.
warning: (arm64) /Users/eph/.julia/compiled/v1.10/DataValueInterfaces/9Lpkp_6xEyZ.dylib empty dSYM file detected, dSYM was created with an executable with no debug info.
warning: (arm64) /Users/eph/.julia/compiled/v1.10/DataAPI/3a8mN_6xEyZ.dylib empty dSYM file detected, dSYM was created with an executable with no debug info.
warning: (arm64) /Users/eph/.julia/compiled/v1.10/IteratorInterfaceExtensions/N0h8q_6xEyZ.dylib empty dSYM file detected, dSYM was created with an executable with no debug info.
warning: (arm64) /Users/eph/.julia/compiled/v1.10/TableTraits/I6SaN_6xEyZ.dylib empty dSYM file detected, dSYM was created with an executable with no debug info.
/Users/eph/PythonCall.jl/pysrc/juliacall/__init__.py:247: UserWarning: Julia was started with multiple threads but multithreading support is experimental in JuliaCall. It is recommended to restart Python with the environment variable PYTHON_JULIACALL_HANDLE_SIGNALS=yes set, otherwise you may experience segfaults or other crashes. Note however that this interferes with Python's own signal handling, so for example Ctrl-C will not raise KeyboardInterrupt. See https://juliapy.github.io/PythonCall.jl/stable/faq/#Is-PythonCall/JuliaCall-thread-safe? for further information. You can suppress this warning by setting PYTHON_JULIACALL_HANDLE_SIGNALS=no.
warnings.warn(
Julia get's stuck at this print statement issued from thread 6
Process 91437 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
frame #0: 0x000000018db599ec libsystem_kernel.dylib`__psynch_cvwait + 8
libsystem_kernel.dylib`:
-> 0x18db599ec <+8>: b.lo 0x18db59a0c ; <+40>
0x18db599f0 <+12>: pacibsp
0x18db599f4 <+16>: stp x29, x30, [sp, #-0x10]!
0x18db599f8 <+20>: mov x29, sp
Target 0: (python) stopped.
(lldb) bt all
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
* frame #0: 0x000000018db599ec libsystem_kernel.dylib`__psynch_cvwait + 8
frame #1: 0x000000018db9755c libsystem_pthread.dylib`_pthread_cond_wait + 1228
frame #2: 0x000000010010577c python`take_gil + 176
frame #3: 0x0000000100105e50 python`PyEval_RestoreThread + 24
frame #4: 0x00000001001a6fc0 python`acquire_timed + 132
frame #5: 0x00000001001a6c94 python`lock_PyThread_acquire_lock + 56
frame #6: 0x00000001000451b4 python`method_vectorcall_VARARGS_KEYWORDS + 248
frame #7: 0x000000010010e834 python`call_function + 416
frame #8: 0x000000010010c188 python`_PyEval_EvalFrameDefault + 23288
frame #9: 0x000000010010f784 python`_PyEval_EvalCode + 3032
frame #10: 0x000000010003cf38 python`_PyFunction_Vectorcall + 256
frame #11: 0x000000010010e834 python`call_function + 416
frame #12: 0x000000010010c188 python`_PyEval_EvalFrameDefault + 23288
frame #13: 0x000000010010f784 python`_PyEval_EvalCode + 3032
frame #14: 0x000000010003cf38 python`_PyFunction_Vectorcall + 256
frame #15: 0x000000010010e834 python`call_function + 416
frame #16: 0x000000010010c188 python`_PyEval_EvalFrameDefault + 23288
frame #17: 0x000000010003cfb4 python`function_code_fastcall + 112
frame #18: 0x000000010010e834 python`call_function + 416
frame #19: 0x000000010010c188 python`_PyEval_EvalFrameDefault + 23288
frame #20: 0x000000010003cfb4 python`function_code_fastcall + 112
frame #21: 0x000000010010e834 python`call_function + 416
frame #22: 0x000000010010c188 python`_PyEval_EvalFrameDefault + 23288
frame #23: 0x000000010010f784 python`_PyEval_EvalCode + 3032
frame #24: 0x000000010003cf38 python`_PyFunction_Vectorcall + 256
frame #25: 0x000000010010e834 python`call_function + 416
frame #26: 0x000000010010c188 python`_PyEval_EvalFrameDefault + 23288
frame #27: 0x000000010003cfb4 python`function_code_fastcall + 112
frame #28: 0x000000010010e834 python`call_function + 416
frame #29: 0x000000010010c228 python`_PyEval_EvalFrameDefault + 23448
frame #30: 0x000000010010f784 python`_PyEval_EvalCode + 3032
frame #31: 0x00000001001065c8 python`PyEval_EvalCode + 80
frame #32: 0x000000010014c618 python`PyRun_FileExFlags + 316
frame #33: 0x000000010014b93c python`PyRun_SimpleFileExFlags + 248
frame #34: 0x0000000100168ae8 python`Py_RunMain + 1708
frame #35: 0x0000000100168fb0 python`pymain_main + 340
frame #36: 0x000000010016902c python`Py_BytesMain + 40
frame #37: 0x000000018d80e0e0 dyld`start + 2360
thread #2
frame #0: 0x000000018db599ec libsystem_kernel.dylib`__psynch_cvwait + 8
frame #1: 0x000000018db9755c libsystem_pthread.dylib`_pthread_cond_wait + 1228
frame #2: 0x00000001016aa3b0 libjulia-internal.1.10.4.dylib`uv_cond_wait(cond=0x0000000155051fa0, mutex=0x0000000155051f60) at thread.c:883:7
frame #3: 0x0000000101622f38 libjulia-internal.1.10.4.dylib`ijl_task_get_next(trypoptask=0x000000011cdb63d0, q=0x000000010be70030, checkempty=0x000000011c269250) at partr.c:517:17 [opt]
frame #4: 0x0000000119ffbda4 sys.dylib`julia_poptask_75478.3 at task.jl:985
frame #5: 0x000000011932a840 sys.dylib`julia_wait_74759.3 at task.jl:994
frame #6: 0x000000011a1f92d8 sys.dylib`julia_YY.waitYY.645_74778.4 at condition.jl:130
frame #7: 0x00000001042b8148
frame #8: 0x00000001015fab08 libjulia-internal.1.10.4.dylib`start_task [inlined] jl_apply(args=<unavailable>, nargs=1) at julia.h:1982:12 [opt]
frame #9: 0x00000001015faafc libjulia-internal.1.10.4.dylib`start_task at task.c:1238:19 [opt]
thread #3
frame #0: 0x000000018db599ec libsystem_kernel.dylib`__psynch_cvwait + 8
frame #1: 0x000000018db9755c libsystem_pthread.dylib`_pthread_cond_wait + 1228
frame #2: 0x00000001016aa3b0 libjulia-internal.1.10.4.dylib`uv_cond_wait(cond=0x000000012600b5a0, mutex=0x000000012600b560) at thread.c:883:7
frame #3: 0x0000000101622f38 libjulia-internal.1.10.4.dylib`ijl_task_get_next(trypoptask=0x000000011cdb63d0, q=0x000000010be78030, checkempty=0x000000011c269250) at partr.c:517:17 [opt]
frame #4: 0x00000001197e7a28 sys.dylib`jlplt_ijl_task_get_next_75485.3 + 104
frame #5: 0x0000000119ffbda4 sys.dylib`julia_poptask_75478.3 at task.jl:985
frame #6: 0x000000011932a840 sys.dylib`julia_wait_74759.3 at task.jl:994
frame #7: 0x000000011aa25100 sys.dylib`julia_task_done_hook_75391.3 at task.jl:675
frame #8: 0x0000000119eb0d04 sys.dylib`jfptr_task_done_hook_75392.3 + 56
frame #9: 0x00000001015dbd7c libjulia-internal.1.10.4.dylib`ijl_apply_generic [inlined] _jl_invoke(F=0x000000011cdb3180, args=0x0000000170e0af58, nargs=1, mfunc=0x000000011cdb3020, world=<unavailable>) at gf.c:0 [opt]
frame #10: 0x00000001015dbd10 libjulia-internal.1.10.4.dylib`ijl_apply_generic(F=0x000000011cdb3180, args=0x0000000170e0af58, nargs=<unavailable>) at gf.c:3077:12 [opt]
frame #11: 0x00000001015f9a08 libjulia-internal.1.10.4.dylib`jl_finish_task [inlined] jl_apply(args=<unavailable>, nargs=2) at julia.h:1982:12 [opt]
frame #12: 0x00000001015f9a00 libjulia-internal.1.10.4.dylib`jl_finish_task(t=0x000000010be80010) at task.c:320:13 [opt]
frame #13: 0x00000001016228e8 libjulia-internal.1.10.4.dylib`jl_threadfun(arg=0x00000001546369e0) at partr.c:199:5 [opt]
frame #14: 0x000000018db96f94 libsystem_pthread.dylib`_pthread_start + 136
thread #4
frame #0: 0x000000018db599ec libsystem_kernel.dylib`__psynch_cvwait + 8
frame #1: 0x000000018db9755c libsystem_pthread.dylib`_pthread_cond_wait + 1228
frame #2: 0x00000001016aa3b0 libjulia-internal.1.10.4.dylib`uv_cond_wait(cond=0x000000014481bda0, mutex=0x000000014481bd60) at thread.c:883:7
frame #3: 0x0000000101622f38 libjulia-internal.1.10.4.dylib`ijl_task_get_next(trypoptask=0x000000011cdb63d0, q=0x000000010be68030, checkempty=0x000000011c269250) at partr.c:517:17 [opt]
frame #4: 0x0000000119ffbda4 sys.dylib`julia_poptask_75478.3 at task.jl:985
frame #5: 0x000000011932a840 sys.dylib`julia_wait_74759.3 at task.jl:994
frame #6: 0x000000011aa25100 sys.dylib`julia_task_done_hook_75391.3 at task.jl:675
frame #7: 0x0000000119eb0d04 sys.dylib`jfptr_task_done_hook_75392.3 + 56
frame #8: 0x00000001015dbd7c libjulia-internal.1.10.4.dylib`ijl_apply_generic [inlined] _jl_invoke(F=0x000000011cdb3180, args=0x0000000171612f58, nargs=1, mfunc=0x000000011cdb3020, world=<unavailable>) at gf.c:0 [opt]
frame #9: 0x00000001015dbd10 libjulia-internal.1.10.4.dylib`ijl_apply_generic(F=0x000000011cdb3180, args=0x0000000171612f58, nargs=<unavailable>) at gf.c:3077:12 [opt]
frame #10: 0x00000001015f9a08 libjulia-internal.1.10.4.dylib`jl_finish_task [inlined] jl_apply(args=<unavailable>, nargs=2) at julia.h:1982:12 [opt]
frame #11: 0x00000001015f9a00 libjulia-internal.1.10.4.dylib`jl_finish_task(t=0x000000010be8c010) at task.c:320:13 [opt]
frame #12: 0x00000001016228e8 libjulia-internal.1.10.4.dylib`jl_threadfun(arg=0x0000000154636a00) at partr.c:199:5 [opt]
frame #13: 0x000000018db96f94 libsystem_pthread.dylib`_pthread_start + 136
thread #5
frame #0: 0x000000018db599ec libsystem_kernel.dylib`__psynch_cvwait + 8
frame #1: 0x000000018db9755c libsystem_pthread.dylib`_pthread_cond_wait + 1228
frame #2: 0x00000001016aa3b0 libjulia-internal.1.10.4.dylib`uv_cond_wait(cond=0x00000001019561d0, mutex=0x0000000101956190) at thread.c:883:7
frame #3: 0x000000010162279c libjulia-internal.1.10.4.dylib`jl_gc_mark_threadfun(arg=<unavailable>) at partr.c:139:13 [opt]
frame #4: 0x000000018db96f94 libsystem_pthread.dylib`_pthread_start + 136
thread #6
frame #0: 0x000000018db599ec libsystem_kernel.dylib`__psynch_cvwait + 8
frame #1: 0x000000018db9755c libsystem_pthread.dylib`_pthread_cond_wait + 1228
frame #2: 0x000000010d67f804 libopenblas64_.dylib`blas_thread_server + 388
frame #3: 0x000000018db96f94 libsystem_pthread.dylib`_pthread_start + 136
thread #7
frame #0: 0x000000018db599ec libsystem_kernel.dylib`__psynch_cvwait + 8
frame #1: 0x000000018db9755c libsystem_pthread.dylib`_pthread_cond_wait + 1228
frame #2: 0x000000010d67f804 libopenblas64_.dylib`blas_thread_server + 388
frame #3: 0x000000018db96f94 libsystem_pthread.dylib`_pthread_start + 136
thread #8
frame #0: 0x000000018db599ec libsystem_kernel.dylib`__psynch_cvwait + 8
frame #1: 0x000000018db9755c libsystem_pthread.dylib`_pthread_cond_wait + 1228
frame #2: 0x000000010d67f804 libopenblas64_.dylib`blas_thread_server + 388
frame #3: 0x000000018db96f94 libsystem_pthread.dylib`_pthread_start + 136
thread #9
frame #0: 0x000000018db57ea4 libsystem_kernel.dylib`__workq_kernreturn + 8
thread #11
frame #0: 0x000000018db599ec libsystem_kernel.dylib`__psynch_cvwait + 8
frame #1: 0x000000018db9755c libsystem_pthread.dylib`_pthread_cond_wait + 1228
frame #2: 0x00000001016aa3b0 libjulia-internal.1.10.4.dylib`uv_cond_wait(cond=0x000000014497eda0, mutex=0x000000014497ed60) at thread.c:883:7
frame #3: 0x0000000101622f38 libjulia-internal.1.10.4.dylib`ijl_task_get_next(trypoptask=0x000000011cdb63d0, q=0x0000000108690e10, checkempty=0x000000011c269250) at partr.c:517:17 [opt]
frame #4: 0x0000000119ffbda4 sys.dylib`julia_poptask_75478.3 at task.jl:985
frame #5: 0x000000011932a840 sys.dylib`julia_wait_74759.3 at task.jl:994
frame #6: 0x000000011b200a3c sys.dylib`julia_uv_write_76288.4 at stream.jl:1048
frame #7: 0x000000011b201a34 sys.dylib`julia_unsafe_write_50945.4 at stream.jl:1120
frame #8: 0x0000000118f2ad58 sys.dylib`japi1_print_49291.4 at io.jl:248
frame #9: 0x00000001015dbd7c libjulia-internal.1.10.4.dylib`ijl_apply_generic [inlined] _jl_invoke(F=0x000000011e14e750, args=0x00000001730e16e8, nargs=3, mfunc=0x000000011f01f910, world=<unavailable>) at gf.c:0 [opt]
frame #10: 0x00000001015dbd10 libjulia-internal.1.10.4.dylib`ijl_apply_generic(F=0x000000011e14e750, args=0x00000001730e16e8, nargs=<unavailable>) at gf.c:3077:12 [opt]
frame #11: 0x00000001015e9ae4 libjulia-internal.1.10.4.dylib`do_apply [inlined] jl_apply(args=<unavailable>, nargs=4) at julia.h:1982:12 [opt]
frame #12: 0x00000001015e9ad4 libjulia-internal.1.10.4.dylib`do_apply(args=0x00000001730e1810, nargs=<unavailable>, iterate=0x000000011e09da00) at builtins.c:768:26 [opt]
frame #13: 0x0000000119e45764 sys.dylib`japi1_println_50066.3 at io.jl:75
frame #14: 0x00000001015dbd7c libjulia-internal.1.10.4.dylib`ijl_apply_generic [inlined] _jl_invoke(F=0x000000011c0c6f90, args=0x00000001730e1950, nargs=2, mfunc=0x000000011c0c7560, world=<unavailable>) at gf.c:0 [opt]
frame #15: 0x00000001015dbd10 libjulia-internal.1.10.4.dylib`ijl_apply_generic(F=0x000000011c0c6f90, args=0x00000001730e1950, nargs=<unavailable>) at gf.c:3077:12 [opt]
frame #16: 0x000000011ab24660 sys.dylib`julia_println_50057.3 at coreio.jl:4
frame #17: 0x000000010c5ac094
frame #18: 0x000000011543996c WdXsa_xycUF.dylib`julia__pyjl_callmethod_8988 at base.jl:73
frame #19: 0x000000011543a0a4 WdXsa_xycUF.dylib`julia__pyjl_callmethod_8981 at C.jl:63
frame #20: 0x000000011543a1b0 WdXsa_xycUF.dylib`jfptr__pyjl_callmethod_8982 + 100
frame #21: 0x00000001015dbd7c libjulia-internal.1.10.4.dylib`ijl_apply_generic [inlined] _jl_invoke(F=0x0000000115613270, args=0x00000001730e1ec0, nargs=2, mfunc=0x00000001157368d0, world=<unavailable>) at gf.c:0 [opt]
frame #22: 0x00000001015dbd10 libjulia-internal.1.10.4.dylib`ijl_apply_generic(F=0x0000000115613270, args=0x00000001730e1ec0, nargs=<unavailable>) at gf.c:3077:12 [opt]
frame #23: 0x000000011548e594 WdXsa_xycUF.dylib`jlcapi__pyjl_callmethod_9066 + 228
frame #24: 0x000000010007dcdc python`cfunction_call + 168
frame #25: 0x000000010003c7bc python`_PyObject_MakeTpCall + 360
frame #26: 0x000000010010e894 python`call_function + 512
frame #27: 0x000000010010c1ac python`_PyEval_EvalFrameDefault + 23324
frame #28: 0x000000010010f784 python`_PyEval_EvalCode + 3032
frame #29: 0x000000010003cf38 python`_PyFunction_Vectorcall + 256
frame #30: 0x000000010003c5e4 python`_PyObject_FastCallDictTstate + 272
frame #31: 0x000000010003d2bc python`_PyObject_Call_Prepend + 148
frame #32: 0x000000010009c744 python`slot_tp_call + 224
frame #33: 0x000000010003cd64 python`_PyObject_Call + 172
frame #34: 0x000000010010c4a4 python`_PyEval_EvalFrameDefault + 24084
frame #35: 0x000000010003cfb4 python`function_code_fastcall + 112
frame #36: 0x000000010010e834 python`call_function + 416
frame #37: 0x000000010010c188 python`_PyEval_EvalFrameDefault + 23288
frame #38: 0x000000010003cfb4 python`function_code_fastcall + 112
frame #39: 0x000000010010c4a4 python`_PyEval_EvalFrameDefault + 24084
frame #40: 0x000000010003cfb4 python`function_code_fastcall + 112
frame #41: 0x000000010010e834 python`call_function + 416
frame #42: 0x000000010010c188 python`_PyEval_EvalFrameDefault + 23288
frame #43: 0x000000010003cfb4 python`function_code_fastcall + 112
frame #44: 0x000000010010e834 python`call_function + 416
frame #45: 0x000000010010c188 python`_PyEval_EvalFrameDefault + 23288
frame #46: 0x000000010003cfb4 python`function_code_fastcall + 112
frame #47: 0x000000010003efb0 python`method_vectorcall + 284
frame #48: 0x00000001001a7b34 python`t_bootstrap + 72
frame #49: 0x0000000100159288 python`pythread_wrapper + 28
frame #50: 0x000000018db96f94 libsystem_pthread.dylib`_pthread_start + 136
The last thread here, thread 11, looks like the one doing the printing. The last line in the Julia code seems to be this line, where it does a ccall whose source is here, where there is this comment. I don’t understand it entirely, since I’m not quite sure what a _threadedregion is, but maybe some invariant the Julia scheduler is expecting is not being maintained with the Python threads.
Looking here, it says
- jl_...() API functions may only be called from the thread in which jl_init() was called, or from threads started by the Julia runtime. Calling Julia API functions from user-started threads is not supported, and may lead to undefined behaviour and crashes.
That seems to be the case here, where we are calling into the Julia runtime from user-started threads (started by Python). However, I wonder if that documentation is out of date, since here foreign threads are supposed to be able to call Julia code via jl_adopt_thread (I guess the missing docs are Test (and update documentation) for foreign threads functionality · Issue #47627 · JuliaLang/julia). However, PythonCall does use @ccallable, which is supposed to automatically do the jl_adopt_thread stuff (looking at Extreme Multi-Threading: C++ and Julia 1.9 Integration). That blog post also mentions:
- jl_enter_threaded_region sets Julia to multi-threading mode, I believe. This function is also used for example by the Julia @threads macro, but lacks any documentation.
which seems like the same threaded region stuff mentioned in the comment near the hang. If I change the last lines of the script to
jl.seval("ccall(:jl_enter_threaded_region, Cvoid, ())")
fs = {pool.submit(jl.my_sqr): ix for ix in range(10)}
for future in as_completed(fs):
print("running")
rank = fs[future]
results = future.result()
print("done")
print(results)
jl.seval("ccall(:jl_exit_threaded_region, Cvoid, ())")
where I’ve just added the ccalls, then the code does not hang!
In the blog post they seem to call jl_enter_threaded_region during initialization and never exit it; I wonder if something like that is feasible for PythonCall. The source code comment on the _threadedregion field itself just says
_Atomic(unsigned) _threadedregion; // keep track of whether to prioritize IO or threading
so I wonder what the consequences of prioritizing threading over IO are. Anyway, this seems like it might be useful as a quick workaround, and maybe PythonCall itself should be calling jl_enter_threaded_region?
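For convenience, the enter/exit pair could be wrapped up on the Python side. Here is a minimal sketch, assuming the two ccalls above are all that is needed (the julia_threaded_region helper is hypothetical, not part of juliacall):

from contextlib import contextmanager
from juliacall import Main as jl

@contextmanager
def julia_threaded_region():
    # Enter Julia's "threaded region" (prioritize threading over IO).
    jl.seval("ccall(:jl_enter_threaded_region, Cvoid, ())")
    try:
        yield
    finally:
        # Leave it again afterwards (re-prioritize IO over threading).
        jl.seval("ccall(:jl_exit_threaded_region, Cvoid, ())")

With that, the submit/as_completed loop above could simply be placed inside a with julia_threaded_region(): block instead of calling the two ccalls by hand.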
@ericphanson does this resolve the MWE I showed at the top as well? Or "only" the printing one (which is still awesome)?
It fixes the printing example but not yours. Let me split my comment into a separate issue.
Ah sorry, I didn’t try yours since I didn’t know how to run tox so I went for the easier one. I’ll try that one later but I’ve had issues getting lldb to run under pytest before so I’m not sure if I’ll get it working with tox.
I just stumbled across this section of the docs: https://juliapy.github.io/PythonCall.jl/dev/juliacall/#Caveat:-Julia's-task-scheduler TL;DR: currently, if a task ever yields to the Julia scheduler, it will not be resumed automatically; one has to yield to the Julia scheduler explicitly from Python to resume those tasks.
In fact, the following version of the original tox example does not hang:
from juliacall import Main as jl
from concurrent.futures import ThreadPoolExecutor, as_completed, wait

jl_yield = getattr(jl, "yield")

def test_sample():
    jl.seval(
        """
        function my_sqr()
            a = rand(20, 20)
            Threads.@threads for ii in 1:size(a, 1)
                for jj in 1:size(a, 2)
                    a[ii,jj] = a[ii,jj]^2
                end
            end
            return
        end
        """
    )
    pool = ThreadPoolExecutor(2)
    fs = {pool.submit(jl.my_sqr._jl_call_nogil): ix for ix in range(10)}
    jl_yield()  # yielding once so that we start iterating below
    for future in as_completed(fs):
        jl_yield()  # yielding again in case there are any waiting Julia tasks
        rank = fs[future]
        results = future.result()
        print(results)
I think you can set the commands = part of tox to lldb, similar to what you did for pytest, then run tox -e unit-tests to just invoke that.
OK, I was able to repro the tox success above! I'm even able to have it working if I do:
def py_sqr():
    jl.my_sqr._jl_call_nogil()
    return

pool = ThreadPoolExecutor(10)
fs = {pool.submit(py_sqr): ix for ix in range(10)}
But not if I don't control the Python code that is submitting the tasks and collecting them (e.g. if I'm using some external Python package with a do_stuff() function that submits to the pool and collects the results). Is there any workaround in that situation?
I wonder if one could have a Python task that just wakes up from time to time to yield to the Julia scheduler (a rough sketch is below). I'm not sufficiently familiar with the exact mechanics of multithreading in Python and Julia to make a clear recommendation here, though.
It may be the recommendation is similar to when you go to the doctor and say "it hurts when I do this" -- "well then, don't do that" :)
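For what it's worth, here is a rough, untested sketch of that periodic-yielder idea (the helper name and the 50 ms interval are made up, and whether calling yield from a separate Python thread is actually safe is exactly the open question here):

import threading
import time
from juliacall import Main as jl

jl_yield = getattr(jl, "yield")

def start_background_yielder(interval=0.05):
    # Periodically hand control back to the Julia scheduler so that tasks
    # parked by the runtime get a chance to resume.
    stop = threading.Event()

    def _loop():
        while not stop.is_set():
            jl_yield()
            time.sleep(interval)

    threading.Thread(target=_loop, daemon=True).start()
    return stop  # call stop.set() to shut the background yielder down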
Just a note: it looks like supplying jl_yield above as an initializer argument to ThreadPoolExecutor also unsticks things. Not in tox, though 😭
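Concretely, something along these lines (a sketch, assuming my_sqr has already been defined via jl.seval as in the earlier examples):

from concurrent.futures import ThreadPoolExecutor
from juliacall import Main as jl

jl_yield = getattr(jl, "yield")

# Each worker thread yields to the Julia scheduler once when it starts up.
pool = ThreadPoolExecutor(2, initializer=jl_yield)
fs = {pool.submit(jl.my_sqr._jl_call_nogil): ix for ix in range(10)}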
Managed to mostly resolve this by using a ProcessPoolExecutor; unsure about the effect on the memory footprint, however.
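In case it helps anyone, a minimal sketch of that ProcessPoolExecutor approach (the worker/initializer structure and the pool size are my assumptions, not the exact code used): each worker process imports juliacall and so gets its own Julia runtime.

from concurrent.futures import ProcessPoolExecutor, as_completed

def init_julia():
    # Runs once per worker process: start Julia there and define my_sqr.
    from juliacall import Main as jl
    jl.seval("""
    function my_sqr()
        a = rand(10)
        for ii in 1:size(a, 1)
            a[ii] = a[ii]^2
        end
        return sum(a)
    end
    """)

def run_my_sqr():
    from juliacall import Main as jl
    return jl.my_sqr()

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=2, initializer=init_julia) as pool:
        futures = [pool.submit(run_my_sqr) for _ in range(10)]
        for future in as_completed(futures):
            print(future.result())

Note that results come back to the parent process via pickling, which is where the concern about passing big arrays back and forth comes in.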
This is all really useful and interesting experimentation, thanks. The interaction between the Julia task scheduler and Python threads is not totally clear to me right now - hopefully sprinkling some jl_adopt_thread and/or jl_enter_threaded_region in select places (e.g. GIL.@lock) helps to reduce the need for workarounds such as explicitly yielding back to Julia.
Yeah, it's a big pain when your Python code is being run by a ThreadPoolExecutor you do not "own". Do you think the jl_adopt_thread thing might work in that case? I ask because having to use Python Processes sucks if you have to pass big arrays back and forth.
Affects: JuliaCall

Describe the bug
When I run the following MWE, the code succeeds when run under pytest but hangs forever under tox, presumably because pytest is launching the test from the main thread? I have JULIA_NUM_THREADS set to auto.

If I run

bs = [my_sqr() for ix in range(10)]

instead in the test, everything works fine in both tox and pytest.

Your system
This is on PythonCall/juliacall 0.9.22 and Julia 1.10.4, Python 3.10.14, Mac M2.