JuliaPy / PyCall.jl

Package to call Python functions from the Julia language
MIT License

Threads.@spawn can fatally crash PyCall @async task #1006

Open issue, opened by the-noble-argon 2 years ago

the-noble-argon commented 2 years ago

I'm trying to use PyCall in a Julia solution for various IO tasks (such as handling Parquet files and interacting with various Azure resources). I would like these PyCall tasks to run in the background on one thread, doing a lot of IO work through a fancy Python SDK, while I do multithreaded number-crunching in Julia. I'm running into a weird issue, though. With the script below, I get a fatal error if I run the @async Python tasks while a Threads.@spawn task is running, but everything is fine if I use Threads.@threads instead. Is there any reason why Threads.@threads works while Threads.@spawn doesn't? And is there a way to make this safe with Threads.@spawn? Unless we read its code, we can't know whether an external library uses Threads.@spawn under the hood.
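A minimal sketch of the "Python on one background thread" setup described above, under stated assumptions: a single long-lived worker task drains a queue of closures one at a time, and any task, including `Threads.@spawn` tasks, hands its Python work to it. `JOBS`, `WORKER`, and `run_on_worker` are hypothetical names, not PyCall API; in real use the closures would contain the PyCall calls. Because the worker is started with `@async`, it is sticky and stays on the thread that created it. This only controls where Python calls are issued; it does not stop other threads from running Julia code (including garbage collection) at the same time.

```julia
# Hypothetical job queue: each entry is a zero-argument closure plus a
# one-slot channel on which the worker sends back the outcome.
const JOBS = Channel{Tuple{Function,Channel{Any}}}(Inf)

# A single worker task started with @async from the current task. @async
# tasks are sticky, so every job runs on the thread that created the worker,
# and jobs never overlap because the loop handles one entry at a time.
const WORKER = @async for (f, reply) in JOBS
    outcome = try
        (true, f())      # in real use, f would contain the PyCall calls
    catch err
        (false, err)     # forward errors instead of killing the worker
    end
    put!(reply, outcome)
end

# Submit a closure from any task or thread and block until it has run.
function run_on_worker(f)
    reply = Channel{Any}(1)
    put!(JOBS, (f, reply))
    ok, value = take!(reply)
    ok || throw(value)
    return value
end

# Usage: closures submitted from Threads.@spawn tasks still run on the
# worker's thread.
worker_tid = run_on_worker(Threads.threadid)
ids = fetch.([Threads.@spawn run_on_worker(Threads.threadid) for _ in 1:8])
println(all(==(worker_tid), ids))  # prints: true
```

Funnelling all Python work through one task makes a separate lock around the Python calls unnecessary, since jobs never overlap; the trade-off is that every caller blocks until the worker reaches its job.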

I've gone through PyCall's docs and the issues about multithreading (a few of them discuss this, such as #882 and #883), but we need a better understanding of what we can and can't do in Julia while PyCall is doing something. My script puts a lock around every Python call and only runs Python inside @async tasks started from the main thread, so, as suggested in #882 (I think?), Python should always execute on the main thread. Still, I get fatal errors on Windows (and segfaults on Linux) when a Threads.@spawn runs while PyCall is working, whereas everything is fine with Threads.@threads.
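The "@async from the main thread" reasoning rests on task stickiness: a task created with `@async` runs on the thread of the task that scheduled it, whereas a `Threads.@spawn` task may be scheduled on any thread of the default pool. A quick check, assuming Julia is started with several threads (for example `julia -t 4`), is to have each kind of task report `Threads.threadid()`:

```julia
# Each task reports the id of the thread it runs on.
async_ids = fetch.([@async Threads.threadid() for _ in 1:8])
spawn_ids = fetch.([Threads.@spawn Threads.threadid() for _ in 1:8])

# @async tasks all run on the scheduling task's thread...
println("@async ran on threads: ", sort(unique(async_ids)))
# ...while @spawn tasks may be spread over several threads.
println("@spawn ran on threads: ", sort(unique(spawn_ids)))
```

With a single thread both lists are trivially identical, so the check is only informative when Julia runs with more than one thread.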

Anyway, here's the script:

using PyCall
const PY_JSON = pyimport("json")

const PYLOCK = Ref{ReentrantLock}()
PYLOCK[] = ReentrantLock()

# acquire the lock before any code calls Python
pylock(f::Function) = Base.lock(f, PYLOCK[])

function write_json_file(fileName::String, outputData::Dict)
    pylock() do
        open(fileName, "w") do outputFile
            PY_JSON.dump(deepcopy(outputData), outputFile)
        end
    end
    return nothing
end

function file_operation(fileName::String)
    outputData = Dict("a"=>randn(), "b"=>randn())
    write_json_file(fileName, outputData)
    return outputData
end

function multithread_calc(x::Vector)
    y = zeros(Float64, length(x))
    Threads.@threads for ii in eachindex(y)
        y[ii] = log(exp(x[ii]))
    end
    return y
end

function calc_task(x::Vector)
    t = Threads.@spawn log.(exp.(x))
    return fetch(t)
end

calcInput = randn(10000000)
for ii in 1:100
    display(ii)
    file_operation("testfile.json")
    # 30 Python-backed file writes, scheduled as @async tasks on this thread
    fileTasks = [@async file_operation("testfile$(jj).json") for jj in 1:30]
    #calcResults = multithread_calc(calcInput)
    calcResults = calc_task(calcInput)  # Threads.@spawn version: crashes
    wait.(fileTasks)
end
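For reference, the `pylock` helper relies on two properties of `ReentrantLock` and `lock(f, l)`: the same task can re-acquire the lock without deadlocking, and the lock is released when `f` returns or throws. The sketch below checks both, plus mutual exclusion between tasks, with a stand-in lock; `DEMO_LOCK` and `with_demo_lock` are hypothetical names mirroring `PYLOCK` and `pylock`. The lock only serializes tasks that go through it; code on other threads that never takes it, such as the `Threads.@spawn` computation, keeps running.

```julia
# Stand-ins for PYLOCK / pylock from the script above.
const DEMO_LOCK = ReentrantLock()
with_demo_lock(f) = lock(f, DEMO_LOCK)

# 1. Reentrancy: nested calls from the same task do not deadlock.
nested = with_demo_lock() do
    with_demo_lock(() -> :inner_ok)
end

# 2. The lock is released even if the protected code throws.
try
    with_demo_lock(() -> error("simulated Python failure"))
catch err
    println("caught: ", err)
end

# 3. Mutual exclusion: another task waits until the holder unlocks.
order = Int[]
lock(DEMO_LOCK)
t = Threads.@spawn with_demo_lock(() -> push!(order, 2))
push!(order, 1)          # runs while this task still holds the lock
unlock(DEMO_LOCK)
wait(t)

println(nested, " ", islocked(DEMO_LOCK), " ", order)  # prints: inner_ok false [1, 2]
```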

Now, if I modify the loop in the script so that the multithreaded calculation uses Threads.@threads (multithread_calc()), I get no issue:

calcInput = randn(10000000)
for ii in 1:100
    display(ii)
    file_operation("testfile.json")
    fileTasks = [@async file_operation("testfile$(jj).json") for jj in 1:30]
    calcResults = multithread_calc(calcInput)  # Threads.@threads version: no crash
    #calcResults = calc_task(calcInput)
    wait.(fileTasks)
end

However, if I use calc_task(), which runs the calculation with Threads.@spawn, while the Python tasks are running, I get the following error:

Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
Exception: EXCEPTION_ACCESS_VIOLATION at 0x7ffa4e385adb -- PyUnicode_New at C:\Users\user\Miniconda3\python39.dll (unknown line)
in expression starting at g:\My Drive\tests\julia_pylock_json.jl:42
PyUnicode_New at C:\Users\user\Miniconda3\python39.dll (unknown line)
PyUnicode_New at C:\Users\user\Miniconda3\python39.dll (unknown line)
PyLong_New at C:\Users\user\Miniconda3\python39.dll (unknown line)
PyUnicode_DecodeUTF8Stateful at C:\Users\user\Miniconda3\python39.dll (unknown line)
PyUnicode_FromId at C:\Users\user\Miniconda3\python39.dll (unknown line)
Py_FinalizeEx at C:\Users\user\Miniconda3\python39.dll (unknown line)
Py_FinalizeEx at C:\Users\user\Miniconda3\python39.dll (unknown line)
Py_Finalize at C:\Users\user\.julia\packages\PyCall\ygXW2\src\pyinit.jl:125
unknown function (ip: 00000000618a7a23)
_atexit at .\initdefs.jl:372
unknown function (ip: 00000000618a7013)
jl_apply at /cygdrive/c/buildbot/worker/package_win64/build/src\julia.h:1838 [inlined]
ijl_atexit_hook at /cygdrive/c/buildbot/worker/package_win64/build/src\init.c:219
ijl_exit at /cygdrive/c/buildbot/worker/package_win64/build/src\jl_uv.c:640
jl_exception_handler at /cygdrive/c/buildbot/worker/package_win64/build/src\signals-win.c:322
__julia_personality at /cygdrive/c/buildbot/worker/package_win64/build/src\win32_ucontext.c:28
_chkstk at C:\Windows\SYSTEM32\ntdll.dll (unknown line)
RtlRaiseException at C:\Windows\SYSTEM32\ntdll.dll (unknown line)
KiUserExceptionDispatcher at C:\Windows\SYSTEM32\ntdll.dll (unknown line)
PyUnicode_New at C:\Users\user\Miniconda3\python39.dll (unknown line)
PyUnicode_New at C:\Users\user\Miniconda3\python39.dll (unknown line)
PyLong_New at C:\Users\user\Miniconda3\python39.dll (unknown line)
PyUnicodeWriter_WriteASCIIString at C:\Users\user\Miniconda3\python39.dll (unknown line)
PyUnicode_FromFormatV at C:\Users\user\Miniconda3\python39.dll (unknown line)
PyErr_Format at C:\Users\user\Miniconda3\python39.dll (unknown line)
PyObject_GetBuffer at C:\Users\user\Miniconda3\python39.dll (unknown line)
isbuftype! at C:\Users\user\.julia\packages\PyCall\ygXW2\src\pybuffer.jl:134 [inlined]
isbuftype at C:\Users\user\.julia\packages\PyCall\ygXW2\src\pybuffer.jl:148 [inlined]
pysequence_query at C:\Users\user\.julia\packages\PyCall\ygXW2\src\conversions.jl:759
pytype_query at C:\Users\user\.julia\packages\PyCall\ygXW2\src\conversions.jl:773
#36 at .\none:0 [inlined]
iterate at .\generator.jl:47
unknown function (ip: 000000006189dc50)
do_apply at /cygdrive/c/buildbot/worker/package_win64/build/src\builtins.c:703
typetuple at C:\Users\user\.julia\packages\PyCall\ygXW2\src\conversions.jl:745
unknown function (ip: 000000006189cd9a)
pysequence_query at C:\Users\user\.julia\packages\PyCall\ygXW2\src\conversions.jl:754
pytype_query at C:\Users\user\.julia\packages\PyCall\ygXW2\src\conversions.jl:773
pytype_query at C:\Users\user\.julia\packages\PyCall\ygXW2\src\conversions.jl:806 [inlined]
convert at C:\Users\user\.julia\packages\PyCall\ygXW2\src\conversions.jl:831
julia_args at C:\Users\user\.julia\packages\PyCall\ygXW2\src\callback.jl:18 [inlined]
_pyjlwrap_call at C:\Users\user\.julia\packages\PyCall\ygXW2\src\callback.jl:24
unknown function (ip: 000000006189cc0a)
pyjlwrap_call at C:\Users\user\.julia\packages\PyCall\ygXW2\src\callback.jl:44
unknown function (ip: 000000006187c398)
PyObject_Call at C:\Users\user\Miniconda3\python39.dll (unknown line)
PyEval_EvalFrameDefault at C:\Users\user\Miniconda3\python39.dll (unknown line)
PyFunction_Vectorcall at C:\Users\user\Miniconda3\python39.dll (unknown line)
Py_NewReference at C:\Users\user\Miniconda3\python39.dll (unknown line)
PyEval_EvalFrameDefault at C:\Users\user\Miniconda3\python39.dll (unknown line)
PyFunction_Vectorcall at C:\Users\user\Miniconda3\python39.dll (unknown line)
PyFunction_Vectorcall at C:\Users\user\Miniconda3\python39.dll (unknown line)
PyVectorcall_Call at C:\Users\user\Miniconda3\python39.dll (unknown line)
PyObject_Call at C:\Users\user\Miniconda3\python39.dll (unknown line)
macro expansion at C:\Users\user\.julia\packages\PyCall\ygXW2\src\exception.jl:95 [inlined]
#107 at C:\Users\user\.julia\packages\PyCall\ygXW2\src\pyfncall.jl:43 [inlined]
disable_sigint at .\c.jl:473 [inlined]
__pycall! at C:\Users\user\.julia\packages\PyCall\ygXW2\src\pyfncall.jl:42 [inlined]
_pycall! at C:\Users\user\.julia\packages\PyCall\ygXW2\src\pyfncall.jl:29
_pycall! at C:\Users\user\.julia\packages\PyCall\ygXW2\src\pyfncall.jl:11
unknown function (ip: 000000006188c5f5)
#_#114 at C:\Users\user\.julia\packages\PyCall\ygXW2\src\pyfncall.jl:86
jl_apply at /cygdrive/c/buildbot/worker/package_win64/build/src\julia.h:1838 [inlined]
do_apply at /cygdrive/c/buildbot/worker/package_win64/build/src\builtins.c:730
PyObject at C:\Users\user\.julia\packages\PyCall\ygXW2\src\pyfncall.jl:86
#2 at g:\My Drive\tests\julia_pylock_json.jl:16484
#open#378 at .\io.jl:384
open at .\io.jl:381 [inlined]
#1 at g:\My Drive\tests\julia_pylock_json.jl:15 [inlined]
lock at .\lock.jl:185
pylock at g:\My Drive\tests\julia_pylock_json.jl:10 [inlined]
write_json_file at g:\My Drive\tests\julia_pylock_json.jl:14 [inlined]
file_operation at g:\My Drive\tests\julia_pylock_json.jl:24
#11 at .\task.jl:484
unknown function (ip: 00000000618a5de3)
jl_apply at /cygdrive/c/buildbot/worker/package_win64/build/src\julia.h:1838 [inlined]
start_task at /cygdrive/c/buildbot/worker/package_win64/build/src\task.c:931
Allocations: 10783290 (Pool: 10776630; Big: 6660); GC: 14

the-noble-argon commented 2 years ago

I managed to find a temporary workaround for this issue. I had to modify the pylock() function so that it not only locks and unlocks but also disables garbage collection while the Python code is running (which is probably not ideal).

pylock(f::Function) = Base.lock(PYLOCK[]) do
    prev_gc = GC.enable(false) # disable GC; GC.enable returns the previous state
    try
        return f()
    finally
        GC.enable(prev_gc) # recover previous state
    end
end
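The `prev_gc` bookkeeping in this workaround matters because `pylock` can be nested (`PYLOCK` is reentrant): `GC.enable` returns the previous state, so an inner call restores "disabled" and only the outermost call turns collection back on. A minimal sketch of the same pattern, using two hypothetical helpers: `with_gc_disabled` (the workaround without the lock) and `gc_enabled`, a query that briefly sets the flag and restores it.

```julia
# Run f() with garbage collection disabled, restoring the previous state
# afterwards (also when f throws). Same pattern as the pylock workaround.
function with_gc_disabled(f)
    prev = GC.enable(false)   # GC.enable returns the previous state
    try
        return f()
    finally
        GC.enable(prev)
    end
end

# Query the current GC flag without leaving it changed.
gc_enabled() = (prev = GC.enable(false); GC.enable(prev); prev)

# Nested use: the inner call must not re-enable GC on exit.
states = with_gc_disabled() do
    before = gc_enabled()
    inner = with_gc_disabled(gc_enabled)
    after = gc_enabled()
    (before, inner, after)
end

println(states, " ", gc_enabled())   # prints: (false, false, false) true
```

The `GC.enable` documentation warns that disabling collection can make memory use grow without bound, so the disabled window should stay as short as the Python call itself.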

This workaround highlights the importance of #883 for making sure that PyCall tasks can't be corrupted by garbage collections triggered from other threads, because I don't think this drastic kludge of disabling and re-enabling the garbage collector is the kind of solution Julia devs would encourage.