Open the-noble-argon opened 2 years ago
I managed to find a temporary workaround for this issue. I had to modify the pylock() function so that it not only locks/unlocks but also disables garbage collection while the Python code is running (which is probably not ideal).
    pylock(f::Function) = Base.lock(PYLOCK[]) do
        prev_gc = GC.enable(false)
        try
            return f()
        finally
            GC.enable(prev_gc) # recover previous state
        end
    end
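For anyone who wants to try the wrapper in isolation, here is a minimal, self-contained sketch. It assumes PYLOCK is a Ref holding a ReentrantLock (its definition isn't shown above, so that is a guess), and it uses a plain Julia closure as a stand-in for an actual PyCall call so it runs without Python installed.

```julia
# Assumption: PYLOCK is a Ref{ReentrantLock}; the real definition is not shown in this thread.
const PYLOCK = Ref{ReentrantLock}(ReentrantLock())

# Same wrapper as above: hold the lock and keep the GC off while f runs.
pylock(f::Function) = Base.lock(PYLOCK[]) do
    prev_gc = GC.enable(false)   # GC.enable returns the previous GC state
    try
        return f()
    finally
        GC.enable(prev_gc)       # restore the previous state even if f throws
    end
end

# Hypothetical stand-in for a PyCall call; in real use this would be Python work.
result = pylock() do
    sum(1:10)
end
println("result = ", result)

# GC.enable(true) returns the state that was in effect before this call,
# so this checks that the wrapper re-enabled the GC on the way out.
gc_was_enabled = GC.enable(true)
println("GC re-enabled after pylock: ", gc_was_enabled)
```

Note that while the closure runs, allocations are not collected, so memory can grow if the wrapped call is long-running or allocates heavily.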
This highlights the importance of #883 in making sure that PyCall tasks can't be corrupted by garbage collections triggered by other threads, because I don't think this drastic kludge of disabling/enabling the garbage collector is the kind of solution Julia devs would encourage.
I'm trying to use PyCall in a Julia solution for various IO tasks (such as handling Parquet files and interacting with various Azure resources). I would like to have these PyCall tasks run in the background on one thread, doing a bunch of IO work using a fancy Python SDK while I do multithreaded number-crunching in Julia. I'm running into a weird issue though. With this script I wrote, I get a fatal error if I run the @sync Python task while a Threads.@spawn task is running, but everything is fine if I use Threads.@threads. Is there any reason why Threads.@threads works while Threads.@spawn doesn't? Is there a way to make this safe with Threads.@spawn? Unless we look at the code, we can't know whether an external library uses Threads.@spawn under the hood.
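For reference, here is a small sketch of how the two macros differ at the API level. It only illustrates the scheduling behavior and does not by itself explain the crash; work() is a hypothetical stand-in for the number-crunching, not code from my script.

```julia
# Hypothetical stand-in for the number-crunching work.
work(i) = sum(abs2, 1:i)

# Threads.@threads: iterations are split across threads, and the calling
# task blocks here until every iteration has finished.
results = zeros(Int, 8)
Threads.@threads for i in 1:8
    results[i] = work(i)   # each iteration writes its own slot, so no data race
end

# Threads.@spawn: returns a Task immediately; the work is scheduled on an
# available thread and the caller keeps running until it waits or fetches.
t = Threads.@spawn work(8)
spawned = fetch(t)         # block until the spawned task is done

println(results[8] == spawned)
```

So with Threads.@spawn the caller can go on to do other things (including yielding to an @async task) while the spawned work is still running on another thread, whereas Threads.@threads does not return until the loop is done.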
I'm going through all the PyCall docs and looking at the issues on multi-threading (there are a few that discuss this, like #882 and #883), but we need a better understanding of what we can and can't do in Julia while PyCall is doing something. I have a script that puts locks around the Python calls and only executes Python as an @async from the main thread, so it should always run on the main thread as suggested in #882 (I think?). Anyway, I'm getting fatal errors on Windows (and segfaults on Linux) when I run the script so that a Threads.@spawn happens while PyCall is running, but it's fine if I use Threads.@threads.
Anyway, here's the script:
Now, if I modify this script so that it executes a multithreaded calculation using Threads.@threads in multithread_calc(), I get no issue.
Otherwise, if I use the calc_task() function that has Threads.@spawn while running the Python tasks, I get the following error.