JuliaLang / julia

The Julia Programming Language
https://julialang.org/

Ctrl-C does not work when running multi-threaded code #35524

Open ViralBShah opened 4 years ago

ViralBShah commented 4 years ago

When Ctrl-C'ing multi-threaded code, it crashes Julia altogether.

julia> function fib(n::Int)
           if n < 2
               return n
           end
           t = Threads.@spawn fib(n - 2)
           return fib(n - 1) + fetch(t)
       end^C

julia> fib(50)
^C^C^C^C^Cfatal: error thrown and no exception handler available.
InterruptException()
sigatomic_end at ./c.jl:425 [inlined]
task_done_hook at ./task.jl:442
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2144 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2322
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1692 [inlined]
jl_finish_task at /buildworker/worker/package_linux64/build/src/task.c:198
start_task at /buildworker/worker/package_linux64/build/src/task.c:697
unknown function (ip: (nil))

Keno commented 4 years ago

Ctrl-C doesn't properly work in single threaded code either ;)

ViralBShah commented 4 years ago

Works better, I think.

StefanKarpinski commented 4 years ago

Seems like structured concurrency would help here, although whenever there's a @sync (explicit or implicit) it should be possible to make this work to the extent that threads can be interrupted successfully (so not 100%, but somewhat).

Keno commented 4 years ago

We just really need to stop having Ctrl-C throw regular exceptions. It's extremely surprising that everything can suddenly also throw interrupt exceptions (not to mention it not being modeled in the compiler).

timholy commented 4 years ago

xref https://github.com/JuliaLang/julia/issues/25790#issuecomment-618986924

ViralBShah commented 4 years ago

Actually, with 1.4 (maybe even 1.3?) I notice that killing single-threaded Julia processes with Ctrl-C is cumbersome too. @timholy's explanation was helpful for understanding why.

ViralBShah commented 4 years ago

I find that in 1.4 Ctrl-C is less well-behaved than pre-1.3, even for single-threaded code. You have to keep it pressed for a while, and you get the big Julia stacktrace.

tkf commented 4 years ago

I mentioned it in the other issue https://github.com/JuliaLang/julia/issues/25790#issuecomment-623163972 but it'd be nice to solve this with structured concurrency #33248.

OkonSamuel commented 4 years ago

When Ctrl-C'ing multi-threaded code, it crashes Julia altogether.

I recently ran into a similar issue in code I wrote. I had to quit Julia to stop the threads from running.

c42f commented 4 years ago

Ctrl-C is really hard because a large proportion of code is unsafe for async / non-cooperative cancellation by default. For example, there's all sorts of places in Base where we might be holding locks or other resources which aren't precisely protected by a try-catch when the "impossible" happens and a signal is received between creating the resource and "immediately" protecting it. I'm thinking of code like

lock(lk)
# <-- what happens if we're interrupted here?
try
    f()
finally
    unlock(lk)
end

cf. the Java Thread.stop() debacle.

Cancellation can be made safe by having a small number of well-defined and documented cooperative cancellation points (e.g., I/O). This is what pthreads do (see man pthreads "Cancellation points"). But this can result in Ctrl-C not actually cancelling the task for quite some time. Which isn't what you really want.

Structured concurrency helps a bit because it gives a systematic way for cleanup to propagate during cancellation. But in itself I don't think it helps resolve the Ctrl-C now-or-later, unsafe-or-safe conundrum.

vtjnash commented 4 years ago

Yep. We actually already use cancellation points for this; it's just not sufficient and causes other problems (such as, in the pthreads case, being unable to close file descriptors). Refs https://github.com/JuliaLang/julia/issues/6283

c42f commented 4 years ago

We just really need to stop having Ctrl-C throw regular exceptions

One way to do this is to have Ctrl-C set a flag which is checked at cancellation points. That's well and good, but it does mean Ctrl-C won't cancel things right now, but rather at some later time. Possibly much later, or never if you happen to have written a tight infinite loop!
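
As a rough sketch of the flag-plus-cancellation-point idea described above (not how Base handles Ctrl-C): the names `CANCEL_REQUESTED`, `cancellation_point`, and `busy_sum` are hypothetical and exist only for illustration, and the flag is set by hand here rather than by a real signal handler.

```julia
# Hypothetical names for illustration only; none of these are part of Base.
const CANCEL_REQUESTED = Threads.Atomic{Bool}(false)

# A cooperative cancellation point: throws only if the flag has been set.
function cancellation_point()
    CANCEL_REQUESTED[] && throw(InterruptException())
    return nothing
end

# A long-running loop that opts in to cancellation every `check_every` iterations.
function busy_sum(n::Int; check_every::Int = 1_000)
    total = 0
    for i in 1:n
        total += i
        i % check_every == 0 && cancellation_point()
    end
    return total
end
```

With the flag unset, `busy_sum(10_000)` runs to completion. After `CANCEL_REQUESTED[] = true`, the same call throws `InterruptException` at the next check rather than immediately, and a loop that never calls `cancellation_point()` would never notice the request. That is the "later, or never" trade-off raised above.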

Any thoughts on how we could handle this? One option might be to extend our existing double-Ctrl-C handling. As I recall, we currently avoid delivering InterruptException in ccall'd code, which is expected to be unsafe for Julia exceptions. But even normal Julia code is actually unsafe for InterruptException! It's delivered asynchronously in a way that can't be easily modeled by programmers (or by the compiler?).

tkf commented 4 years ago

Yeah, I agree that there are problems outside of what structured concurrency can do. But my point is that, even if you can magically solve the problems you mentioned, there are a bunch of problems that are hard to solve without structured concurrency.

lock(lk)
# <-- what happens if we're interrupted here?
try

I think this is why we should be recommending lock(...) do instead of manual try-finally. Inside the lock(f, ...) implementation, each lock can use some very low-level compiler machinery to ask that no cancellation point be inserted within the critical region.
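
For reference, a minimal sketch of the `lock(...) do` form, using the existing `lock(f, lk)` method from Base. The names `counter_lock`, `counter`, and `increment!` are hypothetical. This only centralizes the acquire/release pairing in one place; it does not by itself make the critical region safe against an asynchronous interrupt.

```julia
# Hypothetical names for illustration only.
const counter_lock = ReentrantLock()
const counter = Ref(0)

function increment!()
    # lock(f, lk) acquires the lock, runs the closure, and releases the lock
    # even if the closure throws.
    lock(counter_lock) do
        counter[] += 1
    end
end

# Spawn tasks that contend for the lock; @sync waits for all of them to finish.
@sync for _ in 1:100
    Threads.@spawn increment!()
end
```

Because the release lives inside `lock(f, lk)`, an exception thrown by the closure still leaves the lock unlocked, and any future fix for the interrupt window would only need to be made in that one method.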

Which isn't what you really want.

I think it's unavoidable in a performance-oriented language like Julia. Surely nobody wants random cancellation points in their carefully written tight loops. Using only I/O operations as cancellation points and letting people manually opt in via yield or something similar sounds like a good compromise.

c42f commented 4 years ago

I think this is why we should be recommending lock(...) do instead of manual try-finally.

Absolutely! (The Base implementation of lock(f, lk) is exactly the code I quoted, but of course that could be fixed ;-) )