JuliaLang / julia


Forced SIGINT interrupt often required rather than ctrl+C for codes taking a long time #48154

Open gaurav-arya opened 1 year ago

gaurav-arya commented 1 year ago

Update: After learning some more, it seems the issue is that for some code, one has to spam ctrl+C to force a SIGINT interrupt, rather than simply pressing ctrl+C once, in order to stop it (without killing the whole REPL). Below is one MWE; another MWE is at https://github.com/JuliaLang/julia/issues/48154#issuecomment-1373730345 (although seemingly fixed on master).

I've often had particular trouble interrupting code in VSCode and Jupyter Notebook environments: my suspicion is that this is because the SIGINT interrupt way of doing things totally fails in certain cases, and then one has to kill the whole Julia process. Potentially this is an issue with these external tools, potentially it would be helpful to understand why forced SIGINT interrupts are so often needed rather than ctrl+C, potentially it should be easier to trigger forced SIGINT interrupts (including from e.g. VSCode or a Jupyter notebook) since they are so commonly needed,... I'm not sure.


I still seem to be having this issue on 1.9.0-beta2:

MWE:

function f()
    while true
    end
end

f()

Spamming ctrl+C (edit: the reason this didn't work was because I was in VSCode), holding onto ctrl+C, etc. don't seem to terminate the function for me. Version info:

julia> versioninfo()
Julia Version 1.9.0-beta2
Commit 7daffeecb8c (2022-12-29 07:45 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 8 × 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, tigerlake)
  Threads: 1 on 8 virtual cores
Environment:
  JULIA_EDITOR = code
  JULIA_NUM_THREADS = 

Here are a couple of related issues: https://github.com/JuliaLang/julia/issues/46635, https://github.com/JuliaLang/julia/issues/35524. It wasn't clear to me whether this is a dup. The former issue is closed, but the fix https://github.com/JuliaLang/julia/pull/47901 should already exist on 1.9.0-beta2 if I am understanding things right?

Also note that in the above MWE, ctrl+C or ctrl+Z does not seem to work at all for me in a reasonable time frame; I need to terminate the REPL. In other cases, also on 1.9.0-beta2, I'm still experiencing the behaviour where holding down ctrl+C, pressing it several times, or using ctrl+Z eventually works but takes some time.

Seelengrab commented 1 year ago

Not sure how you'd expect to interrupt a loop that doesn't yield() anywhere - #47901 is explicitly about making CTRL+C work when sleep is called, which yields to the scheduler internally.

FYI, I can interrupt the loop (though I don't expect it to) with force throwing after the fifth CTRL+C:

julia> f()
^C^C^C^C^CWARNING: Force throwing a SIGINT
^CERROR: InterruptException:
Stacktrace:
 [1] f()
   @ Main ./REPL[9]:3
 [2] top-level scope
   @ REPL[10]:1

julia> ^C

julia> ^C

julia> ^C

julia> ^C

julia> ^C

julia> ^C

julia> ^C

julia> ^C

julia> versioninfo()
Julia Version 1.10.0-DEV.263
Commit f0aed884a1* (2023-01-03 17:17 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: 24 × AMD Ryzen 9 7900X 12-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, znver3)
  Threads: 24 on 24 virtual cores
Environment:
  JULIA_NUM_THREADS = 24

gaurav-arya commented 1 year ago

Oh, I guess I don't really understand how interrupts work / when we can expect to interrupt such codes. Feel free to close if it's expected behavior

I guess the pain point for me is that I tend to have to kill the whole REPL if I accidentally end up in such a situation? But maybe that's unavoidable

Seelengrab commented 1 year ago

I'm not saying it's expected, I just think it'd be kind of hard to interrupt such code reliably, since it's just jumping to itself without any scheduler yield points :shrug: If I may, how did you arrive at that code? Surely an empty while loop is not representative of the issue where you encountered this :eyes:

gaurav-arya commented 1 year ago

Oh, are you saying that the empty while loop is a pathological case, and other cases, such as an infinite running while loop with a nontrivial body, or code that's going to terminate but in a very long time, should generally be feasible to stop with a ctrl-C?

My real examples are generally in the "oops, I made parameters way too large and this code is going to take too long, I want to stop it" category and I quite often find that ctrl-C isn't working and I have to kill my whole REPL. I don't have an MWE for such a case right now, but if you're saying that my current MWE is a pathological case that wouldn't explain why real-world cases have this issue then happy to try and find a better MWE.

gaurav-arya commented 1 year ago

OK, so a minimized version of where I'm running into issues is:

mutable struct A
    x::Vector{Float64}
end

function f(a)
     a.x .-= 1
end

a = A(rand(1000000))
for i in 1:1000000
    f(a)
end

When I run the above example in the REPL itself, a single ctrl-C does not work on my machine. (If I really minify the above example by getting rid of the struct and func, it does, which is interesting...) Spamming ctrl+C around 5 times in quick succession does force a SIGINT. My hope would be that there is some sort of "expected" way to stop the above code without killing the REPL: if this spamming behaviour can be expected, is it documented anywhere? (If one didn't know to spam ctrl+C, the only solution would be to kill the REPL.)

However, I'm realizing one more thing: when running the above code not in the REPL itself, but rather by ctrl+enter'ing the code from VSCode, I can't even force the SIGINT. That's really the practical problem I'm facing, and perhaps that's a VSCode problem rather than a julia problem so I could file an issue there.

gbaraldi commented 1 year ago

I will say that I can interrupt this on master, so maybe this has already been fixed ;)

gaurav-arya commented 1 year ago

Do both the while loop and my second MWE get fixed on master? And by fixed, do you mean a single ctrl+C suffices? Would be happy to test on master if I knew how, curious what could have changed from the latest 1.9 release:)

Seelengrab commented 1 year ago

> And by fixed, do you mean a single ctrl+C suffices?

I can interrupt the empty while loop with force throwing. Your second MWE can be interrupted with just one SIGINT.

gaurav-arya commented 1 year ago

Okay, cool:) If nearly every non-trivial case which usually requires spamming ctrl+C is fixed, then that probably solves things! But I only have Julia 1.9, so it's difficult for me to test how far things have been fixed beyond that MWE.

As I wrote in the update to the issue body, my main issue with the "spamming ctrl+C" is that this seems to be very often not supported outside the REPL, e.g. in VSCode code running or in Jupyter notebook cells. So having to spam ctrl+C turns into having to kill the entire Julia process. Admittedly, I'm not exactly sure what the precise focus of this issue is and if there's any concrete issue with base/core Julia. Maybe all external tools simply need to make sure they provide the ability to do this forced SIGINT interrupt... or perhaps there is something that can be done in Julia itself to alleviate this.

andrewjradcliffe commented 1 year ago

For me, 1.9 provides the following behavior: second example interrupted with single ^C, first example via 5 ^C's. I use Emacs and vterm, so it's not quite bare terminal.

My naive guess (I know nothing of VSCode) is that this is related to VSCode's abstraction over whatever it considers to be a bare terminal (if it even uses that, or perhaps some other approach to interfacing with a Julia process). Try launching julia in a bare terminal and testing again.

gaurav-arya commented 1 year ago

On a bare terminal, the second example does still seem to require 5 ^C's on my machine. I'm not sure why, could it be machine dependent?

In general, though, I would like to understand what the "expected behaviour" is: is it expected behaviour to require 5 ^C's for these sorts of codes (the while true end being a common example for all of us)? And is it expected for 5 ^C's to always work, or is it also reasonable to be unresponsive? And is this documented anywhere, so that users know about it, and more importantly so that tooling outside the REPL can make sure it's possible to stop the code at all without killing Julia?

andrewjradcliffe commented 1 year ago

> more importantly so that tooling outside the REPL can make sure it's possible to stop the code at all without killing Julia?

That responsibility belongs to the creators of the tooling. As far as Julia itself is concerned, if you can send ^C's, then you can interrupt; the manual describes how to incorporate interrupts in scripts.
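As a rough illustration of what catching an interrupt in a script can look like, the sketch below assumes a script started with `julia file.jl` and uses `Base.exit_on_sigint(false)` so that a SIGINT surfaces as an `InterruptException` instead of terminating the process. The helper name `guarded` is hypothetical, and the interrupt is simulated with `throw` so the snippet runs without anyone pressing ctrl+C.

```julia
# Ask the runtime to deliver SIGINT as a catchable InterruptException
# (this is already the default in the REPL; in a plain script it is not).
Base.exit_on_sigint(false)

# `guarded` is a hypothetical helper, not part of Julia or of this thread.
function guarded(work)
    try
        work()
        return :finished
    catch err
        # Only swallow interrupts; anything else is a real error.
        err isa InterruptException || rethrow()
        return :interrupted
    end
end

status_ok = guarded(() -> sum(1:10))
# Simulated interrupt: stands in for a ctrl+C arriving while `work` runs.
status_int = guarded(() -> throw(InterruptException()))
```

Note that this only covers what happens once the exception is delivered. As discussed in this thread, a loop without yield points may still need several ^C's before any exception reaches the code.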

Based on your observations with VSCode/Jupyter, it seems fairly clear that their abstraction over Julia processes has a problem with signals. A simple explanation is that the ^C's are simply not being sent, for whatever reason.

On the multiple mention of massive pain point: a little forethought goes a long way; good programming practices will take one farther. Ask yourself why you find the need to frequently interrupt, and whether a debugger might be more appropriate for the task. An orthogonal line of inquiry is to ask yourself: so what if I kill the process? Could I not just run the code up to that point and pay nothing but the compute time? If one cannot, that suggests that one should be more formal about writing reproducible code.

A side note is that when one uses tools like VSCode/Jupyter, one is dealing with abstractions, not the real thing. If you want the "real thing", options are the bare terminal or a terminal emulator that makes a genuine effort to support most features (escape codes being a frequent pain point). This means something like Emacs or vim + tmux; I recommend the former for innumerable reasons, foremost of which is that you now hack your editor just like you hack code. It is mentioned nowhere, probably because mentions of Emacs usually bring on groans of dismay, but the Julia REPL keybinds are identical to the default Emacs keybinds (and bash, provided you haven't been duped into vim binds as default).

gaurav-arya commented 1 year ago

> On the multiple mention of massive pain point: a little forethought goes a long way; good programming practices will take one farther. Ask yourself why you find the need to frequently interrupt, and whether a debugger might be more appropriate for the task. An orthogonal line of inquiry is to ask yourself: so what if I kill the process? Could I not just run the code up to that point and pay nothing but the compute time? If one cannot, that suggests that one should be more formal about writing reproducible code.

In an interactive setting, I feel like it is reasonable to frequently run code that takes longer than expected and wish to interrupt it without killing the Julia session. I made this issue because it seemed like an impairment to usability. Stating that it is a pain point for me is to state "in my perhaps suboptimal, but hopefully not totally inane, use of Julia, this has been a problem for me". If all the burden is on the user, of course there's no issue here.

> As far as Julia itself is concerned, if you can send ^C's, then you can interrupt; the manual describes how to incorporate interrupts in scripts

I still feel like the expected behaviour of interrupts, and the distinction between one ctrl+C versus five ctrl+C's, could potentially be clarified more. For example, @Seelengrab mentioned that they didn't expect five ctrl+C's to work in the while true end case. Does that mean that if it stopped working in a future release of Julia, it would not be a regression? If one were to find a pure Julia code where even five ctrl+C's didn't work, would it be a bug or not? I know very little about interrupts (and how much of it is determined by the system vs. the language implementation), so the answers to these questions are not obvious to me.

Also, with the current state of things it could be helpful for users to know that on certain systems, spamming SIGINT's may help? For example, in a Jupyter notebook, it looks like a forced SIGINT interrupt can be thrown, but it had never occurred to me until now that I should try to spam the stop button of a cell.

In any case, I agree the main issue is with tooling that does not allow for these forced SIGINT interrupts, which I didn't realize when I made the issue. So if the above is not worth an issue, I'm happy if the issue is closed:)

P.S. edited comments above to remove mentions of "massive pain point", sorry if that came off strong

gaurav-arya commented 1 year ago

Perhaps the concrete questions for Julia itself are:

a) Is the SIGINT forcing behaviour, coded up in https://github.com/JuliaLang/julia/blob/8dbf7a15170ce529fbabc72a11a6f7ca2df57fee/src/signals-mach.c#L422-L451 and https://github.com/JuliaLang/julia/blob/8dbf7a15170ce529fbabc72a11a6f7ca2df57fee/src/signal-handling.c#L198-L219, the best way of doing it? (E.g. should it be easier to throw a forced interrupt?) The idea here is that in a large number of cases, a forced interrupt would be preferable to killing the session, particularly in Julia due to load times / TTFX.

b) Should the behaviour at least be more discoverable, e.g. is there documentation for it?

c) Is there any way of reducing the need for forced interrupts? For example, in my second MWE (with the disclaimer that others have not been able to reproduce my need for a forced interrupt in that case), it would likely cost nothing to yield() given the cost of f. This could potentially be a pretty big QOL improvement, e.g. Python interrupts an infinite loop with a single ctrl-C. Of course, this seems very tricky in general given Julia does not have the luxury of sacrificing performance, and definitely beyond my expertise, but I would like to at least raise the question:)

Seelengrab commented 1 year ago

> For example, @Seelengrab mentioned that they didn't expect five ctrl+C's to work in the while true end case. Does that mean that if it stopped working in a future release of Julia, it would not be a regression?

While I wouldn't expect it to work, it's nice that it does and I don't think it would be good to remove that escape hatch of 5 SIGINT.

> The idea here is that in a large number of cases, a forced interrupt would be preferable to killing the session, particularly in Julia due to load times / TTFX.

Again, this forced interrupt is not generally a safe way of doing anything - it forcibly kills whatever happened, which may leave a threaded workload in an inconsistent state. I don't think it should be required to be used often.

Regarding TTFX - I think you're going to be pleasantly surprised with the upcoming 1.9 :)

> Python interrupts an infinite loop with a single ctrl-C. Of course, this seems very tricky in general given Julia does not have the luxury of sacrificing performance, and definitely beyond my expertise, but I would like to at least raise the question:)

> it would likely cost nothing to yield() given the cost of f.

Given that this depends on what f is doing, no, inserting yield() points in every loop iteration is not an option. This will prevent autovectorization, force costly rescheduling of the current task even if no other task is running, evict caches and all those things we really don't want to do in a high performance scenario (around which julia is built - else we'd all just use python).

If you need/want the guarantee to interrupt in a reasonable & deterministic timeframe, explicitly insert yield() points into your outermost (i.e., not performance critical) loop (or explicitly try/catch an InterruptException). Of course, it still depends on the tooling you're using to run julia to correctly propagate the interrupt to your code - that's not something julia itself can do anything about though.
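The advice above can be sketched roughly as follows. This is a minimal example, assuming the hot work lives in an inner function that is left untouched while the outer loop both calls `yield()` and catches `InterruptException`. The names `inner_kernel!` and `run_sweep!` are hypothetical and only loosely modelled on the second MWE in this thread.

```julia
# Hypothetical hot kernel: no yield() here, so it stays free to vectorize.
function inner_kernel!(x)
    x .-= 1
    return x
end

# Hypothetical driver: the outer loop is not performance critical,
# so it is a reasonable place for a scheduler yield point.
function run_sweep!(x, n)
    completed = 0
    try
        for i in 1:n
            inner_kernel!(x)
            completed = i
            yield()  # gives the scheduler (and a pending interrupt) a chance to run
        end
    catch err
        err isa InterruptException || rethrow()
        @warn "Interrupted; returning partial progress" completed
    end
    return completed
end

x = zeros(4)
done = run_sweep!(x, 3)
```

Here `done` counts the outer iterations that finished, so an interrupted run can report how far it got instead of losing the session. Whether a single ctrl+C actually arrives still depends on the frontend forwarding the signal.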

gaurav-arya commented 1 year ago

It's not always ergonomic to manually insert yield()'s into the outermost loop, e.g. if that loop is part of library code. But it does sound like it's an unavoidable performance-usability tradeoff then. Given that forced interrupts and kernel kills are both non-ideal, new users who end up in an infinite loop-type situation are left in a difficult position. There also looks to be a bit of a knowledge gap to close: as an example, a user used to interactive Python might write an empty infinite loop, to e.g. demonstrate it to their students, and expect to ctrl-C it without killing the kernel. But perhaps there are no good solutions here:)
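One possible way to soften the library-code case, offered only as a sketch and not something proposed in this thread, is for a library loop to accept an optional per-iteration callback so that the caller can decide whether to pay for a yield point. The function `iterate_steps` and its `callback` keyword are hypothetical.

```julia
# Hypothetical library routine: by default the callback does nothing,
# so callers who want maximum throughput pay no scheduling cost.
function iterate_steps(step!, state, n; callback = () -> nothing)
    for _ in 1:n
        step!(state)
        callback()  # caller-supplied hook; may or may not yield
    end
    return state
end

# Interactive use: opt in to a yield point on every outer iteration.
calls = Ref(0)
state = iterate_steps(x -> (x .+= 1), zeros(3), 5;
                      callback = () -> (calls[] += 1; yield()))
```

With the default callback the loop behaves as before; passing a callback that calls `yield()` trades some speed for interruptibility in interactive sessions.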

eschnett commented 4 months ago

My expectation is that CTRL-C will always abort running Julia code, REPL or not. As a user I don't care about "forced" vs. "not forced". I'd be happy to wait a bit (less than a second?) for the code to stop if there is no yield called, but I'd still expect the code to stop running fairly quickly when I want it to stop.