Coq doesn't call check_for_interrupt often enough, and coq-lsp becomes unresponsive

ejgallego / coq-lsp

Visual Studio Code Extension and Language Server Protocol for Coq

GNU Lesser General Public License v2.1

151 stars 31 forks source link

Coq doesn't call check_for_interrupt often enough, and coq-lsp becomes unresponsive #139

Open Alizter opened 1 year ago

Alizter commented 1 year ago

Coq-lsp should terminate when the buffer is closed. Otherwise it stays open and has to be restarted manually.

ejgallego commented 1 year ago

coq-lsp server lifetime follows the LSP specification, it is the client who controls the lifetime; do you have a more concrete example?

Alizter commented 1 year ago

Open T2048.v and then close the buffer. Coq lsp still runs for a bit.

ejgallego commented 1 year ago

Ok will try; actually the most likely problem is that T2048.v gets deep into the kernel to check the Qed, such deep conversion loops don't do check_for_interrupt often so there you have your problem: the client sends the shutdown message properly, however the queue is not processed.

So if my guess is correct, the problem is more general, and the issue could be retitled as "Coq doesn't call check_for_interrupt often enough, and coq-lsp becomes unresponsive", and that's a kind of upstream issue.

Recall that at first I was skeptical that check_for_interrupt would work well enough, but actually it has!

We've been discussing with OCaml devs how to do this better (and have a check on allocation), but the tradeoffs here are complex.

I think for now we just need to try to gather all the cases where coq-lsp gets stuck, maybe just an extra check in Coq could solve this without an invasive signal-handling patch.

Alizter commented 1 year ago

So if my guess is correct, the problem is more general, and the issue could be retitled as "Coq doesn't call check_for_interrupt often enough, and coq-lsp becomes unresponsive", and that's a kind of upstream issue.

Yes, that is a better title. Updated.

Recall that at first I was skeptical that check_for_interrupt would work well enough, but actually it has!

This is good news. Let us try to make better use of it then.

We've been discussing with OCaml devs how to do this better (and have a check on allocation), but the tradeoffs here are complex.

Do you mean: https://github.com/ocaml/ocaml/issues/11411?

I think for now we just need to try to gather all the cases where coq-lsp gets stuck, maybe just an extra check in Coq could solve this without an invasive signal-handling patch.

Let's definitely try that first.

ejgallego commented 1 year ago

So indeed after the wait you have confirmed that coq-lsp is shutdown normally right?

Do you mean: https://github.com/ocaml/ocaml/issues/11411?

Yes, that was the start of the discussion.

You could use the profiler to see where the inner loop is here, and patch with a check_for_interrupt. But beware the perf hit maybe be large.

ejgallego commented 1 year ago

I did some more research on this example, the problem seems at:

#3956 0x000056489b651ac7 in camlDeclare__to_constr_body_6449 () at vendor/coq/vernac/declare.ml:1623
#3957 0x000056489b651bf7 in camlDeclare__fun_10409 () at vendor/coq/vernac/declare.ml:1648

Note the stack trace at almost 4000 deep! Another bad spot:

#5604 0x0000563d779cf526 in camlTypeops__execute_2721 () at vendor/coq/kernel/typeops.ml:615
#5605 0x0000563d779d11f3 in camlTypeops__infer_2985 () at vendor/coq/kernel/typeops.ml:806

At one point I saw this at more than 8000 depth!

So indeed the Qed here is very expensive and a few operations take most of the time; not sure we can do better with the current tech without incurring in an slowdown as these Qed paths are really sensitive.

cc #137