golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
123.24k stars 17.57k forks source link

syscall: add support for Windows job objects #17608

Open osher opened 7 years ago

osher commented 7 years ago

I would like to reopen #6720. I excitedly read all the thread, ending with the greatest facepalm I had in the past years.

Look, this is not a solution! If nodejs knows how to pass these signals between processes on windows - then no reason that golang should not. No magic involved.

Full story bellow. Mind that if the go shim is out of the loop - it works as expected, so I deduct it has to be something that golang does wrong.

I would appreciate your input on this...

What version of Go are you using (go version)?

donno. the one that nodist@0.8.7 uses

What operating system and processor architecture are you using (go env)?

Windows 10, x64

What did you do?

I'm using nodist which allows me to run different versions of nodejs side by side. nodist uses a shim layer of go to have a look around on the env vars and local folder to detect the desired node version and call the relevant executable accordingly

See here: https://github.com/marcelklehr/nodist/issues/179

What did you expect to see?

https://github.com/marcelklehr/nodist/issues/179

What did you see instead?

https://github.com/marcelklehr/nodist/issues/179

bradfitz commented 7 years ago

I excitedly read all the thread, ending with the greatest facepalm I had in the past years.

Look, this is not a solution!

This is not a good way to start a bug report. You've now established yourself as an adversary rather than a collaborator.

Something to keep in mind for future bug reports. Tone matters.

rsc commented 7 years ago

I'm sorry, but we need more information to understand what you are asking for on the Go side. I did read nodist/issues#179, but there is not a line of Go code in sight.

You also mentioned reopening #6720, which as I read it is about p.Signal(os.Interrupt), where p is an os.Process, not being implemented and therefore always returning an error on Windows. That issue was closed by 05cc78d8, which did:

$ git show 05cc78d
commit 05cc78d8d32f6af6fc4373e10da0b4a12f0a1ad4
Author:     Alex Brainman <alex.brainman@gmail.com>
AuthorDate: Fri May 23 12:29:29 2014 +1000
Commit:     Alex Brainman <alex.brainman@gmail.com>
CommitDate: Fri May 23 12:29:29 2014 +1000

    os: document that Interrupt might not work on every os

    Fixes #6720.

    LGTM=bradfitz
    R=golang-codereviews, iant, bradfitz
    CC=golang-codereviews
    https://golang.org/cl/92340043

diff --git a/src/pkg/os/doc.go b/src/pkg/os/doc.go
index bc700b6..389a8eb 100644
--- a/src/pkg/os/doc.go
+++ b/src/pkg/os/doc.go
@@ -46,6 +46,7 @@ func (p *Process) Wait() (*ProcessState, error) {
 }

 // Signal sends a signal to the Process.
+// Sending Interrupt on Windows is not implemented.
 func (p *Process) Signal(sig Signal) error {
    return p.signal(sig)
 }
$ 

As discussed in #6720, we don't know of an obvious way to implement p.Signal(os.Interrupt) on Windows, or we would have. But none of us are Windows experts. A few long time users, maybe, but not experts. We do the best we can by reading the Microsoft documentation and a liberal amount of Stack Overflow and trial and error.

One possibility is that nodejs ctx.child.kill('SIGTERM') is sending a Ctrl-Break to the entire process group, but if that were the case I don't understand why that wouldn't hit both the go wrapper and the nodejs server it started.

Another possibility is that nodejs ctx.child.kill('SIGTERM') knows some kind of magic to send a Ctrl-Break to just that one process. If so and you can tell us what that magic is, we can probably implement it in Go.

Another possibility is that SIGTERM doesn't mean Ctrl-Break at all here. But then what does it mean?

I looked in github.com/nodejs/node and there is no mention of what SIGTERM means on Windows. They must be using the Microsoft C runtime library, but what does that do? I looked in the mingw sources and the closest I found was mingw/include/signal.h, which says:

/*
 * The actual signal values. Using other values with signal
 * produces a SIG_ERR return value.
 *
 * NOTE: SIGINT is produced when the user presses Ctrl-C.
 *       SIGILL has not been tested.
 *       SIGFPE doesn't seem to work?
 *       SIGSEGV does not catch writing to a NULL pointer (that shuts down
 *               your app; can you say "segmentation violation core dump"?).
 *       SIGTERM comes from what kind of termination request exactly?
 *       SIGBREAK is indeed produced by pressing Ctrl-Break.
 *       SIGABRT is produced by calling abort.
 * TODO: The above results may be related to not installing an appropriate
 *       structured exception handling frame. Results may be better if I ever
 *       manage to get the SEH stuff down.
 */

"SIGTERM comes from what kind of termination request exactly?". Exactly.

It sounds like you maybe you know what Go should be doing instead, or maybe what SIGTERM means on Windows. If so, can you tell us? Thanks.

pbnjay commented 7 years ago

It looks like Node uses libuv, which treats SIGKILL, SIGTERM, and SIGINT the same on windows: https://github.com/nodejs/node/blob/db1087c9757c31a82c50a1eba368d8cba95b57d0/deps/uv/src/win/process.c#L1166

rsc commented 7 years ago

@pbnjay, thanks for finding that.

To recap, there is a Node parent that ran a Go process that ran a Node child.

On Unix systems, if the Node parent sends SIGTERM ("please stop") to the Go process, then the Go process's signal handler runs and can do something in response to the signal, like send SIGTERM to the Node child, wait for the Node child to exit gracefully, and then exit itself.

On Windows, my reading of what @pbnjay found is that the Node parent calls TerminateProcess on the Go process. That doesn't send a nice "please stop" to the Go process. It just terminates it, like Unix SIGKILL. There is no signal sent, no time to react; the operating system just destroys the process. In this case the Node child is left behind. It would have to be: the Go process had no chance to do anything.

On Linux, there is still a way to cope with SIGKILL. When the Go program starts the subprocess, it can pass a SysProcAttr with Pdeathsig!=0, which makes the forked child call prctl(PR_SET_PDEATHSIG, Pdeathsig) before exec'ing the actual new program. That setting means "if my parent dies, send me this signal", so that even if the Go program dies with kill -9 or some other path that forgets to do cleanup, the child can be notified that the parent is gone and clean up after itself.

It looks like maybe the Windows equivalent of PR_SET_PDEATHSIG is "job objects". It is unclear to me whether this still works in current versions of Windows, but some way to support that would be the obvious next thing to try. I'm going to retitle this bug to be about that. https://msdn.microsoft.com/en-us/library/ms682409(VS.85).aspx http://stackoverflow.com/questions/53208/how-do-i-automatically-destroy-child-processes-in-windows

osher commented 7 years ago

I'm away from keyboard until Sunday, however - I wanted to say I appreciate the seriousity and professionalism the thread gets, despite the aforementioned ...tone.

I wish I had more lower level info - I would have shared it at start.

I can be just a bit more elaborate, I will next week.

osher commented 7 years ago

Here's the picture, I hope it helps a little. go nodist

osher commented 7 years ago

In case it implies on your priorities - I owe you an update.

Specifically for my case a workaround has been found: Instead of passing through the Nodist go shim and resolving the paths the usual way - I use a nodejs api to invoke the spawned server on the same executable as the current process that runs the test-suite (the API is called process.execPath).

This results in removing the 3rd and 2nd last steps, jumping streight to the last.

In this case, the SIGTERM terminates the server and the system works as expected without leaving hung processes.

However, we're still get hanging processes whenever I Ctrl+C to the main process. Mind that my actual use-case is complicated by one more level (generator-test-suite --> generated-project-test-suite --> generated-project-server instead of project-tet-siote-->project server)

FYI.

mattn commented 7 years ago

FYI, AssignProcessToJobObject fail on Windows7. AFAIK, it have to terminate process with walking children processes using CreateToolhelp32Snapshot on Windows7 or older. One another issue, as alex said in #6720, GenerateConsoleCtrlEvent have another problem. The API require "console". So if the process doesn't have a console, it doesn't work. For example, if the process call AllocConsole, it works fine. Below is a implementation using GenerateConsoleCtrlEvent & CREATE_NEW_PROCESS_GROUP . https://github.com/mattn/goemon/blob/master/proc_windows.go

alexbrainman commented 7 years ago

It looks like maybe the Windows equivalent of PR_SET_PDEATHSIG is "job objects".

I do not know about PR_SET_PDEATHSIG, but you can use "job objects" to control process groups on Windows. I even have github.com/alexbrainman/ps package with some APIs. We have used "job objects" to collect benchmark run statistics in golang.org/x/benchmarks/driver. From what I remember "job objects" provide facilities for child processes to start their own group too, so you would need some cooperation from your clients.

It is unclear to me whether this still works in current versions of Windows

I think it works on all Go supported Windows versions.

Alex

bradfitz commented 7 years ago

/cc @johnsonj