Open osher opened 7 years ago
I excitedly read all the thread, ending with the greatest facepalm I had in the past years.
Look, this is not a solution!
This is not a good way to start a bug report. You've now established yourself as an adversary rather than a collaborator.
Something to keep in mind for future bug reports. Tone matters.
I'm sorry, but we need more information to understand what you are asking for on the Go side. I did read nodist/issues#179, but there is not a line of Go code in sight.
You also mentioned reopening #6720, which as I read it is about p.Signal(os.Interrupt), where p is an os.Process, not being implemented and therefore always returning an error on Windows. That issue was closed by 05cc78d8, which did:
$ git show 05cc78d
commit 05cc78d8d32f6af6fc4373e10da0b4a12f0a1ad4
Author: Alex Brainman <alex.brainman@gmail.com>
AuthorDate: Fri May 23 12:29:29 2014 +1000
Commit: Alex Brainman <alex.brainman@gmail.com>
CommitDate: Fri May 23 12:29:29 2014 +1000
os: document that Interrupt might not work on every os
Fixes #6720.
LGTM=bradfitz
R=golang-codereviews, iant, bradfitz
CC=golang-codereviews
https://golang.org/cl/92340043
diff --git a/src/pkg/os/doc.go b/src/pkg/os/doc.go
index bc700b6..389a8eb 100644
--- a/src/pkg/os/doc.go
+++ b/src/pkg/os/doc.go
@@ -46,6 +46,7 @@ func (p *Process) Wait() (*ProcessState, error) {
}
// Signal sends a signal to the Process.
+// Sending Interrupt on Windows is not implemented.
func (p *Process) Signal(sig Signal) error {
return p.signal(sig)
}
$
As discussed in #6720, we don't know of an obvious way to implement p.Signal(os.Interrupt) on Windows, or we would have. But none of us are Windows experts. A few long time users, maybe, but not experts. We do the best we can by reading the Microsoft documentation and a liberal amount of Stack Overflow and trial and error.
One possibility is that nodejs ctx.child.kill('SIGTERM') is sending a Ctrl-Break to the entire process group, but if that were the case I don't understand why that wouldn't hit both the go wrapper and the nodejs server it started.
Another possibility is that nodejs ctx.child.kill('SIGTERM') knows some kind of magic to send a Ctrl-Break to just that one process. If so and you can tell us what that magic is, we can probably implement it in Go.
Another possibility is that SIGTERM doesn't mean Ctrl-Break at all here. But then what does it mean?
I looked in github.com/nodejs/node and there is no mention of what SIGTERM means on Windows. They must be using the Microsoft C runtime library, but what does that do? I looked in the mingw sources and the closest I found was mingw/include/signal.h, which says:
/*
* The actual signal values. Using other values with signal
* produces a SIG_ERR return value.
*
* NOTE: SIGINT is produced when the user presses Ctrl-C.
* SIGILL has not been tested.
* SIGFPE doesn't seem to work?
* SIGSEGV does not catch writing to a NULL pointer (that shuts down
* your app; can you say "segmentation violation core dump"?).
* SIGTERM comes from what kind of termination request exactly?
* SIGBREAK is indeed produced by pressing Ctrl-Break.
* SIGABRT is produced by calling abort.
* TODO: The above results may be related to not installing an appropriate
* structured exception handling frame. Results may be better if I ever
* manage to get the SEH stuff down.
*/
"SIGTERM comes from what kind of termination request exactly?". Exactly.
It sounds like you maybe you know what Go should be doing instead, or maybe what SIGTERM means on Windows. If so, can you tell us? Thanks.
It looks like Node uses libuv, which treats SIGKILL, SIGTERM, and SIGINT the same on windows: https://github.com/nodejs/node/blob/db1087c9757c31a82c50a1eba368d8cba95b57d0/deps/uv/src/win/process.c#L1166
@pbnjay, thanks for finding that.
To recap, there is a Node parent that ran a Go process that ran a Node child.
On Unix systems, if the Node parent sends SIGTERM ("please stop") to the Go process, then the Go process's signal handler runs and can do something in response to the signal, like send SIGTERM to the Node child, wait for the Node child to exit gracefully, and then exit itself.
On Windows, my reading of what @pbnjay found is that the Node parent calls TerminateProcess on the Go process. That doesn't send a nice "please stop" to the Go process. It just terminates it, like Unix SIGKILL. There is no signal sent, no time to react; the operating system just destroys the process. In this case the Node child is left behind. It would have to be: the Go process had no chance to do anything.
On Linux, there is still a way to cope with SIGKILL. When the Go program starts the subprocess, it can pass a SysProcAttr with Pdeathsig!=0, which makes the forked child call prctl(PR_SET_PDEATHSIG, Pdeathsig) before exec'ing the actual new program. That setting means "if my parent dies, send me this signal", so that even if the Go program dies with kill -9 or some other path that forgets to do cleanup, the child can be notified that the parent is gone and clean up after itself.
It looks like maybe the Windows equivalent of PR_SET_PDEATHSIG is "job objects". It is unclear to me whether this still works in current versions of Windows, but some way to support that would be the obvious next thing to try. I'm going to retitle this bug to be about that. https://msdn.microsoft.com/en-us/library/ms682409(VS.85).aspx http://stackoverflow.com/questions/53208/how-do-i-automatically-destroy-child-processes-in-windows
I'm away from keyboard until Sunday, however - I wanted to say I appreciate the seriousity and professionalism the thread gets, despite the aforementioned ...tone.
I wish I had more lower level info - I would have shared it at start.
I can be just a bit more elaborate, I will next week.
Here's the picture, I hope it helps a little.
In case it implies on your priorities - I owe you an update.
Specifically for my case a workaround has been found:
Instead of passing through the Nodist go shim and resolving the paths the usual way - I use a nodejs api to invoke the spawned server on the same executable as the current process that runs the test-suite (the API is called process.execPath
).
This results in removing the 3rd and 2nd last steps, jumping streight to the last.
In this case, the SIGTERM terminates the server and the system works as expected without leaving hung processes.
However, we're still get hanging processes whenever I Ctrl+C
to the main process.
Mind that my actual use-case is complicated by one more level
(generator-test-suite --> generated-project-test-suite --> generated-project-server
instead of
project-tet-siote-->project server)
FYI.
FYI, AssignProcessToJobObject fail on Windows7. AFAIK, it have to terminate process with walking children processes using CreateToolhelp32Snapshot on Windows7 or older. One another issue, as alex said in #6720, GenerateConsoleCtrlEvent have another problem. The API require "console". So if the process doesn't have a console, it doesn't work. For example, if the process call AllocConsole, it works fine. Below is a implementation using GenerateConsoleCtrlEvent & CREATE_NEW_PROCESS_GROUP . https://github.com/mattn/goemon/blob/master/proc_windows.go
It looks like maybe the Windows equivalent of PR_SET_PDEATHSIG is "job objects".
I do not know about PR_SET_PDEATHSIG, but you can use "job objects" to control process groups on Windows. I even have github.com/alexbrainman/ps package with some APIs. We have used "job objects" to collect benchmark run statistics in golang.org/x/benchmarks/driver. From what I remember "job objects" provide facilities for child processes to start their own group too, so you would need some cooperation from your clients.
It is unclear to me whether this still works in current versions of Windows
I think it works on all Go supported Windows versions.
Alex
/cc @johnsonj
I would like to reopen #6720. I excitedly read all the thread, ending with the greatest facepalm I had in the past years.
Look, this is not a solution! If
nodejs
knows how to pass these signals between processes on windows - then no reason thatgolang
should not. No magic involved.Full story bellow. Mind that if the
go
shim is out of the loop - it works as expected, so I deduct it has to be something thatgolang
does wrong.I would appreciate your input on this...
What version of Go are you using (
go version
)?donno. the one that
nodist@0.8.7
usesWhat operating system and processor architecture are you using (
go env
)?Windows 10, x64
What did you do?
I'm using
nodist
which allows me to run different versions of nodejs side by side.nodist
uses a shim layer ofgo
to have a look around on the env vars and local folder to detect the desired node version and call the relevant executable accordinglySee here: https://github.com/marcelklehr/nodist/issues/179
What did you expect to see?
https://github.com/marcelklehr/nodist/issues/179
What did you see instead?
https://github.com/marcelklehr/nodist/issues/179