cli: `cockroach quit` should wait until the server has effectively stopped

knz commented 8 years ago

Scripts like e.g. the Jepsen tests will start and then restart a server. Now if the following commands are issued quickly in sequence:

cockroach start
cockroach quit
cockroach start

then the 3rd sometimes fails with an error "the server is already started". This is because "quit" merely issues a termination request and terminates (and gives control back to the shell) before the db server has effectively terminated.

Script writers need a way to synchronize on the termination of the db server. The easiest way forward would be to have "quit" wait until the shutdown request has been processed and the connection to the server is lost.

Reported by @dationl.

tamird commented 8 years ago

I believe this was fixed by #7483.

tbg commented 8 years ago

No, not fixed yet. Technically quit only waits until the request is processed, but the process might still take a little bit of time to actually shut down (it would be safer to poll the port and wait for it to close, but that isn't really that idiomatic as well). I wonder how other software does this.

tamird commented 8 years ago

Could make quit a streaming RPC and wait for the stream to close?

tbg commented 8 years ago

But the stream would always close slightly before the process (pid) disappears, no?

On Fri, Jul 1, 2016 at 10:28 PM Tamir Duberstein notifications@github.com wrote:

Could make quit a streaming RPC and wait for the stream to close?

— You are receiving this because you modified the open/close state.

Reply to this email directly, view it on GitHub https://github.com/cockroachdb/cockroach/issues/6585#issuecomment-230079013, or mute the thread https://github.com/notifications/unsubscribe/AE135OYXOJZ6RmJVXA14ADv4qzvgQ10bks5qRczSgaJpZM4IaThe .

-- Tobias

tamird commented 8 years ago

Yeah, but that's the best we're ever going to do, right?

knz commented 8 years ago

No that's not the best. You can also use the syscall kill(2) with SIGCONT or some other innocuous signal until your kernel tells you the process does not exist any more. That's the best way.

tamird commented 8 years ago

Doesn't that assume that the cli process is on the same machine as the server?

knz commented 8 years ago

Well ok if it isn't then what you said is the best. But it local control not the common case?

bdarnell commented 8 years ago

I wonder how other software does this.

Most software doesn't have an equivalent command (and I consider our quit command to generally be a bad idea). Servers are run under a process manager, and you shut them down by asking that process manager to stop them, not by asking the server to stop itself. (or running locally without a process manager, you just kill the process and poll for its PID to disappear)

Without a process manager, it's difficult to manage the full lifecycle precisely. You can either start with cockroach start & and use the shell's wait command at shutdown, (which means that the server may not be ready when cockroach start & returns) or you start with cockroach start --background and stop with cockroach quit, which gives you a precise startup notification but nothing at shutdown.

If Go supported fork, we could fork off a child process and respond to the quit command when the parent process was gone. Without fork, all our options are hacks on top of hacks and it's probably better to just encourage the use of a proper process manager.

knz commented 8 years ago

As I explain in https://github.com/cockroachdb/cockroach/pull/7603#issuecomment-230106282 you can also count on the server's OS to signal process termination by shutting down all open sockets. That is detectable over the network.

cockroachdb / cockroach

cli: `cockroach quit` should wait until the server has effectively stopped #6585