TritonDataCenter / containerpilot

A service for autodiscovery and configuration of applications running in containers
Mozilla Public License 2.0

Stop containerpilot with a custom exit code #526

Closed bborysenko closed 6 years ago

bborysenko commented 6 years ago

How does one stop containerpilot with a custom exit code? Something similar to #394, but taking the exit code of the main job into account. Currently the container always stops with exit code 0.

For example, if nginx stops with exit code 10, we would like containerpilot to stop with exactly the same exit code, or at least with a non-zero exit code.

jwreagor commented 6 years ago

There isn't a way to do this today. There also isn't a way to signal what job/process is the "primary" and should share an exit code with the parent process.

It would help to understand the context a bit better. What's the use case for returning a custom exit code? How are you using that information after the container shuts down?

bborysenko commented 6 years ago

For example, we have a process in the container that crashes whenever it encounters an issue, and we would like to use restartPolicy: OnFailure.

misterbisson commented 6 years ago

This is a frustration more than one person has expressed about the change from ContainerPilot 1 and 2, which were really just shims around a single executable with some lifecycle hooks, to ContainerPilot 3, which became a full process manager for distributed applications.

What's clear is that though most users need to manage multiple executables in a single container, there's often still one "main" or "primary" executable in most people's minds.

jwreagor commented 6 years ago

Thinking about our options...

Say we add a flag to a job to assign the job's exit status for shutting down ContainerPilot as well. When the job exits, and all other jobs are complete (not running), only then can CP exit with the status of the job. That's not the only concern.
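To make the first option concrete, here is a minimal sketch of the exit-status rule described above. It is not ContainerPilot code: the `job` type, its `primary` flag, and `supervisorExitCode` are all hypothetical names invented for illustration, and the rule assumes at most one job is flagged.

```go
package main

import "fmt"

// job is a hypothetical, simplified stand-in for a ContainerPilot job.
type job struct {
	name     string
	primary  bool // hypothetical flag: this job's exit status becomes the supervisor's
	running  bool
	exitCode int
}

// supervisorExitCode reports whether the supervisor may exit yet and, if so,
// which status it should use. It may exit only once no job is running; the
// status is taken from the first job flagged primary, or 0 if none is flagged.
func supervisorExitCode(jobs []job) (code int, done bool) {
	for _, j := range jobs {
		if j.running {
			return 0, false
		}
	}
	for _, j := range jobs {
		if j.primary {
			return j.exitCode, true
		}
	}
	return 0, true
}

func main() {
	jobs := []job{
		{name: "nginx", primary: true, exitCode: 10},
		{name: "sidecar", exitCode: 0},
	}
	if code, done := supervisorExitCode(jobs); done {
		// A real supervisor would call os.Exit(code) at this point.
		fmt.Printf("would exit with status %d\n", code)
	}
}
```

With the nginx example from the original report, the supervisor would wait for the sidecar to finish and then exit with status 10.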

The next issue is that ContainerPilot jobs execute within a child worker of a supervising parent process if ContainerPilot is running as PID 1. This is most likely true within a container. I believe that would mean we'd need to propagate the exit status up to the parent process.
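As a rough illustration of what propagating the status up might look like, the sketch below has a parent process run a child and recover its exit code with Go's standard `os/exec` package. It only shows the general mechanism; how ContainerPilot's supervisor actually launches its worker is not covered here, and `exitCodeOf` and `runShell` are hypothetical helpers. The example assumes a POSIX `sh` is available.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
)

// exitCodeOf runs cmd to completion and returns the child's exit code.
// A non-nil error means the child could not be run at all.
func exitCodeOf(cmd *exec.Cmd) (int, error) {
	err := cmd.Run()
	if err == nil {
		return 0, nil
	}
	var exitErr *exec.ExitError
	if errors.As(err, &exitErr) {
		// ExitCode returns -1 if the child was terminated by a signal.
		return exitErr.ExitCode(), nil
	}
	return -1, err
}

// runShell is a hypothetical stand-in for "the worker": it runs a shell
// snippet and reports its exit code.
func runShell(script string) (int, error) {
	return exitCodeOf(exec.Command("sh", "-c", script))
}

func main() {
	code, err := runShell("exit 10")
	if err != nil {
		fmt.Println("could not run worker:", err)
		return
	}
	// A real parent would finish its own cleanup and then call os.Exit(code).
	fmt.Println("worker exited with status", code)
}
```

If the worker itself exits with the flagged job's status, the parent only needs to reuse the code it gets back from waiting on the worker, so no additional channel between the two is strictly required in this simple model.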

Finally, another way to solve this would be to provide a special shutdown endpoint on the control server. Passing the exit status integer as a parameter for endpoint requests could initiate a shut down of CP. Much like the reload endpoint, this would be served at the control socket which is good for security reasons (local) but in the future that could be hooked to a TCP port. This option has the same problem of propagating the exit status back up to the parent process.
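A sketch of what such an endpoint's handler could look like follows. Everything in it is an assumption for illustration: the `/v3/shutdown` path, the `status` query parameter, and the `shutdownHandler` and `parseExitStatus` names do not exist in ContainerPilot. In practice the handler would be served on the unix control socket; here it is exercised in-process with `net/http/httptest` to keep the example self-contained.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strconv"
)

// parseExitStatus validates a requested exit status (0-255).
func parseExitStatus(raw string) (int, error) {
	code, err := strconv.Atoi(raw)
	if err != nil {
		return 0, fmt.Errorf("status must be an integer: %q", raw)
	}
	if code < 0 || code > 255 {
		return 0, fmt.Errorf("status out of range 0-255: %d", code)
	}
	return code, nil
}

// shutdownHandler (hypothetical) records the requested exit status on the
// channel; the supervisor loop would read it and begin shutting down.
func shutdownHandler(requested chan<- int) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			http.Error(w, "POST required", http.StatusMethodNotAllowed)
			return
		}
		code, err := parseExitStatus(r.URL.Query().Get("status"))
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		select {
		case requested <- code:
		default: // a shutdown was already requested; keep the first status
		}
		w.WriteHeader(http.StatusAccepted)
	}
}

func main() {
	requested := make(chan int, 1)
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodPost, "/v3/shutdown?status=10", nil)
	shutdownHandler(requested)(rec, req)
	fmt.Println("response:", rec.Code, "requested status:", <-requested)
}
```

The handler only records the request. Actually exiting with that status still depends on getting it from the worker to the parent process.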

Unless I'm not thinking of something, it sounds like we'd have to solve the parent IPC problem either way.

misterbisson commented 6 years ago

> Unless I'm not thinking of something, it sounds like we'd have to solve the parent IPC problem either way.

Agreed.

To clarify my understanding of another point (and not to indicate a commitment to building something), the "special shutdown endpoint on the control server" would be something the CP child worker running that "main" would interact with after it receives the exit code? You're not expecting that end-users would have to explicitly interact with it in any way, correct?

jwreagor commented 6 years ago

Actually, in that description, I did mean for the user to initiate the shutdown through the socket. I mentioned that mostly as a demonstration that it's not how we capture the exit status but how we communicate the exit status to the parent.

misterbisson commented 6 years ago

> it's not how we capture the exit status but how we communicate the exit status to the parent

Interesting point, and it opens up some interesting ideas in the context of your recent work to support events based on a signal in https://github.com/joyent/containerpilot/pull/529 and https://github.com/joyent/containerpilot/pull/520.

jwreagor commented 6 years ago

Doing a little ticket triage and, while this is a feature I wish I had an easy solution for, I feel like we should punt on this for the time being. For now I'm going to close this and hope that we can revisit it at some point in the future.