Ability to pass SIGUSR1 & SIGHUP to main process

javabean commented 8 years ago

Unless I am confused, there is currently no way to pass SIGUSR1 & SIGHUP to the main process (SIGUSR1 being used to put the service node in maintenance mode, and SIGHUP to reload ContainerPilot's configuration).

Could we please have an optional configuration to disable this and pass all signals to the main application? (In that case one would still be able to enable maintenance mode with a docker exec xxx kill -SIGUSR1 1.)

tgross commented 8 years ago

Unless I am confused, there is currently no way to pass SIGUSR1 & SIGHUP to the main process

Right, not from outside the container. You'd need to use docker exec pkill to send those signals or include them in an onChange handler. The convention for SIGHUP is to do a config reload which you'd normally want to do in an onChange or task handler, but I know for example HAProxy uses it to dump status (which you'd hit w/ a sensor handler instead). Is there a specific use case you have in mind here that I'm not considering?
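
As a rough illustration of the workaround described above, a helper run inside the container (via docker exec or from a hook) can deliver the signal directly to the application process rather than to PID 1. This is only a sketch and not ContainerPilot code: `signal_app`, `demo`, and the stand-in application in `APP_SOURCE` are hypothetical names invented for the example.

```python
import os
import signal
import subprocess
import sys

# Stand-in for the main application (hypothetical): it installs a SIGUSR1
# handler, announces readiness, then waits.
APP_SOURCE = """
import signal, sys, time
signal.signal(signal.SIGUSR1, lambda signum, frame: sys.exit(42))
print("ready", flush=True)
time.sleep(30)
"""


def signal_app(pid: int, sig: signal.Signals) -> bool:
    """Send `sig` to `pid`; return False if the process no longer exists."""
    try:
        os.kill(pid, sig)
        return True
    except ProcessLookupError:
        return False


def demo() -> int:
    """Start the stand-in app, signal it as a hook would, return its exit code."""
    with subprocess.Popen(
        [sys.executable, "-c", APP_SOURCE], stdout=subprocess.PIPE, text=True
    ) as app:
        app.stdout.readline()  # block until the child has installed its handler
        signal_app(app.pid, signal.SIGUSR1)
        return app.wait(timeout=10)
```

The stand-in application exits with status 42 from its handler only so that the caller can observe that the signal reached it; a real service would do its own work there.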

javabean commented 8 years ago

I think we are considering ContainerPilot (CP) from different points of view. My vision is that CP has infrastructure qualities, and is not part of my application. It should therefore stay transparent. When sending a UNIX signal via Docker, I mean to interact with my application, not CP, hence the need to forward all signals from CP to the application. (We can't presume that no application on earth will need SIGUSR1 and SIGHUP for useful purposes!)

That said, putting the service node in maintenance mode can be badly needed in some situations (e.g. draining long-running requests before shutting down the container).

I can't think of any solution other than putting into the CP configuration which signal (if any) should mean "maintenance mode" and which should mean "configuration reload" (with the current values as defaults; empty signal == disable the functionality).
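
To make the proposal concrete, here is a minimal sketch of how such a setting could be interpreted. It assumes a hypothetical configuration with `maintenance` and `reload` keys holding signal names; neither the keys nor `build_signal_table` exist in ContainerPilot.

```python
import signal
from typing import Dict

# Hypothetical defaults mirroring the behavior described in this thread.
DEFAULTS = {"maintenance": "SIGUSR1", "reload": "SIGHUP"}


def build_signal_table(config: Dict[str, str]) -> Dict[signal.Signals, str]:
    """Map each configured signal to the supervisor action it triggers.

    An empty string disables the action, leaving that signal free for the
    application.
    """
    table: Dict[signal.Signals, str] = {}
    for action, default_name in DEFAULTS.items():
        name = config.get(action, default_name)
        if not name:
            continue  # empty signal == functionality disabled
        sig = signal.Signals[name]  # raises KeyError for an unknown name
        if sig in table:
            raise ValueError(f"{name} is assigned to two actions")
        table[sig] = action
    return table
```

For example, `build_signal_table({"maintenance": ""})` would leave only SIGHUP claimed by the supervisor, so SIGUSR1 could be passed on to the application.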

I do not have a specific use case in mind, but I've seen such diverse behaviors in my professional life that I'm really looking to use flexible solutions. I know I'll have to send one of those signals one day, and I want to be ready when that's the case. CP fits my current needs, this is why I'm pushing for it! :-)

tgross commented 8 years ago

The thing is that if you're sending that signal to do a reconfiguration then you already have either a) entered the container to replace a configuration file, or b) have just triggered one of the event hooks. Otherwise you're HUP'ing the application w/o any changes for it to pick up. So why not take advantage of those options?

I'm trying to avoid configuration sprawl here. "One more config flag" is a genuine usability problem and ContainerPilot already has a lot of configuration options so each new one we add has that much larger of a hurdle to overcome.

javabean commented 8 years ago

Very true, let's try to keep CP configuration as simple as possible and options to a minimum.

If we consider that reloading CP configuration (SIGHUP) is an idempotent operation, I would be more than happy if this signal were then simply transmitted on to the application (no additional CP config flag). This way, if the application decides to do something other than reload, it will be fine (in which case we can simply restart the container for CP configuration changes).

I have stronger feelings about SIGUSR1, which really has no standard behavior. Hijacking it for service maintenance mode is a noble cause, but it prevents easily sending it to the application. This makes me nervous.

tgross commented 8 years ago

I would be more than happy if this signal is then simply transmitted back to the application

If the application has no SIGHUP handler then it'll crash. So we really can't always re-transmit that either.
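
The point about missing handlers can be checked with a small experiment. The sketch below assumes "crash" here means termination by the default SIGHUP disposition; `send_hup` and the two child programs are hypothetical and exist only for this demonstration.

```python
import signal
import subprocess
import sys

# Child with the default SIGHUP disposition, set explicitly so the demo does
# not depend on whatever the parent process happened to ignore.
NO_HANDLER = """
import signal, time
signal.signal(signal.SIGHUP, signal.SIG_DFL)
print("ready", flush=True)
time.sleep(30)
"""

# Child that installs its own SIGHUP handler; it simply exits cleanly so the
# demo finishes, where a real service might reload its configuration.
WITH_HANDLER = """
import signal, sys, time
signal.signal(signal.SIGHUP, lambda signum, frame: sys.exit(0))
print("ready", flush=True)
time.sleep(30)
"""


def send_hup(source: str) -> int:
    """Run `source` as a child process, send it SIGHUP, return its return code."""
    with subprocess.Popen(
        [sys.executable, "-c", source], stdout=subprocess.PIPE, text=True
    ) as child:
        child.stdout.readline()  # wait for "ready" so the disposition is set
        child.send_signal(signal.SIGHUP)
        return child.wait(timeout=10)
```

A negative return code from `subprocess` means the child was killed by that signal number, which is what the child without a handler reports; the child with a handler decides its own exit status.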

I have stronger feelings about SIGUSR1, which really has no standard behavior. Hijacking it for service maintenance mode is a noble cause, but it prevents easily sending it to the application.

Ok but think about the combinations here:

ContainerPilot accepts SIGUSR1 and stops its propagation (current behavior)
ContainerPilot ignores SIGUSR1 and stops its propagation
ContainerPilot accepts SIGUSR1 and re-transmits to the application
ContainerPilot ignores SIGUSR1 and re-transmits to the application

What do we then expect the behavior is for all other subprocesses? i.e. the 7 user-defined hooks. Should they also receive the SIGUSR1? In all 4 of the configuration combinations described above?
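
One way to see how quickly this grows is to model the four combinations explicitly. The sketch below covers only the main application and leaves the question about the hook subprocesses open; `Usr1Policy` and `actions` are hypothetical names, not ContainerPilot settings.

```python
from dataclasses import dataclass
from itertools import product
from typing import List


@dataclass(frozen=True)
class Usr1Policy:
    handle: bool   # supervisor acts on SIGUSR1 itself (maintenance mode)
    forward: bool  # supervisor re-sends SIGUSR1 to the main application


def actions(policy: Usr1Policy) -> List[str]:
    """List what happens when SIGUSR1 reaches PID 1 under `policy`."""
    result = []
    if policy.handle:
        result.append("toggle maintenance mode")
    if policy.forward:
        result.append("signal main application")
    return result


# The four combinations, in the same order as the list above.
COMBINATIONS = [
    Usr1Policy(handle=handle, forward=forward)
    for forward, handle in product([False, True], [True, False])
]
```

Adding a third flag for "also signal the hooks" would double the table to eight rows, which is the configuration sprawl being argued against.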

I've suggested workarounds using the user-defined hooks ("So why not take advantage of those options?", above). Was there a use case in transmitting these signals that prevents us from using the existing hooks as I've asked here?

javabean commented 8 years ago

This is getting hairy… :-)

The main "problem" here is that we have a single channel of communication (UNIX signals) for 2 different purposes:

sending signals to the main application
controlling the "infrastructure" (CP)

If we have to extend CP's command set, I would suggest we use the already-existing HTTP server (used for telemetry) to send commands to CP.

In the meantime, as you previously explained, and unless we change configuration options, it should be documented that all signals are passed through except SIGHUP and SIGUSR1. Let's put this into task #197!

misterbisson commented 8 years ago

@javabean can you say more about why you think of ContainerPilot as an infrastructure component? We're developing it and I certainly think of it as part of the application. That question is probably part of the confusion here, but I'd love to learn more about it.

javabean commented 8 years ago

@misterbisson very simple: I see CP having the same kind of functions as Docker: orchestrating, managing and handling my applications, as opposed to serving user requests. The services offered by CP could be integrated into Docker without seeming out of place.

IMHO the boundary between infrastructure and applications is in the code we write to bind them together (e.g. healthchecks, onChange handlers), which could be considered part of the application, part of the infrastructure, or both.

tgross commented 8 years ago

@misterbisson @javabean I honestly think trying to ascertain philosophical purity for an application like ContainerPilot which is really designed to fill the cracks in application/infra behaviors is not terribly constructive.

The main "problem" here is that we have a single channel of communication (UNIX signals) for 2 different purposes

Let's start with the understanding that without reaching into the container process space (via docker exec), application containers can only handle signals at PID1. The intended level of abstraction for application containers is that the container can be treated as if it were a single application, even if it is not in practice. Mega-orchestrators like Marathon/Mesos add difficulty to this by having PID1 be /bin/sh, so this problem isn't unique to ContainerPilot.

So the only things under consideration really are (in order):

Can the desired behavior be handled by existing lifecycle hooks? You still haven't provided a specific use case @javabean so until we have one it's going to be hard to answer this.
If not, should we pass-thru signals to subprocesses even though this behavior is unusual/unexpected?
If so, what should the configuration look like?

If we have to extend CP's command set, I would suggest we use the already-existing HTTP server (used for telemetry) to send commands to CP.

Handling HTTP POST expands the ContainerPilot surface area considerably. How would we authenticate and authorize these requests? I would very much like to avoid this kind of expansion of scope.

it should be documented that all signals are passed through except SIGHUP and SIGUSR1.

That's not accurate. Unhandled signals are not passed-thru but instead hit ContainerPilot, which generally means killing the container as it does with any other process that receives an unhandled signal.

tgross commented 7 years ago

https://github.com/joyent/containerpilot/issues/244 has been opened as a proposal to cover this issue as a 3.x enhancement.

tgross commented 7 years ago

I'm going to close this issue as #244 really supersedes this concept entirely. Refer to RFD86 and the v3 roadmap in #283 for details.

TritonDataCenter / containerpilot

Ability to pass SIGUSR1 & SIGHUP to main process #195