Closed eliasbrange closed 6 years ago
Initially I saw how this aligns with some of the language in the Consul docs on Initial Health Check Status.
By default, when checks are registered against a Consul agent, the state is set immediately to "critical". This is useful to prevent services from being registered as "passing" and entering the service pool before they are confirmed to be healthy.
I'm not 100% positive if that is compatible with consumers of ContainerPilot today. I'll have to double check but there might be Autopilot Pattern applications that draw a distinction between a service that is unregistered (and now would be marked as initially critical) and a service that is registered, healthy, then made critical. Things like fail over could be signaled from a service being marked as critical. Under this change any service that has not been brought up healthy at all would signal an improper fail over.
This could be left up to the consumer of ContainerPilot so it's possible we could stick this behind an optional setting.
Thoughts?
Yeah this might have side effects for applications that differ between unregistered and critical services as you said. One way could be to either use another status such as warning
. Or an optional setting that could either be made "global" or job-specific. Job configuration could for example have
{
name: 'job',
health: { ... },
register_critical: True // Defaults to False
}
This way it should not break any existing applications.
I think you're on the right track to create a flag for this behavior, but I have to admit I'm concerned about register_critical
. I'm concerned about how users interpret the critical
bit.
Whatever the name, the actual intent is that the service be registered as soon as the container starts, before the service is tested as healthy. I don't have any suggestions now, but I'm trying to think of other ways to describe that.
register_unhealthy
or register_failed
a better name?
That's true. The registration could happen elsewhere. It must happen after the consul agent is up and running though, so it probably has to happen in some of the processEvent
methods.
Both Consul and Nomad call a similar (if not identical) feature initial_status
. I don't think that naming would hurt here but interested in how that sounds.
On where to configure, I would agree that since it's tied to a job's registration it should be on the job itself. Any job can be a service and have an initial status value, or not. I think that plays nicely into this.
I think the initial_status
suggestion makes sense. I guess the docs would read something like this?
initial_status
(optional): if provided, ContainerPilot will immediately register the application before the first successful health check with this status.
I'm using both "immediately" and "before the first..." there for emphasis about the difference against the current behavior (and the behavior if the option is unset).
I like the initial_status
idea. However, I do not know the best place to call register
. If consul isn't started yet the registration will fail, and a retry will need to be done somewhere.
Also, initialStatus
instead of initial_status
to keep it in line with other config names?
Correct, to keep parity across systems and configuration.
My thoughts for the usage of initial_status
would be to allow the manual configuration of the initial status value itself along with making this optional in ContainerPilot.
null
. If the value is null
or not configured at all than we would not register (critical
) and continue the functionality that exists today.All others options would pass directly through the CP configuration into the Consul registration under health check. These values could include any of the values that Consul accepts including critical
, warning
, or even passing
(which would be api.HealthPassing
that we perform today).
The configuration could live under the job's health check. Here's an example...
{
consul: "localhost:8500",
jobs: [
{
name: "nginx",
port: 80,
interfaces: ["eth0"],
exec: "nginx",
health: {
exec: "health-check http",
interval: 10,
ttl: 25,
initial_status: "critical",
}
],
}
Am I conflating things or does that align with your use case? @eliasbrange
@misterbisson I think your documentation suggestion stands and we would just note the values that we can set through the Consul API.
initial_status
(optional): if provided, ContainerPilot will immediately register the application before the first successful health check with this status. Acceptable statuses includecritical
,warning
, orpassing
.
I've just pushed some updates that you could look at. I added the config under jobs
but it could just as well be moved to jobs.health
. That might actually make more sense.
Haven't had time yet to review latest changes but I'll try to fit it in next week.
Let me know if you'd rather have the configuration under jobs.health
.
I believe we were all set on that PR so I went ahead and merged. I'll wait till next week to get the minor release out.
Howdy, we currently have about 200 services in consul, and filtering them on only failed ones is a quick way for us to see that everything is up and running. However, containerpilot only registers a service after it has become healthy for the first time due to
job.sendHeartbeat()
being called inonHealthCheckPassed
.If for some reason the service never gets healthy, the failed filter in consul does not give us that information. Currently, we have to manually search for the service that usually misbehave on startup.
It would be great to register a service as critical in consul in
onHealthCheckPassed
, but only if the services hasn't been seen before to not mess with the TTL.