TritonDataCenter / containerpilot

A service for autodiscovery and configuration of applications running in containers
Mozilla Public License 2.0

Registering containers as separate nodes in consul #162

Closed: jgillis01 closed this issue 8 years ago

jgillis01 commented 8 years ago

Is there a way for containerpilot to register a container as its own node in consul? Currently, containerpilot registers a service against a particular node in consul (given a clustered setting). If that consul node becomes unavailable, we lose service health visibility until that consul node is restored.

nmarshall-cst commented 8 years ago

+1 For me this occurs when Triton containers are removed or fail. I will also see multiple containers on the same node within the consul UI.

ghost commented 8 years ago

A few days ago, I built a container that includes both ContainerPilot and Consul.

If you want to register the container as its own Consul node, you could try that approach.

In the Dockerfile, the entrypoint should be a shell script that starts a Consul agent alongside the command ContainerPilot runs.

Consul runs in the foreground, which would block the application you actually want to run, so you can background Consul with nohup.
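
For illustration, an entrypoint along those lines might look like the following sketch; the paths, the CONSUL_HOST variable, and the containerpilot invocation are placeholders rather than anything from this thread.

#!/bin/sh
# Placeholder entrypoint: background a local Consul agent with nohup,
# then exec ContainerPilot, which registers the service against the
# agent on localhost.
nohup consul agent -data-dir=/var/lib/consul -retry-join "${CONSUL_HOST}" \
    > /var/log/consul.log 2>&1 &

exec containerpilot -config file:///etc/containerpilot.json "$@"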

tgross commented 8 years ago

@jgillis01 someone asked me about this offline recently, and I'll admit I'm a little unclear as to what we're doing wrong here. But obviously the behavior you're describing isn't ideal. We're currently using Agent.ServiceRegister with the Consul API client, which corresponds to /v1/agent/service/register.

Should we be using /v1/catalog/register (or in other words Catalog.Register)? The Consul documentation suggests otherwise but maybe I'm misreading it.

Note: it is usually preferable instead to use the agent endpoints for registration as they are simpler and perform anti-entropy.

Do you have a minimal test case I could use to exercise the bad behavior so we can verify this?

jgillis01 commented 8 years ago

@tgross,

The documentation seems to suggest that the agent endpoint is used to interact with a consul agent running locally on the node:

The Agent endpoints are used to interact with the local Consul agent.

I agree the catalog endpoint should not be used to register services due to the health check restrictions:

If the Check key is provided, a health check will also be registered. Note: this register API manipulates the health check entry in the Catalog, but it does not setup the script, TTL, or HTTP check to monitor the node's health. To truly enable a new health check, the check must either be provided in agent configuration or set via the agent endpoint.

The following steps can be used to reproduce the behavior I am experiencing:

  1. Create a consul cluster with 4 nodes (-bootstrap-expect 3).
  2. Run a service container with a containerpilot configuration pointing to the first consul node.
  3. Stop the first consul node.

Navigating to the consul UI on one of the other consul nodes, I notice that the first consul node is failing the serf health check, while the service health check still reports the last known health of the service. At this point, changes to the service container's state are not seen by consul, since containerpilot is trying to report health check results to a consul node that is no longer reachable.
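
For step 2, a minimal containerpilot.json of the shape in question might look like this. This is only a sketch: the service name, port, health check command, and the consul address are hypothetical, and it assumes the JSON config schema ContainerPilot used at the time.

{
  "consul": "consul-node-1:8500",
  "services": [
    {
      "name": "app",
      "port": 8080,
      "health": "curl --fail -s http://localhost:8080/health",
      "poll": 10,
      "ttl": 30
    }
  ]
}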

tgross commented 8 years ago

@jgillis01 I want to dive into this but I'll be on the road the next couple days so it might take until later this week to get back to you. In the meantime, are you deploying on Triton or on another platform?

jgillis01 commented 8 years ago

@tgross I am deploying on Triton

zadunn commented 8 years ago

@tgross - I think I harassed you at ContainerSummit Vegas about this. tl;dr it's a SPOF that we would like to stomp out. We are in the Joyent PC as well as running our own SDC install on our gear. We have a team of people ready to work the issue; I just want to make sure we are working together in the right direction.

misterbisson commented 8 years ago

@zadunn @jgillis01 how are you discovering Consul? Is it via a Docker link or CNS name? If ContainerPilot is connecting to the Consul cluster via CNS service name (I'm playing with apps connected to consul.svc.d42e7882-89d2-459e-bc0a-e9af0bca409c.us-sw-1.cns.joyent.com now), it will connect to any available Consul instance.

Doesn't that eliminate the SPOF issues?

yarmiganosca commented 8 years ago

@misterbisson right now we're injecting the IP of the consul node as an env var into all the containerpilot containers. From a code perspective, we could easily switch that to injecting a CNS name. I don't really understand networking protocols in any detail, so @zadunn or @jgillis01 will need to answer the latter question for you. Would containerpilot connecting to any instance mean that the service information containerpilot registered on one node gets "consensus'd" to the others? Or does it mean that if the node the service is registered on fails, containerpilot won't have a hard time finding another? And if the latter, is containerpilot set up to know what to do in that situation?

misterbisson commented 8 years ago

@yarmiganosca ah, yes. Linking by IP address or by Docker links can be problematic. Take a look at this alternative that uses Triton Container Name Service (CNS):

We give Consul a label that names the CNS service, triton.cns.services=consul. Here it is in the Compose file:

https://github.com/autopilotpattern/wordpress/blob/master/docker-compose.yml#L80
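
Roughly, the label pattern looks like this in a Compose file (the image name is a placeholder and the surrounding structure is abbreviated):

consul:
  image: example/consul
  labels:
    - triton.cns.services=consul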

We can then set an environment var that identifies the Consul service by DNS. This is the pattern for our public cloud: CONSUL=consul.svc.${TRITON_ACCOUNT}.${TRITON_DC}.cns.joyent.com (that name points at the private interface). Here's where we generate it:

https://github.com/autopilotpattern/wordpress/blob/master/setup.sh#L176

Learn more about Triton CNS:

https://www.joyent.com/blog/introducing-triton-container-name-service

yarmiganosca commented 8 years ago

@misterbisson we're not using Docker Networking links. We're just injecting the env var into the container at runtime and interpolating it into the containerpilot config file. As to my questions about the HA properties of your solution, do you have any further information?

zadunn commented 8 years ago

@misterbisson So our issue isn't so much whether we can speak to consul at boot. It's the fact that the services that are created by container pilot are pinned to a single consul node. So if that CONSUL node goes away, our service "fails", which in the autopilot pattern is going to kick off a bunch of changes.

This is a lot different than the standard consul-template method of registering as a node in the cluster, then registering services onto yourself. With that pattern we are part of the cluster, and a consul server node failing doesn't cause a freak out (insert appropriate .wav here).

So we have two concerns:

mterron commented 8 years ago

EDIT: I think the obvious behaviour is to register a node per container and one service per port the container exposes. That way things will behave as expected: if the container's node goes down, all of its associated services go down, and if a Consul server node goes down, nothing should happen, because the node information is replicated.

On a different note, you could also use Consul itself to resolve the Consul address. This is what I've been doing, by setting CONSUL=https://consul.service.consul:8501 (when using a remote consul, I'd recommend using TLS and a token for authorization). This works once the consul cluster has quorum.

mterron commented 8 years ago

Reading the Consul documentation carefully, it's clear that Consul expects to have a local agent in each server/container/VM. I'm not sure there is a way to express the behaviour we want without running a consul agent in each container (which I did in my vault-containerpilot docker image). See: https://www.consul.io/docs/agent/basics.html

CONSUL AGENT

The Consul agent is the core process of Consul. The agent maintains membership information, registers services, runs checks, responds to queries, and more. The agent must run on every node that is part of a Consul cluster.

Any agent may run in one of two modes: client or server. A server node takes on the additional responsibility of being part of the consensus quorum. These nodes take part in Raft and provide strong consistency and availability in the case of failure. The higher burden on the server nodes means that usually they should be run on dedicated instances -- they are more resource intensive than a client node. Client nodes make up the majority of the cluster, and they are very lightweight as they interface with the server nodes for most operations and maintain very little state of their own.

mterron commented 8 years ago

What about registering all services in all Consul nodes? It is very hacky but will provide the desired resiliency.

tgross commented 8 years ago

I've had a chance to dig into this now and @zadunn has described the problem well at this point:

It's the fact that the services that are created by container pilot are pinned to a single consul node.

And I think @mterron has hit the root of the problem on the head:

Consul expects to have local agents in each server/container/vm. I'm not sure there is a way to express the behaviour we want without running a consul agent in each container

Although the data about the service is replicated via raft to all nodes, Consul includes data about which agent you've registered with. So we need to come up with a workaround for that. I suspect, but haven't proved yet, that we can do this by using the catalog API directly. I'll be attempting this today, and may also reach out to some folks at Hashicorp to see what they would suggest, given that this is a non-standard topology for them.

@mterron wrote:

What about registering all services in all Consul nodes?

In the environment where this is a problem, we typically are going to be using CNS/DNS to reach the Consul cluster, so we probably don't have a good way to enumerate all the nodes.

zadunn commented 8 years ago

@jgillis01 is trying an experiment using containerpilot to start a consul agent and then have containerpilot register to it locally. The dirty part is that we are starting the consul agent and whatever else we want with supervisord. It works, but again it feels dirty as hell.

We would also have the issue of managing all the parts needed for the consul agent. Anyway, I think the issue is clear now!

tgross commented 8 years ago

Yeah, another similar approach could be to have ContainerPilot act as a Consul client node instead of as an HTTP client (this distinction is a little weird, but check out the docs for details). We'd get the same effect, but we'd have special-cased Consul behaviors to an extent I'm not comfortable with for portability's sake. I'm currently digging into the catalog API and seeing whether we can live without anti-entropy in this use case.

Edit: we wouldn't lose anti-entropy entirely, as the agents would still run it periodically depending on cluster size.

jgillis01 commented 8 years ago

@tgross @zadunn I can confirm that having containerpilot start a supervisord process which manages a consul agent and service shows the desired behavior. From there you just point containerpilot to the agent running on the local node. Telemetry and health checks still work the same. If the service is terminated (not SIGTERM), supervisord will restart it.
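
For reference, a rough sketch of the supervisord side of that setup; the program names, paths, and join address are placeholders, and ContainerPilot's "main" application is then simply supervisord run in the foreground.

[supervisord]
nodaemon=true

[program:consul-agent]
command=/usr/local/bin/consul agent -data-dir=/var/lib/consul -retry-join=consul.example
autorestart=true

[program:myservice]
command=/usr/local/bin/myservice
autorestart=true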

tgross commented 8 years ago

Ok, good to hear that there's a workaround that works for you. Lots of folks don't want to run multi-process containers w/ a supervisor, though, so I'm still going to see if we can come up with a workaround inside ContainerPilot.

tgross commented 8 years ago

I'm getting somewhere on this. The /v1/catalog/register API endpoint takes arbitrary values for Node; they don't have to be real Consul agents. Which means we could potentially use this endpoint and allow a user who's deploying on Triton (or similar topologies) to pass a config flag to override the Node value. So for example I was able to register a service using this body and curl -XPUT --data @service.json http://$(triton ip my_consul_1):8500/v1/catalog/register

{
  "Datacenter": "dc1",
  "Node": "triton",
  "Address": "192.168.10.10",
  "Service": {
    "ID": "redis1",
    "Service": "redis",
    "Tags": [
      "master",
      "v1"
    ],
    "Address": "127.0.0.1",
    "TaggedAddresses": {
      "wan": "127.0.0.1"
    },
    "Port": 8000
  },
  "Check": {
    "Node": "triton",
    "CheckID": "service:redis1",
    "Name": "Redis health check",
    "Notes": "Script based health check",
    "Status": "passing",
    "ServiceID": "redis1"
  }
}
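
For comparison, roughly the same registration can be done with the Go API client's Catalog.Register, which is what a ContainerPilot-side implementation would presumably call. This is only a sketch, reusing the illustrative node, address, and check values from the JSON above (minus the TaggedAddresses).

package main

import (
	"log"

	consul "github.com/hashicorp/consul/api"
)

func main() {
	// Point the client at any reachable Consul server.
	config := consul.DefaultConfig()
	config.Address = "192.168.10.10:8500"
	client, err := consul.NewClient(config)
	if err != nil {
		log.Fatal(err)
	}

	// Register against an arbitrary (not necessarily real) Node name,
	// mirroring the curl + service.json example above.
	reg := &consul.CatalogRegistration{
		Datacenter: "dc1",
		Node:       "triton",
		Address:    "192.168.10.10",
		Service: &consul.AgentService{
			ID:      "redis1",
			Service: "redis",
			Tags:    []string{"master", "v1"},
			Address: "127.0.0.1",
			Port:    8000,
		},
		Check: &consul.AgentCheck{
			Node:      "triton",
			CheckID:   "service:redis1",
			Name:      "Redis health check",
			Notes:     "Script based health check",
			Status:    "passing",
			ServiceID: "redis1",
		},
	}
	if _, err := client.Catalog().Register(reg, nil); err != nil {
		log.Fatal(err)
	}
}
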
tgross commented 8 years ago

I've opened this post in the Hashicorp Consul mailing list to get suggestions: https://groups.google.com/forum/#!topic/consul-tool/09OkySyoSnA

Consul is the favored backend for Joyent's ContainerPilot. End-users who have deployed Consul on Joyent's Triton platform have run into trouble as described here. To summarize, these users are deploying Docker containers into an environment where the entire data center is treated as one large Docker host. When a service is registered, we're using the Agent API, but this means the service is being registered against a particular Consul node in the cluster. This runs into two problems:

  • There's no particularly good way to differentiate between agents in the cluster so that containers always use a particular agent.
  • If a given agent stops (it crashes or its underlying host is rebooted) then all containers associated with it become unhealthy, even though they are on different hosts.

This problem will occur in any PaaS-like environment (ex. Heroku), inasmuch as containers don't have a "local" agent to talk to. This means Consul is making topology assumptions that work well for “machines” but conflict with other uses. We don't have VMs, so we don't need Consul to be aware of the underlying infrastructure and impose assumptions about it, but we still want to be able to use Consul for service discovery in the application.

We can partially work around this problem by registering using the Catalog API and passing in a shared identifier for the Node field that doesn't correspond to a real node.

So for example I am able to register a service using this body and curl -XPUT --data @service.json http://$(triton ip my_consul_1):8500/v1/catalog/register ... But when we do so, there's no way to create the health check (we use TTL checks) using the Catalog API.

Is there a way that I'm not seeing to have the health checks be defined at the catalog level rather than the agent level? I've read thru the architectural docs on anti-entropy and it looks to me like this is a restriction of the API rather than something inherent to Consul's mode of operation (feel free to correct me!).

Otherwise, as far as I see it, we have a few options and none of them are all that great:

  • Users can deploy multi-process containers where every application container includes a Consul client agent, as described by @jgillis01 here
  • Users can deploy their application containers so that each has an assigned node within the Consul cluster (ex. assigning all instances of a particular service in a given DC to a single Consul node). For large deployments this would want to include lots of client agents so that the "blast radius" of losing a particular Consul node would be small. Another disadvantage of this is that we lose the ability to use simple tooling like Docker Compose to bootstrap the Consul instances.
  • Incorporate components of Consul into the ContainerPilot project so that containers using ContainerPilot can act as client agents.
zadunn commented 8 years ago

@tgross - I am not seeing any real activity on the mailing list for this issue. Have you been able to get anyone's attention on this?

misterbisson commented 8 years ago

@zadunn You're right about the mailing list thread. If you want to help bump the issue there or on Twitter, it can't hurt.

@tgross outlined some options with Consul above, but he's been exploring what it would take to do everything via Etcd while we've been waiting for feedback on the Consul questions above.

We have not had a chance to dig into the Consul code deeply enough to understand the implications of incorporating that in ContainerPilot so that it behaves like and appears to be a Consul agent. Perhaps you're in a position to look at that?

tgross commented 8 years ago

While I certainly would prefer to get Consul working because I like it better, https://github.com/autopilotpattern/etcd/pull/1 has an initial pass at making sure we have a working etcd implementation on Triton. I'll give a couple of our blueprints a try against this cluster to make sure the semantics are what we expect.

Addendum: getting https://github.com/joyent/containerpilot/issues/171#issuecomment-223336265 done would help us out with etcd as well.

mterron commented 8 years ago

I've been thinking about this one and have 3 alternatives to suggest:

  1. The cleanest one I can think of is merging CP with Consul, adding to Consul the ability to launch a service and do the pre/post scripts. This way CP will disappear and become a function of Consul. I don't think this will happen though, but it is a valid solution to explore.
  2. The compromise one. Add to CP the ability to run Consul (in agent/client mode?) as a coprocess in addition to the main application we want to run. This will solve the issue quite cleanly, as there will always be a local consul agent for each container. There's an application called Bifurcate that does some of the things needed for this and could be used as a source for extracting some of the code (it is written in golang too). I know you guys already expressed that you don't want to be a full init system, but accommodating this might be an acceptable compromise.
  3. The easiest one. Register all the services in all the Consul nodes/servers. For this we'd need to query Consul for all the Consul nodes/servers (both DNS and the API will provide this information). It is quite hacky, but it will fix the issue at the cost of an inaccuracy in counting the number of instances of a service (instead of the true number of instances, you will see that number multiplied by the number of Consul nodes).

Hope this helps to start exploring a solution.

Cheers

tgross commented 8 years ago

The cleanest one I can think of is merging CP with Consul, adding to Consul the ability to launch a service and do the pre/post scripts. This way CP will disappear and become a function of Consul. I don't think this will happen though, but it is a valid solution to explore

While that would be an interesting project, it means giving up all portability. We want to be able to support etcd most definitely, and in theory ZK and Eureka as well. But that's probably academic because I can't see Hashicorp deciding to merge ContainerPilot functionality into Consul.

Add to CP the ability to run Consul (in agent/client mode?) as a coprocess in addition to the main application we want to run.

I raised that suggestion in the thread on the Consul mailing list, but without adding the complexity of incorporating this behavior into ContainerPilot itself; you could just have a process tree like this:

runit
|_ Consul
|_ ContainerPilot
     |_ shimmed app

If we wanted to bake any behavior into ContainerPilot itself then we'd do it as library code where the Consul backend spins up a goroutine and does the Consul RPC work there. I've looked into this a little bit and it is a lot of extra work but it's doable. And in any case, if we were to go down this path I'd recommend that we have a second Consul backend (call it consul-agent rather than consul) so that non-Triton deployments don't have to take on this extra complexity.
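
For completeness, the runit side of that tree is just two service directories, something like the following sketch (paths, the join address, and the app command are placeholders):

/etc/service/consul/run:

#!/bin/sh
exec consul agent -data-dir=/var/lib/consul -retry-join consul.example

/etc/service/containerpilot/run:

#!/bin/sh
exec containerpilot -config file:///etc/containerpilot.json my-app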

The easiest one. Register all the services in all the Consul nodes/servers.

It's not enough to register them, you also have to send health checks to all of them. Assuming the Consul API will actually allow us to do this (I'm not sure it will), this means we have to do a lookup of the A-Record that CNS gives us and then send to all of them. Also, how do we handle partial partitions? Will we have a race where one Consul server is saying we're unhealthy and the other is saying we're healthy (Consul is CP but we don't know whether the TTL API call is what's synced on the raft or whether the state of the instance is what's synced)?
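
To make the "send to all of them" idea concrete, here is a rough Go sketch: resolve the DNS name, then call the agent TTL endpoint on every address. The names are placeholders, it deliberately glosses over the partition and consistency questions above, and PassTTL would only succeed on a server that actually knows about the check, which is exactly the open problem.

package main

import (
	"log"
	"net"

	consul "github.com/hashicorp/consul/api"
)

// passTTLEverywhere sends a TTL "pass" for checkID to every address behind
// the given DNS name (e.g. a CNS name for the Consul cluster).
func passTTLEverywhere(consulName, checkID string) {
	addrs, err := net.LookupHost(consulName)
	if err != nil {
		log.Fatal(err)
	}
	for _, addr := range addrs {
		config := consul.DefaultConfig()
		config.Address = net.JoinHostPort(addr, "8500")
		client, err := consul.NewClient(config)
		if err != nil {
			log.Printf("client for %s: %v", addr, err)
			continue
		}
		if err := client.Agent().PassTTL(checkID, "ok"); err != nil {
			log.Printf("pass TTL on %s: %v", addr, err)
		}
	}
}

func main() {
	passTTLEverywhere("consul.svc.example.cns.joyent.com", "service:redis1")
}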

tgross commented 8 years ago

Update on this from Hashicorp https://github.com/hashicorp/consul/issues/2089:

  1. Having Consul servers manage TTL expiration for these checks would be a significant new feature. This is currently managed completely on the agent side and they send edge-triggered updates to the Consul servers when a TTL expires and the state changes, so Consul servers have no concept of what kinds of checks are present, and they don't know anything about check TTLs. We have some precedent for servers handling TTL expirations via sessions, and I have a design sketched out to make managing TTL expirations much more efficient, so it seems like we could work something out here to add potentially lots more TTL-expiring things to be managed by servers.
  2. Agents currently provide a buffer between the load from refreshing TTLs (which the Consul servers never see) and service state changes (which the Consul servers see but happen much less often). Having many, many processes posting TTL refreshes directly to the servers could put a lot of extra load on them in a way that may not scale well. I don't think all the TTL refreshes should need to go through Raft, but we'd need to do some careful planning to make sure that's true so we don't create a bottleneck.

We'd want to have a solid plan for #2 before jumping into code - that's probably best worked out via a new Consul Github issue. I'm happy to help figure this out!

mterron commented 8 years ago

Tim, why do you think you need TNS for this? You can do pure Consul HTTP for the query and use the .node.consul addresses for updating the TTL. I don't think the problem is Triton-related at all; it will happen in any environment.

I can see how this would put a high load on the servers, so I'll keep building multi-process containers, but thanks for pushing the conversation forward.

tgross commented 8 years ago

You can do pure Consul HTTP for the query and use the .node.consul addresses for updating the TTL.

Sure but then you need to track changes to the .node.consul addresses and expire them too. I guess you could do that via a backend/onChange handler. Or you could randomly pick one and cache that so that you always write to that one, but that's going to be terrible for HA.

I don't think the problem is Triton related at all, it will happen in any environment.

It'll happen in any environment where you're using a CNAME to reach Consul. But the typical deployment scenario they envision is having an agent on the local machine (i.e. in the container or on the host where your apps live) which you can reach over localhost. This way the application is guaranteed to hit the same Consul agent. But this doesn't work on Triton or any PaaS unless you pack Consul in the container.

tgross commented 8 years ago

Here's a demonstration of using runit as a lightweight supervisor for having Consul running as an agent inside the same container as ContainerPilot and Nginx: https://github.com/tgross/nginx-autopilotpattern/tree/multiprocess

tgross commented 8 years ago

Here's a demonstration of having an HA etcd deployment using ContainerPilot, for those who want to go down that path instead: https://github.com/autopilotpattern/etcd/pull/1

tgross commented 8 years ago

I've opened https://github.com/joyent/containerpilot/issues/174 so that we have some documentation on this issue.

fannarsh commented 8 years ago

Hi guys. As I see it, the problem here is that ContainerPilot is currently treating a Consul server node as if it were a local Consul agent. As has been mentioned, Consul prefers running one Consul agent locally, which handles the checks and only sends "real" changes upwards to the cluster; this helps keep the load on the Consul servers down. But as it stands now, we are registering one service and one check per Triton instance against one of the servers, which goes against the Hashicorp design.

And my problem is that I want to shut down one of my Consul servers, and to do that I need to deregister all my Triton instances that are using that server as an agent and register them elsewhere onto a different "server/agent": basically restarting all those healthy services so that they can jump to a different agent.

A solution would be to register services via the catalog endpoint as an external service, but then we lose the health checks. On the other hand, there is some talk about a consul-external daemon that would run those checks: https://github.com/hashicorp/consul/issues/259

Another solution, which @tgross already mentioned, is to make Containerpilot act as a Consul Agent node, which in my opinion would be the best solution.

Running the Consul Agent as an extra process in my container is currently what comes closest to what Consul expects, but it introduces complexity and maintenance I would rather be without.

tgross commented 8 years ago

@fannarsh thanks for the link to that thread. https://github.com/jmcarbo/consul-externalservice looks like a promising workaround, and small enough in scope that maybe we could incorporate it directly into a ContainerPilot backend. Edit: it still assumes a localhost Consul, but it looks like that shouldn't be a hard limitation.

But as it stands now, we are registering one service and one check per Triton instance to one of the servers which goes against the Hashicorp design.

Yeah, this is an accurate summary of the problem.

A solution would be to register services via the catalog endpoint as an external service. But then we lose the health checks

I don't think we should be willing to give up health checks.

tgross commented 8 years ago

Some more digging into this if we wanted to go down the road of having ContainerPilot participate as a Consul agent: I've traced how the agent updates the servers (via serf) to here. So the agents just use the RPC equivalent of /v1/catalog/register to keep the servers up to date with changes; there's nothing special or undocumented going on.

It'd be trivial to maintain the TTL state in ContainerPilot to do this, but then we'd just have the last problem to solve of what happens when our fake Consul agent disappears because we stopped the container. Our agent also needs to participate in the LAN gossip pool.

misterbisson commented 8 years ago

@tgross is it fair to summarize your point in https://github.com/joyent/containerpilot/issues/162#issuecomment-224348555 as follows?

The Consul catalog knows that a service is unhealthy because:

  1. The agent (or fake agent) has actively marked it that way
  2. The agent (or fake agent) is non-responsive in the gossip pool

You're feeling confident about the feasibility of item 1, but item 2 is more complex and requires more research to have confidence about?

tgross commented 8 years ago

That's exactly correct.

tgross commented 8 years ago

In the interest of saving folks some research, I've been thru the source of every tool in https://www.consul.io/downloads_tools.html and done a bunch of digging on my own and haven't found a single alternate implementation of the serf membership protocol that we might be able to borrow w/o lifting large chunks of Consul into ContainerPilot.

That being said, much of the low-level work is being done in https://github.com/hashicorp/memberlist and then the Consul agent package is putting some higher-level behaviors on top of that.

tgross commented 8 years ago

I've spent the last two days looking into whether we could make ContainerPilot join as a Consul agent, and while it's just barely technically feasible, it's my feeling that this would unacceptably expand the complexity of ContainerPilot for a single use case. In lieu of this, I'm proposing Co-Process Hooks. See https://github.com/joyent/containerpilot/issues/175

We've provided documentation and recommendation for workarounds on this issue. At this point if there are no objections I'd like to close this issue and provide a solution via the co-process hook work.

zadunn commented 8 years ago

Thanks @tgross. @jgillis01 is on vacation, but I think this is a good way forward for now. Thanks for the follow-through and for an awesome product!