TritonDataCenter / containerpilot

A service for autodiscovery and configuration of applications running in containers
Mozilla Public License 2.0

Proposal: add user defined application telemetry #27

Closed. misterbisson closed this issue 8 years ago.

misterbisson commented 8 years ago

This is poorly thought through so far, but I wanted to open up discussion for it.

Health checking gives us a binary way to determine if the app is working or not. If the app is not healthy, we should not send any requests to it and we most likely need to spawn a replacement. But scaling depends on more than binary app health. Every app has one or more performance indicators that can be used to determine if the app is nearing overload and should be scaled up or is too lightly loaded and should be scaled down. In the spirit of what Containerbuddy does to make applications container-native, I think it might also make sense to add an awareness of those performance indicators.

Some performance indicators can be read from the system. The five minute load average reported by the kernel may work well for many apps, but many of the most interesting indicators come from the app itself.

In MySQL, the number of Query entries from SHOW PROCESSLIST that are in any Waiting state can be a very significant performance indicator, and it's one that is best retrieved from within the container.

In Nginx, the average request processing time is useful info, but that's only output in the logs, which are hopefully not inside the container (if they were, ngxtop would be a nice tool to help understand them). But we do expect http_stub_status_module in our triton-nginx image, so we can look at that instead. Active connections vs. the worker_connections limit is a hugely important number there. Any delta between accepts and handled is a huge red flag. Or, perhaps, the Waiting number is an inverse indicator (high numbers indicate low activity, low numbers are high activity, zero could be critical). (See https://github.com/tgross/triton-nginx/issues/3 for further musing on this.)

Here's what I'm imagining it would look like in the config (though it would be obviously unlikely to actually mix Nginx and MySQL in a single image):

  "kpis": [
    {
      "name": "system-loadavg",
      "poll": 103,
      "kpi": "/opt/containerbuddy/system-loadavg.sh"
    },
    {
      "name": "mysql-queries-waiting",
      "poll": 31,
      "kpi": "/my/bin/mysql-queries-waiting.sh"
    },
    {
      "name": "nginx-busy",
      "poll": 17,
      "kpi": "/my/bin/nginx-busy.sh"
    },
    {
      "name": "nginx-missed-connections",
      "poll": 137,
      "kpi": "/my/bin/nginx-missed-connections.sh"
    }
  ]

In order, the above four KPI entries would return:

  1. The 5 minute load average (the trailing average, returned every 103 seconds). This is a decimal value. It should generally not be higher than the count of VCPUs the instance has access to, though that can be hard to figure out in a Docker environment.
  2. The count of MySQL Query entries from SHOW PROCESSLIST that are in any Waiting state. 0 is great. 1 or above can be trouble. 10 or more is probably critical.
  3. The result of Nginx's (${Active_connections} - ${Waiting}) / ${worker_connections}, a decimal value. 1 is maxed out, 0.5 is 50% busy.
  4. The result of Nginx's accepts - handled vars, an integer value. Anything other than 0 here is probably critical.

The executables that get these KPIs are to be provided by the app developer/packager, though it might make sense to have a common way to get the system load average as part of Containerbuddy.
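To make that contract concrete, here's a minimal sketch of what a system load average sensor could look like, written in Go purely for illustration (the paths above assume shell scripts, and this file is hypothetical); the only expectation is that the executable prints a single numeric value:

// Hypothetical system-loadavg sensor: print the 5-minute load average.
// Assumes a Linux container where /proc/loadavg is available.
package main

import (
	"fmt"
	"os"
	"strings"
)

func main() {
	data, err := os.ReadFile("/proc/loadavg")
	if err != nil {
		fmt.Fprintln(os.Stderr, err) // diagnostics go to stderr, not stdout
		os.Exit(1)
	}
	// /proc/loadavg is "1min 5min 15min running/total lastpid";
	// the second field is the 5-minute trailing average.
	fields := strings.Fields(string(data))
	fmt.Println(fields[1])
}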

As with the health checks, the executables would be run periodically at the interval specified. The KPI values could then be emitted to stdout as newline-delimited JSON as they're collected, for interpretation and use by whatever tools are reading the logs.

I wanted to offer those specific use cases as an exercise to figure out what data would be handled here. In short, I'm thinking each KPI is a numeric (decimal allowed) time series value. The executable will simply return that value and Containerbuddy can pass it on with the name given in the config.

tgross commented 8 years ago

Streaming the metrics data out on the logs output is not as useful as something like a time-series database output. For a busy metric, parsing the text/JSON of a metric out of the application logs may be expensive in terms of both computation (take a look at the logstash grok implementation in https://github.com/tgross/triton-elk/tree/working for example) and storage. Additionally, Docker logs are emitted as line-formatted text, so stuffing JSON in there is going to add extra mess for log drivers like syslog or gelf.

My proposed alternative is to have a metrics backend feature that ingests metrics from the user-defined metrics checks (by attaching to stdout of the metrics check process) and then emits them in the format expected by upstream metrics collection/aggregation services. A given metric can have multiple backends -- e.g. maybe we want to permanently log the metrics in application logs but keep a rolling real-time view of the last week in Graphite.

Each backend will need to document an API contract with the metrics check so that the user knows what format to write to stdout for the data to be read correctly. Because each backend needs its own set of arguments to function, the configuration format might look something like:

"metrics": [
  {
    "name": "my.metric.name",
    "poll": 10,
    "check": "/path/to/my/metrics-check",
    "backends": [
        {
          "statsd": {
              "argKey1": "argVal1",
              "argKey2": "argVal2"
            }
        }
    ]
  },
  {
    "name": "my.other.metric.name",
    "poll": 5,
    "check": "/path/to/my/metrics-check",
    "backends": [ {"log": {}} ]
  }
]

Possible backends would include a plain logs output and common collectors like statsd, Graphite, collectd, InfluxDB, or syslog.

I'd propose that the initial implementation I work up will include the logs backend (easy fallback) and probably statsd because it's easy and I've implemented it before.
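As a rough illustration of the statsd direction (not the eventual implementation; the address, metric type, and function names here are assumptions), a backend could take the value read from a check's stdout and emit it over UDP using the plain statsd line protocol:

// Hypothetical sketch of a statsd backend: emit a metric name and the value
// read from a check's stdout as a statsd gauge over UDP.
package main

import (
	"fmt"
	"net"
)

func emitGauge(addr, name string, value float64) error {
	conn, err := net.Dial("udp", addr) // e.g. "statsd.example.com:8125"
	if err != nil {
		return err
	}
	defer conn.Close()
	// statsd line protocol: <metric.name>:<value>|g for a gauge
	_, err = fmt.Fprintf(conn, "%s:%g|g", name, value)
	return err
}

func main() {
	if err := emitGauge("127.0.0.1:8125", "my.metric.name", 0.42); err != nil {
		fmt.Println("emit failed:", err)
	}
}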

(Also, I'm not wild about using the term KPIs, which is semantically overburdened with business KPIs and specifically "performance" rather than arbitrary metrics. I've suggested "metrics" in its stead.)

tgross commented 8 years ago

Noting this will be a post-1.0 release item.

justenwalker commented 8 years ago

@tgross I like the idea of pushing stats to one or more backends.

As part of the backend configuration, we'd have to be able to hook into the service discovery. For example: I may want to run statsd with containerbuddy and have my other services seamlessly find it.

Since the containerbuddy config is pre-processed with Go templates, we can't use Go templates in the backend configs for anything beyond values that come from environment variables. Perhaps we could have a very simple interpolation syntax for service discovery: ${SERVICE.TAG[VALUE]}. So ${statsd.prod[address]} would match the statsd service with the prod tag and select its address; ${statsd.prod[port]} would do the same but for the port. We could also omit the tag and just use ${statsd[address]}.
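To show what that syntax implies, here's a small sketch of the proposed interpolation; the lookup function is a stand-in for a real discovery query (e.g. against Consul), and the whole thing is illustrative rather than an agreed-on design:

// Hypothetical sketch of ${SERVICE.TAG[VALUE]} interpolation.
package main

import (
	"fmt"
	"regexp"
)

// matches ${service[field]} and ${service.tag[field]}
var token = regexp.MustCompile(`\$\{([\w-]+)(?:\.([\w-]+))?\[([\w-]+)\]\}`)

func interpolate(s string, lookup func(service, tag, field string) string) string {
	return token.ReplaceAllStringFunc(s, func(m string) string {
		parts := token.FindStringSubmatch(m)
		return lookup(parts[1], parts[2], parts[3])
	})
}

func main() {
	// fake lookup standing in for the discovery backend
	fake := func(service, tag, field string) string {
		if field == "port" {
			return "8125"
		}
		return "10.0.0.5"
	}
	fmt.Println(interpolate("${statsd.prod[address]}:${statsd.prod[port]}", fake))
	// prints 10.0.0.5:8125
}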

It may also be useful to separate backends from their sensors and just reference them by name.

Here's an example of the backend config format I've been thinking about:

{
    "metrics": {
        "sensors": [
            {
                "name": "my.metric.name",
                "poll": 10,
                "check": "/path/to/my/metrics-check",
                "backends": [
                    "statsd",
                    "log"
                ]
            }
        ],
        "backends": {
            "statsd": {
                "driver": "statsd",
                "statsd": {
                    "host": "${statsd.prod[address]}",
                    "port": "${statsd.prod[port]}",
                    "prefix": "{{ .STATSD_PREFIX }}"
                }
            },
            "statsd-env": {
                "driver": "statsd",
                "statsd": {
                    "host": "{{ .STATSD_HOST }}",
                    "port": 8125,
                    "prefix": "{{ .STATSD_PREFIX}}"
                }
            },
            "log": {
                "driver": "log",
                "log": {
                    "format": "<someformatlanguage>",
                    "file": "stdout"
                }
            }
        }
    }
}

We could remove the driver part of the backend config, since it's there mostly for ease of parsing. Without it, we'd have to iterate through all drivers and check whether their key is contained in the config to figure out which driver the user wants, which makes the logic a bit more complicated.

For the log driver format, maybe look at go-logging. That's possibly unnecessary if we just take the raw output of the script, so the script can control its own output format. file could be stdout, stderr, or a path to a file in the container, to allow logging to a volume.

justenwalker commented 8 years ago

Re-thinking the service discovery syntax: it has a fatal flaw in that you may have several instances of a service but can only interpolate one of them. Not sure how to get around this other than making the host lookup dynamic instead of supplying a fixed host and port:

{
    "service": {
        "name": "statsd",
        "tag": "prod",
        "select": "round-robin|random|first"
    },
    "prefix": "{{ .STATSD_PREFIX }}"
}
misterbisson commented 8 years ago

@justenwalker I've been spending a lot of time thinking about this problem lately, including the points argued in http://container-solutions.com/monitoring-performance-microservice-architectures/. I also take @tgross's concerns about added scope and complexity seriously, especially when it comes to pushing metrics, given the number of different drivers that would need to be supported, the need to deal with backpressure, and other details.

For those reasons and more, I've been looking at pull models for this, rather than push. http://prometheus.io (originally developed by SoundCloud, but open source and used by lots of folks) demonstrates a strong working use case for the pull model, and they've got a Go library that looks relatively well maintained for it, https://github.com/prometheus/client_golang.

The great thing about the pull model is that it could keep Containerbuddy simple, while still supporting push. We'd have to develop a separate metrics forwarder container that pulls from a number of Containerbuddy-enabled containers and then pushes to the consumer of the user's choice, but log forwarders are common, so there's lots of precedent for a similar forwarder for application metrics. Importantly, that allows us to support an unlimited range of consumers/protocols without adding complexity to Containerbuddy itself.

What I also like about the pull model is that it allows us to express information about the application for use by consumers not yet developed. We could, for example, express the Containerbuddy config on that interface, allowing other applications to semantically map the relationship between containers and include status information (health, performance, etc.) about them. Doing that with a push model would be very difficult. I like the idea of a Containerbuddy visualizer with that kind of info.

As for how this works in practice, I think it's in line with the much simpler config syntax I proposed earlier, something like:

  "metrics": [
    {
      "name": "system-loadavg",
      "poll": 103,
      "check": "/opt/containerbuddy/system-loadavg.sh"
    },
    {
      "name": "mysql-queries-waiting",
      "poll": 31,
      "check": "/my/bin/mysql-queries-waiting.sh"
    },
    {
      "name": "nginx-busy",
      "poll": 17,
      "check": "/my/bin/nginx-busy.sh"
    },
    {
      "name": "nginx-missed-connections",
      "poll": 137,
      "check": "/my/bin/nginx-missed-connections.sh"
    }
  ]

The STDOUT output of check is the value of that named metric. Containerbuddy would save the last result to return when its Prometheus-compatible endpoint is hit.
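A minimal sketch of that idea, assuming nothing about the real implementation or the Prometheus exposition format: each check's last stdout value is cached in memory, and an HTTP handler serves the current snapshot whenever a collector pulls it. The names, port, and output format below are illustrative only.

// Sketch of the pull model: cache the last value from each named check and
// serve the snapshot over HTTP on demand.
package main

import (
	"fmt"
	"log"
	"net/http"
	"sync"
)

type lastValues struct {
	mu     sync.RWMutex
	values map[string]string // metric name -> last stdout of its check
}

func (l *lastValues) set(name, value string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.values[name] = value
}

func (l *lastValues) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	l.mu.RLock()
	defer l.mu.RUnlock()
	for name, value := range l.values {
		fmt.Fprintf(w, "%s %s\n", name, value)
	}
}

func main() {
	cache := &lastValues{values: map[string]string{}}
	cache.set("nginx-busy", "0.5") // the polling loop would write these
	http.Handle("/metrics", cache)
	log.Fatal(http.ListenAndServe(":9090", nil))
}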

tgross commented 8 years ago

I'm going to break some of this down piecemeal and re-order/re-combine some thoughts because there's a lot of text here. I think we agree on the overall idea of collecting metrics but the push vs pull seems to be a point of disagreement:

I also take @tgross's concerns about added scope and complexity related to this ... Importantly, that allows us to support an unlimited range of consumers/protocols without adding complexity to Containerbuddy itself. ... What I also like about the pull model is that it allows us to express information about the application for use by consumers not yet developed.

This just pushes the complexity from Containerbuddy to an as-yet-not-existing application. That application would have all the complexity that you're talking about here and would have to express information in whatever protocol each of the supported providers requires. There's a "Law of Conservation of Complexity" at work here and pushing vs pulling can't really solve it.

http://prometheus.io (originally developed by SoundCloud, but open source and used by lots of folks) demonstrates a strong working use case for the pull model

It's as far as I know the only such case, and it requires that the target application supports its protocol and exposes a port to it.

log forwarders are common, so there's lots of precedent for a similar forwarder for application metrics.

These are also single points of failure in the infrastructure.

the need to deal with backpressure, and other details.

We'd have to deal with backpressure in the collector instead.

Given that we'd have to develop all this stuff anyways, what do we gain by separating it out from Containerbuddy, where we're going to be collecting the metrics anyways? We have a precedent in the architecture for a "plugin style" development process where we can add individual backends (discovery backend currently) incrementally as demanded by users or provided by the community. Why wouldn't this work for statsd, collectd, influxdb, syslog, etc. etc.?

tgross commented 8 years ago

Also, FWIW, I'm not sure the conclusion of pull vs push in http://container-solutions.com/monitoring-performance-microservice-architectures/ actually follows from what he's saying. He's conflating the statefulness of the monitoring analysis with the location of the sensors.

His specific analogy of power systems is bizarre -- those systems create those aggregated views he's using to say "look it's all one view of the combined state" from literally thousands and thousands of individual local sensors.

justenwalker commented 8 years ago

I think we agree on the overall idea of collecting metrics but the push vs pull seems to be a point of disagreement:

The reason I prefer push semantics: In a dynamic system, I don't want something that must enumerate and react to all systems in the environment to do its job - that work seems better off delegated to the individual applications.

This just pushes the complexity from Containerbuddy to an as-yet-not-existing application.

There is something attractive about extracting the metrics gathering package though - very single responsibility / unix-y.

misterbisson commented 8 years ago

There's a "Law of Conservation of Complexity"

Right, and moving the complexity out of a large number of application containers and into a dedicated container better fits the goals and ideals of Dockerization. Keeping Containerbuddy's memory footprint low is important because it's running in so many containers.

It's as far as I know the only such case

Actually, it's how Mesos handles metrics, too.

These are also single points of failure

This is interesting. The destination in a push system is often a single point of failure. Worse, it puts a huge burden on the sender to ensure that the data has been sent. A pull system, instead, makes it easy to have multiple systems reading the same data, ensuring there is never a single point of failure.

We'd have to deal with backpressure in the collector instead

Right, but that complexity (and it's not small or lightweight) is better moved out of the application container. Our discovery when implementing Docker log drivers, for instance, was that they are failure-prone. Crashing an application because Containerbuddy's metrics driver crashed seems an unnecessary risk. Some interesting reading on error handling in a well-behaved push driver can be found in node-statblast.

I don't want something that must enumerate and react to all systems

Except that a metrics forwarder can do that very easily by monitoring Consul. More importantly, the pull model allows us to express the semantic relationship between containers and use that data in ways not possible with a push model.

tgross commented 8 years ago

@justenwalker wrote:

There is something attractive about extracting the metrics gathering package though - very single responsibility / unix-y.

We've taken the position all along in this project that the "unit of responsibility" was the container itself. Pulling the code into a separate package that gets compiled into the binary we release as Containerbuddy is just good development practice. But it doesn't naturally follow from there that we deploy it as a separate service. Having a separate service means coupling the deployment of every service to the deployment of the forwarder. This is the same argument we made against coupling topology to the scheduler.

Right, but that complexity, and it's not small or lightweight, is better moved out of the application container. Our discovery when implementing Docker log drivers, for instance, is that they are failure prone. Crashing an application because Containerbuddy's metrics driver crashed seems an unnecessary risk.

One could easily make the same argument about health checks and service discovery, but we took the position with Containerbuddy that it was the responsibility of the application to manage this. There are levels of failure, and a metrics driver that is designed to fail safe mitigates that risk even when it does fail.

Except that a metrics forwarder can do that very easily by monitoring Consul.

To do so means updating the Containerbuddy configuration for the metrics forwarder for each new service we add. Whereas a push model means that we just treat the metrics collection point as a backend and the deployment of the application and collector are de-coupled.

More importantly, the pull model allows us to express the semantic relationship between containers and use that data in ways not possible with a push model.

As I pointed out above, that's entirely a feature of whatever's being used for analysis. Push vs pull doesn't come into the ability of the system to make stateful analysis. This is a red herring.

misterbisson commented 8 years ago

Having a separate service means coupling the deployment of every service to the deployment of the forwarder. This is the same argument we made against coupling topology to the scheduler

I don't follow. What forces the coupling? The user is now freed to use whatever they want in a way that's not possible with a push model.

The pull model demarcates the interface in the most minimal way possible, leaving the user to implement whatever metrics collection she chooses, including none.

we took the position with Containerbuddy that it was the responsibility of the application to manage this

We didn't take the position that Containerbuddy would do it in the most complex and resource-heavy way possible, though. The pull model offers an alternative that minimizes the complexity that is repeated throughout each container.

We don't run every part of an application in the same container. There are interfaces between those containers. Choosing an interface that minimizes the complexity and resource consumption that's resident in the container with the primary application is the responsible thing to do.

To do so means updating the Containerbuddy configuration for the metrics forwarder for each new service we add

I don't follow this? Why would the Containerbuddy config file need to know about any of the pull consumers at all, including a metrics forwarder? The simplicity of the pull model is that Containerbuddy doesn't have to know about what's consuming the data, and that it can be consumed by a number of different clients that behave differently and use the data for different purposes.

a push model means that we just treat the metrics collection point as a backend and the deployment of the application and collector are de-coupled

This argument seems very specious. The coupling is much stronger in a push model, since that's the case where the Containerbuddy configuration does need to change for each push target.

Push vs pull doesn't come into the ability of the system to make stateful analysis

Push does significantly limit what we can do with the data, however. A push implementation that allows pushing to multiple consumers multiplies the complexity in both design and operation, making it likely that we'll have a single consumer and users will need to choose which consumer to send the data to. A pull model allows an unlimited number of consumers of the data, opening the door for far more experimentation and for running multiple consumers with varied purposes. Importantly, the pull model lends itself to casual exploration and informal checking in a way that the push model never can.

tgross commented 8 years ago

We didn't take the position that Containerbuddy would do it in the most complex and resource-heavy way possible, though. The pull model offers an alternative that minimizes the complexity that is repeated throughout each container.

Except that you're proposing we run an HTTP server in every container, even the ones that weren't previously running HTTP at all (ex. MySQL). What serialization formats do we support? What authentication models do we support? How do we distribute server certificates to every container in our infrastructure to support TLS?

I don't follow this? Why would the Containerbuddy config file need to know about any of pull consumers at all, including a metrics forwarder?

Sorry, I wasn't clear there: I mean the Containerbuddy running in the metrics forwarder. In order to pull all the metrics from the containers, it needs to be able to get updates from the discovery service, effectively treating all the services as backends (in the Containerbuddy sense). Which means that if we add a service, we need to update the configuration of the metrics forwarder so that it knows that it has to collect a new set of metrics and watch for changes to the nodes in that service.

Note that each service will need to advertise what port its metrics endpoint is listening on to the discovery service.

since that's the case where the Containerbuddy configuration does need to change for each push target.

Sure, which we currently do for all a container's other resource dependencies.

A push implementation that allows pushing to multiple consumers multiplies the complexity in both design and operation, making it likely that we'll have a single consumer and users will need to chose what consumer to send the data to.

Ok, that's a valid argument.

Importantly, the pull model yields to casual exploration and informal checking in a way that the push model never can.

How?

justenwalker commented 8 years ago

@tgross wrote:

Except that you're proposing we run an HTTP server in every container, even the ones that weren't previously running HTTP at all (ex. MySQL). What serialization formats do we support? What authentication models do we support? How do we distribute server certificates to every container in our infrastructure to support TLS?

I think this point is worth considering seriously. Creating an HTTP/S Server for every container seems pretty bad not only from a maintenance perspective, but also for security. It exposes a large attack vector for exploiting vulnerabilities in containerbuddy - since every container on the system will have this endpoint open.

We could consider doing without TLS support, since this endpoint should not be exposing sensitive information, and adding TLS overhead to a high-volume endpoint could be a problem -- every container doing TLS handshakes might be resource intensive. But without TLS, someone with a privileged network position could trivially sniff and possibly alter metrics.

tgross commented 8 years ago

Ok, so @misterbisson and I had a long sidebar discussion about this and I think I've come around to his way of thinking. I should point out that this is clearly not a case where there's an "obviously right" answer and so we're going to have to make an informed engineering trade-off no matter which way we go.

My argument that pulling metrics reverses the position that we've been taking with Containerbuddy with respect to where the orchestration lives is probably the weakest argument. The Containerbuddy model isn't about push vs pull but where the intelligence about specific applications lives. Pull vs push doesn't come into this because we're doing the measuring inside the container either way.

From the perspective of complexity, there's a good argument for "knowing where to stop". We're going to need to draw a line somewhere on what this application owns, otherwise we risk throwing in the kitchen sink which is what IMO has been plaguing Kubernetes.

There's also a missing piece to this discussion which @misterbisson alluded to previously, which is that there is a second layer of metrics collection in most environments -- the underlying infrastructure. Among the major public cloud providers (AWS, GCE, and of course Triton!) and in Mesos or Kubernetes deployments, getting the performance metrics of the underlying system is an HTTP pull. This means that in many deployment scenarios collecting metrics is already going to involve a collecting agent that's pulling.

If we do go with pull-based metrics, an important point is that this is additive to the insights we get from the container. Existing applications that use push-based metrics because of a statsd client or the like running off the main thread (I've written this sort of thing before on other projects) still work without any change except that they can find their backend more easily with Containerbuddy. So the metrics endpoint can be entirely opt-in by configuration.

The more I think about it I also really like @misterbisson's argument around being able to have multiple or even informal consumers of metrics which is obviously going to be easier with a pull-based model.

What I'd specifically propose is that we provide a metrics endpoint that implements a Prometheus-compatible protocol, for which we can leverage the Prometheus client library (confusingly, this repo is both the API client and the "server"). This gives us a target protocol to implement which will be workable with common infrastructure setups as described above.

There are some trade-offs we're going to make here.

We're saying "here's the protocol we're supporting" which means accepting that we may need to develop adaptors as separate independent projects.

The question of risk around running an HTTP(S) server is real, but we're also narrowing the scope of what we're working on. By choosing Prometheus in particular we at least have a reference implementation to start from and won't be building many many clients. I've also been considering whether we want to give "advanced" users the option for conditional compilation of Containerbuddy. For example, add flags to our Makefile that only compiles the support for the particular discovery service they want. If someone was particularly paranoid about the server they could either just not configure it to run (it would be off by default) or conditionally compile their own Containerbuddy so that it doesn't even exist.
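For what the conditional-compilation idea might look like mechanically, here's a hypothetical sketch using a Go build tag; the tag name, package, and function signature are all assumptions, and a sibling file without the tag would hold the real HTTP server:

// telemetry_stub.go: compiled in only when building with `go build -tags notelemetry`
// (e.g. via a Makefile flag), so the telemetry HTTP server never exists in the binary.

//go:build notelemetry

package telemetry

// Serve is a no-op when the telemetry endpoint is compiled out.
func Serve(port int, path string) error {
	return nil
}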

Thoughts?

justenwalker commented 8 years ago

The more I think about it I also really like @misterbisson's argument around being able to have multiple or even informal consumers of metrics which is obviously going to be easier with a pull-based model.

Fair point - and if that's the way that makes sense for most people I don't have any strong disagreement. Consul makes it easy to connect to every node, and it's easier to throttle the collector rather than the pushers.

In my world, I like to aggregate information in a single place, so my multiple consumers would be pulling from the aggregate rather than from the sensors themselves. This lets me pull data in various states of structure and various transport/serialization formats, make it conform to a single standard, and publish it in one format, perhaps decorated with some additional context or metadata. At some point various consumers will pull; I just like that pull to happen from one place instead of many. However, this system requires more infrastructure, so it's not for everyone.

The question of risk around running an HTTP(S) server is real, but we're also narrowing the scope of what we're working on.

Purely as an example (I'm no cryptanalyst) - perhaps you could perform a timing attack against a container running an embedded HTTP server in addition to the actual application by measuring the difference in response times from the metrics endpoint during key operations. The chance of this being exploited might be negligible, but I've seen theoretical attacks become practical in pretty short timeframes.

I've also been considering whether we want to give "advanced" users the option for conditional compilation of Containerbuddy. For example, add flags to our Makefile that only compiles the support for the particular discovery service they want. If someone was particularly paranoid about the server they could either just not configure it to run (it would be off by default) or conditionally compile their own Containerbuddy so that it doesn't even exist.

I like giving people that choice. But I'd prefer a runtime flag. Or perhaps both? Having options for reducing the attack surface is always welcome in my book.

tgross commented 8 years ago

perhaps you could perform a timing attack against a container running an embedded HTTP server in addition to the actual application by measuring the difference in response times from the metrics endpoint during key operations

You mean against the main application? (I'm taking as granted that the golang TLS implementation is correct here for sake of sanity.) Performing a timing attack by cross-process side-effects is really only as likely as cross-container or cross-VM side effects (ex. has been demonstrated in limited circumstances on Xen but isn't a real-world attack as far as anyone can tell). That concern seems out-of-scope for us here to me?

I like giving people that choice. But I'd prefer a runtime flag. Or perhaps both?

I was definitely thinking that if there's no metrics section in the Containerbuddy config then it just would default to being off.

justenwalker commented 8 years ago

You mean against the main application? (I'm taking as granted that the golang TLS implementation is correct here for sake of sanity.) Performing a timing attack by cross-process side-effects is really only as likely as cross-container or cross-VM side effects (ex. has been demonstrated in limited circumstances on Xen but isn't a real-world attack as far as anyone can tell). That concern seems out-of-scope for us here to me?

I was actually referring to just the HTTP endpoint (no TLS) being used as a probe for possible timing attacks. Really I meant it more as an example of possible unintended consequences - not as a particular fear of mine.

I was definitely thinking that if there's no metrics section in the Containerbuddy config then it just would default to being off.

Seems fair to me.

dekobon commented 8 years ago

After catching up on this discussion, I'm swayed by @misterbisson's arguments for a pull model but I'm very scared of running additional listening ports as a requirement of metrics gathering. In fact, this would be a deal breaker for a number of enterprises. Is this really a binary choice of push vs. pull? What about some of the other models?

To be honest, I very much like the flexibility of event driven models. This allows for lossy consumption of data. In some implementations, you won't need to worry about backing off when transmitting because the event buses will just drop what they can't handle.

justenwalker commented 8 years ago

Nothing says we need to build the metrics backends into Containerbuddy. One way to have both options:

  1. Pull - Have an optional metrics section for pull metrics over Prometheus
  2. Push - Have an optional section for periodic commands:
{
  "onScheduled": [
    { "frequency": "1s", "command": [ "/bin/push_metrics.sh" ] },
    { "frequency": "10s", "command": [ "/bin/push_other_metrics.sh" ] }
  ]
}
tgross commented 8 years ago

@dekobon wrote:

I'm very scared of running additional listening ports as a requirement of metrics gathering. In fact, this would be a deal breaker for a number of enterprises.

But building an event bus and server for this isn't? I feel like organizations that are packing that level of sophistication already have options for gathering metrics from their applications. Please keep in mind that Containerbuddy is intended as a shim for the cases where this sort of development expertise isn't already available.

We could push to the metadata interface within a zone (this is Triton specific).

I don't want to support a Triton-only option.

We could push via a socket available within the container (this is Linux Docker specific).

I don't want to support a Docker-on-Linux-only option.

We could use an event driven architecture to broadcast metric events. Consumers would then register to listen for those events. You could even have a server that acts as a web API allowing services to interface with metrics via a pull interface.

Which is a whole other pile of infrastructure for the user and a new protocol for us to develop which none of the existing metrics sources will implement out of the box, unless we're building a metrics proxy. In which case we're back where we started.

We could aggregate metrics and periodically publish them to Consul's key/value store.

Which means consumers wouldn't get access to the raw data. (Plus I don't trust Consul to support that level of writes.)


@justenwalker wrote:

Have an optional section for periodic commands:

Which you can do already as part of your health checks. Which, interestingly enough, was my original suggestion to @misterbisson on this subject. :grinning:

justenwalker commented 8 years ago

Plus I don't trust Consul to support that level of writes.

:+1: From experience, this does not work at scale.

Which you can do already as part of your health checks. Which, interestingly enough, was my original suggestion to @misterbisson on this subject.

Overloading the concept of Health Check to push metrics seems like a hack. I'd prefer calling it out explicitly so that the implementations can vary independently and we don't have both concerns meddling in the same configuration.

Update: I suppose you mean that I can put anything I want in my health_check.sh. The more work I do in a health check script, the more time it takes to return, which could potentially exceed the TTL. I could make the TTL larger in this case, but now metrics are affecting the health checks.

tgross commented 8 years ago

Update: I suppose you mean that I can put anything I want in my health_check.sh. The more work I do in a health check script, the more time it takes to return, which could potentially exceed the TTL. I could make the TTL larger in this case, but now metrics are affecting the health checks.

That's a very good point. I'm somewhat wary of getting "systemd syndrome" here, but I'm going to have to be honest and point out that we overloaded the health check to serve as a cron-like behavior in triton-mysql anyways, so clearly this is something that we're seeing a need for.

Given that, I think I'd support @justenwalker's proposal as described in https://github.com/joyent/containerbuddy/issues/27#issuecomment-186488865. If @misterbisson and @dekobon have no objections to that, I'd like to continue the discussion of the spec for the metrics in this thread and then maybe pull out "periodic events" or "on schedule" as its own enhancement?

justenwalker commented 8 years ago

@tgross I've pulled periodic tasks into its own issue (#87) so we can discuss further there.

misterbisson commented 8 years ago

Stupid Friday notion: this should be called "telemetry," which might be more in line with "autopilot pattern."

tgross commented 8 years ago

When we do this, let's try and split out the functionality into its own library that main calls (per https://github.com/joyent/containerbuddy/issues/83). This will reduce the scope of refactoring when we do #83 and give us some guidance on how to do it.

tgross commented 8 years ago

I've spent some time digging into the Prometheus API and what we'll need to do. Prometheus provides a Golang "client" which is the library we need to feed metrics into as well as the HTTP endpoint. So the HTTP side of things is just a matter of providing a port and metrics URL. This should be the easy part.

What will be trickier is gathering the metrics. We want to use a user-defined process to gather metrics, similar to the health or onChange handlers. (I'll refer to these hereafter as "sensors".) Under that model, the sensors fire once and return, which means they can't be long-running processes that communicate with themselves.

We need a way to get the output of the sensors, which probably means reading from stdout of the process. If that's the case, we would need to extend executeAndWait to allow for reading stdout rather than sending it to the Docker Engine. A couple of caveats: the sensor process needs to have a "clean" stdout for us to parse and only write unrelated info to stderr, and because each sensor returns a single value, if you want 3 stats from the Nginx stub status module, you need to query it 3 times.

One of the things I was worried about was state with respect to timestamps, but it looks like this isn't really a problem. Whenever we ask for a metric from a sensor, we just take the current state of that number and pass it into the Prometheus client API for the appropriate metrics type, and it'll do the right thing with respect to maintaining in-memory state.

(As an aside, this points out an interesting problem with timestamp resolution when using Prometheus. Because the timestamps are marked at the time of collection, there will be a delay equal to the poll interval on the timestamps for metrics + however long it takes for Prometheus to make the HTTP GET. In a push-based system the timestamps are set at the time of measurement and not collection. This limits the precision of the metrics timestamps. Something we should be aware of.)
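As a sketch of the stdout-capture approach (this is not the real executeAndWait, and the helper here is hypothetical), running a sensor once and parsing its output might look roughly like:

// Hypothetical sketch of running a sensor once and reading a single number
// from its stdout, rather than forwarding output to the Docker Engine.
package main

import (
	"fmt"
	"os/exec"
	"strconv"
	"strings"
)

// runSensor executes the check and parses its stdout as a float64.
func runSensor(path string, args ...string) (float64, error) {
	out, err := exec.Command(path, args...).Output() // stderr is left alone
	if err != nil {
		return 0, err
	}
	return strconv.ParseFloat(strings.TrimSpace(string(out)), 64)
}

func main() {
	value, err := runSensor("/opt/containerbuddy/system-loadavg.sh")
	if err != nil {
		fmt.Println("sensor failed:", err)
		return
	}
	fmt.Println("observed:", value)
}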


Metric types

Prometheus supports the types described below. Examples provided of the scraping output are taken from running curl tests against the "random" example in the client repo.

Counter

A counter is a numeric value that only ever goes up. Counters look like this when scraped by Prometheus:

# HELP http_requests_total Total number of HTTP requests made.
# TYPE http_requests_total counter
http_requests_total{code="200",handler="prometheus",method="get"} 2

When our metrics handler fires for a counter, we'll run the sensor, parse the number it writes to stdout, and pass that value to the counter's Add method in the Prometheus client.

Gauge

A gauge is a number (float) that can go up or down. Gauges look like this when scraped by Prometheus:

# HELP go_memstats_alloc_bytes Number of bytes allocated and still in use.
# TYPE go_memstats_alloc_bytes gauge
go_memstats_alloc_bytes 1.102352e+06

When our metrics handler fires for a gauge, we'll run the sensor, parse the value from stdout, and pass it to the gauge's Set method in the Prometheus client.

Histogram and Summary

A histogram is a sample of observations that are bucketed and summed. A summary is similar but includes aggregation over a sliding window. Histograms look like this when scraped by Prometheus:

# HELP rpc_durations_histogram_microseconds RPC latency distributions.
# TYPE rpc_durations_histogram_microseconds histogram
rpc_durations_histogram_microseconds_bucket{le="-990"} 0
rpc_durations_histogram_microseconds_bucket{le="-890"} 0
rpc_durations_histogram_microseconds_bucket{le="-790"} 0
rpc_durations_histogram_microseconds_bucket{le="-690"} 1
... yadda yadda yadda...
rpc_durations_histogram_microseconds_bucket{le="610"} 363
rpc_durations_histogram_microseconds_bucket{le="710"} 363
rpc_durations_histogram_microseconds_bucket{le="810"} 363
rpc_durations_histogram_microseconds_bucket{le="910"} 363
rpc_durations_histogram_microseconds_bucket{le="+Inf"} 363
rpc_durations_histogram_microseconds_sum 5656.654209225414
rpc_durations_histogram_microseconds_count 363

A summary looks like this when scraped by Prometheus:

# TYPE http_request_duration_microseconds summary
http_request_duration_microseconds{handler="prometheus",quantile="0.5"} 805.139
http_request_duration_microseconds{handler="prometheus",quantile="0.9"} 805.139
http_request_duration_microseconds{handler="prometheus",quantile="0.99"} 805.139
http_request_duration_microseconds_sum{handler="prometheus"} 1902.1860000000001
http_request_duration_microseconds_count{handler="prometheus"} 2

When our metrics handler fires for a histogram or summary, we'll run the sensor, parse the value from stdout, and pass it to the histogram's or summary's Observe method, letting the client library handle the bucketing and quantile calculations.
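A hedged sketch of that recording step, using the Prometheus Go client; the metric names and wiring are illustrative, not the actual Containerbuddy code:

// Sketch of handing a sensor's parsed value to the Prometheus client library
// for each metric type. Names and registration details are assumptions.
package main

import "github.com/prometheus/client_golang/prometheus"

var (
	counter = prometheus.NewCounter(prometheus.CounterOpts{Name: "my_counter_metric", Help: "example counter"})
	gauge   = prometheus.NewGauge(prometheus.GaugeOpts{Name: "my_gauge_metric", Help: "example gauge"})
	histo   = prometheus.NewHistogram(prometheus.HistogramOpts{Name: "my_histogram_metric", Help: "example histogram"})
	summ    = prometheus.NewSummary(prometheus.SummaryOpts{Name: "my_summary_metric", Help: "example summary"})
)

func init() {
	prometheus.MustRegister(counter, gauge, histo, summ)
}

// record is roughly what the polling handler would do with a sensor's value.
func record(metricType string, value float64) {
	switch metricType {
	case "counter":
		counter.Add(value) // counters only accumulate; Add panics on negative values
	case "gauge":
		gauge.Set(value) // gauges take the latest observation
	case "histogram":
		histo.Observe(value) // the library buckets and sums observations
	case "summary":
		summ.Observe(value) // the library maintains the sliding-window quantiles
	}
}

func main() {
	record("gauge", 0.5)
}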

Configuration

The spec above makes for a fairly straightforward configuration syntax.

{
  "metrics": {
    "port": 8080,
    "url": "/metrics",
    "sensors": [
      {
        "name": "my.counter.metric",
        "type": "counter",
        "poll": 1,
        "check": "/path/to/my/counter/check.sh",
      },
      {
        "name": "my.gauge.metric",
        "type": "gauge",
        "poll": 5,
        "check": "/path/to/my/gauge/check.sh",
      },
      {
        "name": "my.histogram.metric",
        "type": "histogram",
        "poll": 10,
        "check": "/path/to/my/histogram/check.sh",
      },
      {
        "name": "my.summary.metric",
        "type": "summary",
        "poll": 10,
        "check": "/path/to/my/summary/check.sh",
      }
    ]
  }
}

I'd like to leave open here the possibility of having a new field for a sensor that would allow for a small number of predefined sensors to be built directly into Containerbuddy.

misterbisson commented 8 years ago

https://github.com/joyent/containerbuddy/issues/111 reminded me about discovery questions I've had for this. I'm assuming that, if telemetry is enabled, Containerbuddy would automatically register itself in Consul (or whatever discovery catalog is enabled). The assumption seems safe, but we'd still need to make a number of decisions about how Containerbuddy registers itself.

If we were to announce the telemetry service via Containerbuddy's own services configuration stanza, we'd have to answer the following questions:

  "services": [
    {
      "name": "containerbuddy-telemetry",
      "port": <user defined>,
      "health": <not a user-configurable value>,
      "interfaces": <user defined>,
      "poll": <not a user-configurable value>,
      "ttl": <not a user-configurable value>,
      "tags": <maybe user defined>
    }
  ],

I'm pretty sure that the advertised interfaces and port are user-configurable, but I'm also feeling strongly that the advertised service name is not user-configurable (though I'm not insisting on the specific name I suggested above). Health checks should be handled internally, probably with relatively long polling and TTL values to minimize traffic to Consul for this service. I hadn't given any thought to tags up to now, however.

That said, this all deserves a lot more thought than I've given it.

misterbisson commented 8 years ago

re: @tgross from https://github.com/joyent/containerbuddy/issues/27#issuecomment-197477741:

they can't be long-running processes that communicate with themselves

Well, they can't be long-running processes, but they can communicate with themselves via data persisted to the filesystem. One short running process can write out a SQLite DB and another can read from it.

if you want 3 stats from the Nginx stub status module, you need to query it 3 times

Well, you could query it for all three values, write the data to a file or even SQLite, and then read from that. If we were to explicitly support that, we'd want to provide some guarantee about the order of execution of the probes. Example: all probes sharing the same frequency are executed in the order they appear in the config file.

probably means reading from stdout of the process. [...] The sensor process needs to have a "clean" stdout for us to parse and only write unrelated info to stderr...

I'd assumed stdout when I started the thread here. For a moment when reading your point, however, I was going to suggest we expect JSON-formatted data. I quickly decided that would be more complex for users than documenting the requirement for clean output.

Metric types [...] Prometheus supports the types described below

There are four types listed, but it seems there are really only two as far as Containerbuddy would care about: stateful and stateless. If we were to choose to force the user-defined check method to handle state (which might be better handled in the application we're monitoring anyway), how else might the approach to those four metric types change?

tgross commented 8 years ago

I'm assuming that, if telemetry is enabled, Containerbuddy would automatically register itself in Consul (or whatever discovery catalog is enabled).

Very good point.

The assumption seems safe, but we'd still need to make a number of decisions about how Containerbuddy registers itself.

I think we just need to add four missing fields to the metrics stanza: service name, interfaces, tags, and TTL. Given that the metrics measurement is already polling, it's my opinion that this serves sufficiently as a health check and that if metrics polling fails we know the instance is unhealthy. Adding a second health check on top of the metrics sensors is likely to be confusing.

Well, they can't be long-running processes, but they can communicate with themselves via data persisted to the filesystem. One short running process can write out a SQLite DB and another can read from it. Well, you could query it for all three values, write the data to a file or even SQLite, and then read from that.

But we can't guarantee that this is the case with user-defined sensors without pushing all the complexity of managing state onto the end-user and greatly expanding the scope of the interface. Currently Containerbuddy only concerns itself with exit codes and that extremely narrow interface is what allows it to work with such arbitrary applications.

If we were to explicitly support that, we'd want to provide some guarantee about the order of execution of the probes. Example: all probes sharing the same frequency are executed in the order they appear in the config file.

We've intentionally not done that with anything else because it lets us execute all our external processes behind concurrent goroutines. This has performance implications in terms of timing the external process.

I'd assumed stdout when I started the thread here. For a moment when reading your point, however, I was going to suggest we expect JSON-formatted data. I quickly decided that would be more complex for users than doc-ing the requirement for clean output.

Yeah, consider the simplest case: time curl http://localhost | grep blah. Saying "ok now stuff that into JSON" is a pain and often will require more tooling.

There are four types listed, but it seems there are really only two as far as Containerbuddy would care about: stateful and stateless. If we were to choose to force the user-defined check method to handle state (which might be better handled in the application we're monitoring anyway), how else might the approach to those four metric types change?

You're losing something important in the effort of trying to simplify this. There are metrics that are statelessly measured, but the Prometheus protocol requires that we maintain state even for these "statelessly measured" metrics like gauges and counters because we're storing metrics for scraping.

Also, the Prometheus client API already handles tracking this state. Why would we throw away half the available metrics options just to load this complexity onto the end-user? With the proposed design, stateless sensors "just work" and sensors of stateful data mostly just work unless they require a lot of pre-processing (something other than the bucketing provided by the library), in which case the complexity is added to the user only when it's needed rather than in all cases.

misterbisson commented 8 years ago

You're losing something important in the effort of trying to simplify this.

You've convinced me.

tgross commented 8 years ago

I'm going to start working on this with a goal of something to review by the middle of next week.

misterbisson commented 8 years ago

I've been having a number of offline conversations with people about this, specifically about the information that a container should provide, not only about its performance but also about its relationship to the larger application.

In addition to the application performance metrics that can be used to support scaling decisions, this interface can provide a number of details that are already known to Containerbuddy (its configured services, the backends it depends on, and so on).

Using that information, a consumer application can traverse all the containers in an application, graph their relationships, and display a visual representation of the entire application, its health, and its performance.


Why the name "telemetry"?

  1. Its history of usage in aeronautics is cool
  2. It's just different enough from "metrics" to give room for the additional application metadata that I think will be very useful
  3. In its aeronautical use, only specifically instrumented metrics are sent via telemetry, which is true for Containerbuddy as well
tgross commented 8 years ago

Done and included in the 2.0 release: https://github.com/joyent/containerpilot/releases/tag/2.0.0

misterbisson commented 8 years ago

The additional container metadata to expose in the telemetry interface from https://github.com/joyent/containerpilot/issues/27#issuecomment-207462572 is moved to https://github.com/joyent/containerpilot/issues/154