autopilotpattern / mysql

Implementation of the autopilot pattern for MySQL
Mozilla Public License 2.0

Implement ContainerPilot telemetry #16

Open misterbisson opened 8 years ago

misterbisson commented 8 years ago

ContainerPilot 2.0 introduced a telemetry feature that would be very useful for monitoring this application.

https://github.com/joyent/containerpilot/issues/27 proposed the following gauge:

The count of MySQL Query entries from SHOW PROCESSLIST that are in any Waiting state. 0 is great. 1 or above can be trouble. 10 or more is probably critical.

There are other MySQL-specific stats that would be very useful in scaling decisions. How would we write those sensors?
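As a rough sketch of how the proposed gauge might be computed: the snippet below counts `Query` entries whose `State` begins with "Waiting" and maps the count onto the 0 / 1+ / 10+ levels described above. It assumes the rows from `SHOW PROCESSLIST` have already been fetched as dictionaries keyed by column name. The function names `count_waiting_queries` and `severity`, and the labels `ok`, `warning`, and `critical`, are hypothetical and not part of this repository or of ContainerPilot.

```python
from typing import Iterable, Mapping, Optional

# One row of SHOW PROCESSLIST output, keyed by column name (assumed shape).
Row = Mapping[str, Optional[str]]


def count_waiting_queries(processlist: Iterable[Row]) -> int:
    """Count Query entries whose State begins with 'Waiting'."""
    count = 0
    for row in processlist:
        command = row.get("Command") or ""
        state = row.get("State") or ""  # State can be NULL for idle threads
        if command == "Query" and state.lower().startswith("waiting"):
            count += 1
    return count


def severity(waiting: int) -> str:
    """Map the gauge onto the levels from the proposal (labels are hypothetical)."""
    if waiting >= 10:
        return "critical"
    if waiting >= 1:
        return "warning"
    return "ok"


if __name__ == "__main__":
    # Made-up sample rows for illustration only.
    sample = [
        {"Command": "Query", "State": "Waiting for table metadata lock"},
        {"Command": "Query", "State": "Sending data"},
        {"Command": "Sleep", "State": None},
    ]
    waiting = count_waiting_queries(sample)
    print(waiting, severity(waiting))
```

A real sensor would still need to run the query against MySQL and print the number in whatever form the telemetry configuration expects; that part is left out here.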

tgross commented 8 years ago

Looks like we can get replication lag for the replicas via pt-heartbeat

misterbisson commented 7 years ago

@Smithx10 asked how to autoscale MySQL in https://github.com/autopilotpattern/mysql/issues/54. With telemetry implemented per this ticket (though the sensors still need to be defined), scaling will require two more pieces:

  1. configured thresholds at which to scale up or down
  2. a scheduler/supervisor that can apply those scaling rules

It's incredibly minimalistic, but I've been experimenting for the past few months with running docker-compose scale <service>=<count> via a recurring task (Jenkins or cron both work fine). I have to name all the services and their counts in that line, but that's pretty much all there is to supervision. If an instance of a service fails, that will bring it back up to healthy. If you log the activity and set alarms on the logging....

What I haven't done yet is to make the <count> dynamic based on telemetry data and scaling thresholds, but that would seem to be the next step. Of course, I plan to set some min and max values, but....
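A minimal sketch of what that next step could look like, assuming a single numeric telemetry value per service and simple one-step scaling: `desired_count` moves the count up or down by one when a threshold is crossed and clamps the result to the min and max, and `scale_command` builds the `docker-compose scale` argument list naming every service. Both function names, the parameter names, and the threshold values in the example are hypothetical.

```python
from typing import Dict, List


def desired_count(current: int, metric: float,
                  scale_up_at: float, scale_down_at: float,
                  min_count: int, max_count: int) -> int:
    """Step the instance count by one when a threshold is crossed, then clamp."""
    if metric >= scale_up_at:
        target = current + 1
    elif metric <= scale_down_at:
        target = current - 1
    else:
        target = current
    return max(min_count, min(max_count, target))


def scale_command(counts: Dict[str, int]) -> List[str]:
    """Build the argument list for docker-compose scale <service>=<count> ..."""
    args = ["docker-compose", "scale"]
    args += [f"{service}={count}" for service, count in sorted(counts.items())]
    return args


if __name__ == "__main__":
    # Hypothetical numbers: 3 replicas running, gauge reads 12, scale up at 10.
    replicas = desired_count(current=3, metric=12.0,
                             scale_up_at=10.0, scale_down_at=0.0,
                             min_count=1, max_count=5)
    print(" ".join(scale_command({"mysql": replicas, "consul": 3})))
```

The recurring task (cron or Jenkins) would then run the resulting command, for example via `subprocess.run`. Stepping by one per run keeps the behavior easy to reason about, though it reacts slowly to sudden load.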

Smithx10 commented 7 years ago

After watching a few PromCon presentations, would it make sense to use Prometheus exporters and a separate HTTP call?

tgross commented 7 years ago

@neuroserve wrote in https://github.com/autopilotpattern/mysql/issues/58:

To enhance the setup, it might be a good idea to add Percona monitoring and management: https://www.percona.com/doc/percona-monitoring-and-management/index.html

It basically consists of two Docker containers and the pmm-client package, which needs to be installed and activated on the MySQL servers. The pmm-server IP/name could be passed in via its CNS name (similar to the Consul name).

It delivers query analysis and a Grafana-based metrics monitor. The backend is Prometheus.

tgross commented 7 years ago

@Smithx10 and @neuroserve, we've provided the Prometheus endpoint in ContainerPilot so that we can use the same interface to capture metrics from arbitrary applications. What the end user does with those metrics afterwards (put Grafana in front of Prometheus or pipe them out via an exporter to a different storage engine) is left intentionally agnostic.

misterbisson commented 7 years ago

With ContainerPilot 3's first-class support for multi-process containers, it probably makes more sense to implement the "official" MySQL exporter for Prometheus.

Related: a fancy dashboard for Grafana for that data.