Open misterbisson opened 8 years ago
Looks like we can get replication lag for the replicas via pt-heartbeat
@Smithx10 asked how to autoscale MySQL in https://github.com/autopilotpattern/mysql/issues/54. With telemetry implemented per this ticket (though the sensors still need to be defined), scaling will require two more pieces:
It's incredibly minimalistic, but I've been experimenting for the past few months with running `docker-compose scale <service>=<count>` via a recurring task (Jenkins or cron both work fine). I have to name all the services and their counts in that line, but that's pretty much all there is to supervision. If an instance of a service fails, that will bring it back up to healthy. If you log the activity and set alarms on the logging....
What I haven't done yet is to make the `<count>` dynamic based on telemetry data and scaling thresholds, but that would seem to be the next step. Of course, I plan to set some min and max values, but....
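For what it's worth, the dynamic-count step could look roughly like the sketch below. It is only an illustration: the function names, the one-instance-at-a-time step, and the threshold parameters are all assumptions, not anything taken from this repo. It picks a target count from a single metric, clamps it to min and max values, and builds the `docker-compose scale` argument list that the recurring task would run.

```python
def desired_count(current, metric, scale_up_at, scale_down_at, min_count, max_count):
    """Pick the next instance count for one service (hypothetical helper).

    Steps one instance up or down when the metric crosses a threshold,
    then clamps the result to the range [min_count, max_count].
    """
    if metric > scale_up_at:
        target = current + 1
    elif metric < scale_down_at:
        target = current - 1
    else:
        target = current
    return max(min_count, min(max_count, target))


def scale_command(counts):
    """Build the docker-compose scale argv from a {service: count} mapping."""
    pairs = [f"{name}={count}" for name, count in sorted(counts.items())]
    return ["docker-compose", "scale"] + pairs
```

The recurring task would then pass the result of `scale_command(...)` to something like `subprocess.run`. Stepping by one instance per run keeps the loop from overshooting on a single noisy reading, at the cost of slower reaction.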
After watching a few PromCon presentations, would it make sense to use Prometheus exporters and use a separate HTTP call?
@neuroserve wrote in https://github.com/autopilotpattern/mysql/issues/58:
To enhance the setup, it might be a good idea to add Percona Monitoring and Management: https://www.percona.com/doc/percona-monitoring-and-management/index.html
It basically consists of two Docker containers and the pmm-client package, which needs to be installed and activated on the MySQL servers. The pmm-server IP/name could be passed in via its CNS name (similar to the Consul name).
It delivers query analysis and a Grafana-based metrics monitor. The backend is Prometheus.
@Smithx10 and @neuroserve we've provided the Prometheus endpoint in ContainerPilot so that we can use the same interface to capture metrics from arbitrary applications. What the end user does with those metrics afterwards (put Grafana in front of Prometheus or pipe them out via an exporter to a different storage engine) is intentionally left agnostic.
With ContainerPilot 3's first-class support for multi-process containers, it probably makes more sense to implement the "official" MySQL exporter for Prometheus.
ContainerPilot 2.0 introduced a telemetry feature that would be very useful for monitoring this application.
https://github.com/joyent/containerpilot/issues/27 proposed the following gauge:
There are other MySQL-specific stats that would be very useful in scaling decisions. How would we write those sensors?
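As one possible starting point, here is a minimal sketch of a replication-lag sensor. It assumes a sensor is just a command whose stdout is read as a number, and that the script is fed the text output of `SHOW SLAVE STATUS\G` on a replica. The function name is hypothetical, and a `pt-heartbeat` based sensor would need different parsing.

```python
import re


def seconds_behind_master(status_text):
    """Extract Seconds_Behind_Master from `SHOW SLAVE STATUS\\G` text output.

    Returns an int, or None when the field is missing or NULL
    (for example when replication is not running).
    """
    match = re.search(
        r"^\s*Seconds_Behind_Master:\s*(\S+)\s*$", status_text, re.MULTILINE
    )
    if match is None or match.group(1) == "NULL":
        return None
    return int(match.group(1))
```

A wrapper script would print the returned value so the telemetry collector can record it as a gauge. How to report the `None` case (skip the sample, or print a sentinel) is a decision left open here.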