Easily monitor health of Signalen application

tcoenen commented 4 years ago

consider:

Prometheus compatible output
Sentry (presently configured)
More system health checks?

bartjkdp commented 4 years ago

One of my colleagues at VNG discovered that the logging of Signals to standard output / error seems to be incomplete: exceptions are not logged by default. To fix this we should probably merge parts of this and this. These are commits on a branch he made with some quick fixes.
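To make the idea concrete, a logging configuration along these lines would send exceptions to standard error. This is only a sketch and not the content of the commits mentioned above. It assumes the backend is configured through Python's standard `logging` dictConfig schema (the same schema Django's `LOGGING` setting uses), and the handler and formatter names are made up for the example.

```python
import logging
import logging.config

# Hypothetical configuration; "console" and "plain" are illustrative names.
LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "plain": {"format": "%(asctime)s %(levelname)s %(name)s %(message)s"},
    },
    "handlers": {
        # StreamHandler writes to stderr by default, which the container
        # runtime can capture.
        "console": {"class": "logging.StreamHandler", "formatter": "plain"},
    },
    # Root logger: everything at INFO and above goes to the console.
    "root": {"handlers": ["console"], "level": "INFO"},
    "loggers": {
        # Django reports unhandled request exceptions on "django.request".
        "django.request": {
            "handlers": ["console"],
            "level": "ERROR",
            "propagate": False,
        },
    },
}

logging.config.dictConfig(LOGGING)

try:
    1 / 0
except ZeroDivisionError:
    # logger.exception() logs at ERROR level and appends the traceback.
    logging.getLogger("django.request").exception("unhandled error")
```

With something like this in place, the traceback ends up in the container logs, where a log aggregator can pick it up.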

Logging

For logging we can maybe use parts of the VNG Haven setup. This uses Fluentd, Elasticsearch and ElastAlert to automatically aggregate container logs in a Kubernetes cluster.

Metrics

For metrics I like the idea of using Prometheus / InfluxDB and Grafana, because they are open source.

Overall metrics

For overall metrics I think we can export metrics at the edge proxy (NGINX, Traefik or HAProxy):

Number of requests
Endpoints
Request latency
Response codes

Readiness and health metrics

Used by Kubernetes to determine whether each service is ready and healthy:

Frontend
Backend
Machine learning
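A minimal sketch of what such endpoints could look like for one of these services, written as a plain WSGI app so it needs nothing beyond the standard library. The paths `/healthz` and `/readyz` and the `check_database` helper are assumptions for illustration, not existing Signalen code; Kubernetes liveness and readiness probes would be pointed at whatever paths the service actually exposes.

```python
from wsgiref.util import setup_testing_defaults


def check_database():
    """Placeholder dependency check; a real one would run a trivial query."""
    return True


def health_app(environ, start_response):
    """Tiny WSGI app with a liveness and a readiness endpoint."""
    path = environ.get("PATH_INFO", "")
    if path == "/healthz":
        # Liveness: the process is up and able to answer requests.
        status, body = "200 OK", b"ok"
    elif path == "/readyz":
        # Readiness: dependencies are reachable, so traffic can be routed here.
        ready = check_database()
        status = "200 OK" if ready else "503 Service Unavailable"
        body = b"ready" if ready else b"not ready"
    else:
        status, body = "404 Not Found", b"not found"
    headers = [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))]
    start_response(status, headers)
    return [body]


def call(path):
    """Invoke the app in-process and return (status, body) for a quick check."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(health_app(environ, start_response))
    return captured["status"], body
```

Keeping liveness and readiness separate means a temporarily unreachable database takes the pod out of rotation without getting it restarted.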

Application-specific metrics

For application-specific metrics I like the idea of a /metrics endpoint that exposes KPIs like:

The number of new signals
The number of closed signals
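As a rough sketch, the body of such a /metrics response in the Prometheus text exposition format could be produced like this. The metric names `signals_new` and `signals_closed` are hypothetical, and the counts are passed in as plain numbers; in practice they would come from a database query, and the official Prometheus client library could generate the output instead of formatting it by hand.

```python
def render_metrics(new_signals, closed_signals):
    """Render two gauges in the Prometheus text exposition format."""
    # Hypothetical metric names; pick whatever naming scheme the project prefers.
    metrics = [
        ("signals_new", "Number of new signals", new_signals),
        ("signals_closed", "Number of closed signals", closed_signals),
    ]
    lines = []
    for name, help_text, value in metrics:
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    # The format expects a trailing newline after the last sample.
    return "\n".join(lines) + "\n"


print(render_metrics(new_signals=12, closed_signals=7), end="")
```

Gauges are used here because the values are point-in-time counts that can go down as well as up.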

For us, operating system health checks are out of scope. The underlying systems are abstracted away by the container platform and are managed by a different team, but maybe we can also use their tooling for this.

bartjkdp commented 4 years ago

I think the monitoring requirements can easily be fulfilled with external tooling and the current code. If specific changes are required in Signalen, we should open a separate issue for each of them.
