tcoenen opened 4 years ago
One of my colleagues at VNG discovered that Signals' logging to standard output / standard error seems to be incomplete: exceptions are not logged by default. To fix this, we should probably merge parts of this and this. These are commits on a branch he made with some quick fixes.
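A minimal sketch, assuming the backend is a Python/Django application whose logging is set through a `LOGGING` dictionary (the actual Signals settings are not shown here). It attaches a stderr handler to the root logger with `logging.config.dictConfig`, so records propagated from any logger reach the container's output, including the traceback that `logger.exception` appends. The formatter name `plain`, the handler name `stderr` and the logger name `signals.example` are hypothetical.

```python
import logging
import logging.config

# Hypothetical LOGGING dict in the shape Django's LOGGING setting accepts;
# the real Signals configuration may differ.
LOGGING = {
    "version": 1,
    # Keep loggers created before this config is applied working.
    "disable_existing_loggers": False,
    "formatters": {
        "plain": {"format": "%(asctime)s %(levelname)s %(name)s %(message)s"},
    },
    "handlers": {
        # Write to stderr so the container runtime collects the output.
        "stderr": {
            "class": "logging.StreamHandler",
            "stream": "ext://sys.stderr",
            "formatter": "plain",
        },
    },
    # Every logger propagates to the root by default, so this handler
    # receives errors from application and framework loggers alike.
    "root": {"handlers": ["stderr"], "level": "INFO"},
}


def configure_logging(config=LOGGING):
    """Apply the logging configuration (Django would do this itself)."""
    logging.config.dictConfig(config)


if __name__ == "__main__":
    configure_logging()
    log = logging.getLogger("signals.example")
    try:
        1 / 0
    except ZeroDivisionError:
        # logger.exception logs at ERROR level and appends the traceback.
        log.exception("Unhandled error while processing a request")
```

In stock Django, the default `console` handler only emits when `DEBUG` is true, which is one common reason production errors never show up on stdout/stderr; whether that applies to Signals would need checking.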
Logging
For logging we could reuse parts of the VNG Haven setup, which uses Fluentd, Elasticsearch and ElastAlert to automatically aggregate container logs in a Kubernetes cluster.
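If container logs are collected by Fluentd, as in the Haven setup, one JSON object per line is easier to index in Elasticsearch than a multi-line traceback, which a line-based collector may split into several entries. A standard-library sketch; the field names (`timestamp`, `level`, `logger`, `message`, `exception`) are assumptions, not a schema prescribed by Haven or Fluentd.

```python
import json
import logging
import sys
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single line of JSON."""

    def format(self, record):
        payload = {
            "timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            # json.dumps escapes the newlines, so the whole traceback
            # stays on one output line inside the "exception" field.
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def make_stdout_handler():
    """Hypothetical helper: a stdout handler that emits JSON lines."""
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(JsonFormatter())
    return handler
```

The handler would then be attached to the root logger (or referenced from the logging configuration) in place of a plain-text one.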
Metrics
For metrics I like the idea of using Prometheus / InfluxDB and Grafana, because they are open source.
Overall metrics
For overall metrics I think we can export metrics at the edge proxy (NGINX, Traefik or HAProxy):
Readiness and health metrics
These are used by Kubernetes to determine whether the service is ready and healthy.
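Kubernetes HTTP probes call an endpoint and treat any status from 200 up to (but not including) 400 as success. A standalone standard-library sketch of a liveness and a readiness endpoint; the paths `/healthz` and `/readyz` and the `is_ready` callable are hypothetical, and in a Django application these would more likely be ordinary views.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def make_handler(is_ready):
    """Build a probe handler; `is_ready` is a zero-argument callable."""

    class ProbeHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path == "/healthz":
                # Liveness: the process is up and able to answer.
                self._reply(200, {"status": "ok"})
            elif self.path == "/readyz":
                # Readiness: e.g. the database is reachable (assumption).
                ready = bool(is_ready())
                self._reply(200 if ready else 503, {"ready": ready})
            else:
                self._reply(404, {"error": "not found"})

        def _reply(self, status, body):
            data = json.dumps(body).encode("utf-8")
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(data)))
            self.end_headers()
            self.wfile.write(data)

        def log_message(self, format, *args):
            # Silence the per-request access log produced by probes.
            pass

    return ProbeHandler


if __name__ == "__main__":
    server = ThreadingHTTPServer(("0.0.0.0", 8080), make_handler(lambda: True))
    server.serve_forever()
```

Returning 503 from the readiness check takes the pod out of the Service's endpoints without restarting it, whereas a failing liveness check makes Kubernetes restart the container; that is why the two checks are kept separate.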
Application-specific metrics
For application-specific metrics I like the idea of exposing a /metrics endpoint with KPIs like:
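Prometheus scrapes a `/metrics` endpoint that returns its text exposition format (served as `text/plain; version=0.0.4`). Since the actual KPI list is not preserved here, this sketch renders one hypothetical gauge, `signals_by_state`; the metric and label names are assumptions. In practice the official `prometheus_client` Python package can produce this output instead of hand-rolled code.

```python
def _escape_help(text):
    # HELP text escapes backslash and line feed.
    return text.replace("\\", "\\\\").replace("\n", "\\n")


def _escape_label_value(value):
    # Label values additionally escape double quotes.
    return _escape_help(value).replace('"', '\\"')


def render_gauge(name, help_text, samples):
    """Render one gauge family in the Prometheus text exposition format.

    `samples` maps a tuple of (label_name, label_value) pairs to a number;
    use an empty tuple for a sample without labels.
    """
    lines = [f"# HELP {name} {_escape_help(help_text)}", f"# TYPE {name} gauge"]
    for labels, value in samples.items():
        if labels:
            label_str = ",".join(
                f'{key}="{_escape_label_value(val)}"' for key, val in labels
            )
            lines.append(f"{name}{{{label_str}}} {value}")
        else:
            lines.append(f"{name} {value}")
    # The format is line oriented and ends with a newline.
    return "\n".join(lines) + "\n"
```

For example, `render_gauge("signals_by_state", "Number of signals per state.", {(("state", "open"),): 12})` returns the `# HELP` line, the `# TYPE` line and `signals_by_state{state="open"} 12`. A gauge fits counts that can go down (signals per state); ever-increasing totals would be exposed as counters instead.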
For us, operating system health checks are out of scope: the underlying systems are abstracted away by the container platform and managed by a different team, but maybe we can also use their tooling for this.
I think the monitoring requirements can easily be fulfilled with external tooling and the current code. If specific changes to Signalen are required, we should open a separate issue for them.
consider: