MooseQuest opened 4 years ago
Going to try out this pre-rolled stack as a starting point: https://github.com/coreos/kube-prometheus
Even if all goes well, this doesn't get us a logging stack; just metrics and monitoring.
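For anyone following along, the kube-prometheus README's quickstart boils down to two `kubectl` passes (exact commands may vary by release):

```sh
# From a checkout of github.com/coreos/kube-prometheus
kubectl create -f manifests/setup   # CRDs and the monitoring namespace first
kubectl create -f manifests/        # then Prometheus, Alertmanager, Grafana, exporters
```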
Rollout went well, and we have metrics dashboards running at https://metrics.chime-live-cluster.phl.io/
The manifests used for the rollout are currently sitting in the issues/3 branch, where they will remain until the freeze on PRs to master is lifted.
@lottspot would we consider this completed?
We don't have anything capturing logs yet, so this is technically not completed.
I'll be pushing up what we have so far onto a branch and will reference here.
Would someone be interested in telling a non-devops person how this differs from #32?
Just going to leave a few comments here for posterity:
I had a conversation with @MooseQuest and he told me that Elasticsearch was installed on the dev k8s cluster.
Elasticsearch was installed following the instructions here: https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-quickstart.html
For reference, the instructions there are for installing Elastic Cloud on Kubernetes, which is a service for managing multiple Elasticsearch deployments. Think of it as https://cloud.elastic.co on prem, meaning you get a web interface for managing multiple ES clusters: you can upgrade, manage backups, etc. It's a great service, but might be overkill to have an Elastic Cloud service for each CHIME deployment.
My recommendation is that each deployment of CHIME would have a single deployment of Elasticsearch.
To deploy Elasticsearch (and the Elastic stack at large) I would recommend using the Elasticsearch Helm charts (https://github.com/elastic/helm-charts).
The main Elasticsearch Helm chart requirement is a Kubernetes cluster with at least 3 nodes.
Elasticsearch, being a distributed system, operates on a high-availability model, meaning the minimum number of Elasticsearch nodes should be 3. This is why the Kubernetes cluster must have at least 3 nodes: it allows the Elasticsearch cluster to survive a Kubernetes node failure.
Using the Helm charts also gives us the added benefit of being able to deploy (see the install sketch after this list):
- elasticsearch
- filebeat
- metricbeat
- kibana
- apm-server
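As a rough sketch of what that looks like (Helm 3 syntax; release names are just examples):

```sh
helm repo add elastic https://helm.elastic.co
helm repo update
helm install elasticsearch elastic/elasticsearch   # deploys a 3-node cluster by default
helm install kibana elastic/kibana
helm install filebeat elastic/filebeat
helm install metricbeat elastic/metricbeat
helm install apm-server elastic/apm-server
```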
Filebeat can be configured to read the logs from pods in the k8s cluster and ship them to Elasticsearch.
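For example, a minimal values override for the filebeat chart might look like this (a sketch, assuming the default `elasticsearch-master` service name from the Elasticsearch chart):

```yaml
# filebeat-values.yaml -- pass with: helm install filebeat elastic/filebeat -f filebeat-values.yaml
filebeatConfig:
  filebeat.yml: |
    filebeat.autodiscover:
      providers:
        - type: kubernetes
          hints.enabled: true   # honor co.elastic.logs/* annotations on pods
    output.elasticsearch:
      hosts: ["elasticsearch-master:9200"]
```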
Metricbeat can be configured to collect metrics from the k8s cluster and ship them to Elasticsearch.
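A sketch of the corresponding Metricbeat config, using its kubernetes module to pull metrics from each node's kubelet (keys per the Metricbeat docs; values are examples):

```yaml
metricbeat.modules:
  - module: kubernetes
    metricsets: ["node", "pod", "container", "volume"]
    period: 10s
    hosts: ["https://${NODE_NAME}:10250"]
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    ssl.verification_mode: "none"   # kubelet often serves a self-signed cert
```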
APM Server is a service that runs in the k8s cluster; it accepts APM data from the various applications deployed in the cluster and ships that data to Elasticsearch.
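The Elastic APM agents are configured much the same way across languages, so wiring an app up is mostly a matter of env vars. A hypothetical pod-spec fragment (the URL and service name are examples; the env var names come from the APM agent docs):

```yaml
env:
  - name: ELASTIC_APM_SERVER_URL
    value: "http://apm-server-apm-server:8200"   # example: actual service name depends on the chart release
  - name: ELASTIC_APM_SERVICE_NAME
    value: "chime"
```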
The benefit of having all this data going into Elasticsearch is that you can use Kibana to visualize all these different data sources in one place.
Kibana also has a "logs" app which lets you tail incoming logs to Elasticsearch. You can even filter on k8s labels, pod names, namespaces, etc.
The Elastic APM service currently has agent support for a number of languages.
@fxdgear long term, we're not looking to give each deployment of CHIME its own cluster. That was a stop-gap measure to proceed quickly. Eventually, we want to have a single prod cluster hosting many civic applications including chime, alternate versions of chime, follow-up projects related to chime, and other local civic projects. We are thinking that each project would be within its own namespace.
We need an infrastructure that gets us as close as possible to each project/namespace being free-when-idle. Any cluster services that we need to deploy instances of per-project/namespace will create poor economics for us. We have very modest funding within which we need to be able to host a large number of low-traffic projects sustainably for many years. At any given time, only a small number of projects, if any, will have high traffic. It's kind of the inverse of most enterprise use cases.
Given that, would you adjust your recommendations at all?
@themightychris Thanks for the quick response.
Given the long-term goal of a single k8s cluster with multiple namespaces, what I would recommend in this case is a single Elastic stack deployment in its own namespace. Services in any other namespace can reach it through the cluster DNS name:
`service-name.namespace.svc.cluster.local`
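So, for example, if the stack lived in a hypothetical `monitoring` namespace, a pod in any other namespace could reach it like so:

```sh
# service and namespace names are examples
curl http://elasticsearch-master.monitoring.svc.cluster.local:9200
```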
The end goal here (wrt the Elastic stack) is a single deployment of the Elastic tooling, configured in a way that lets you add and remove namespaces (i.e. the various CHIME-related projects and deployments).
But you end up with a singular entity to monitor ALL your deployments.
This was not explicit in my previous comment, but the goal here is that whether you end up with multiple k8s clusters or a single k8s cluster, you still only need a single Elastic stack deployment per k8s cluster.
This strategy will scale regardless.
On another note, depending on the volume of logs/metrics, you may or may not run out of disk space for storing data in Elasticsearch. There are a couple of ways to handle this.
If you have a policy on the length of time you are required (or want) to store logs, you have a few options for enforcing retention; one common approach is sketched below.
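For instance, an index lifecycle management (ILM) policy can roll log indices over and delete them after 30 days. This is a sketch, not the only option; the policy name, thresholds, and ES host are examples:

```sh
curl -X PUT "http://elasticsearch-master:9200/_ilm/policy/logs-30d" \
  -H 'Content-Type: application/json' -d'
{
  "policy": {
    "phases": {
      "hot":    { "actions": { "rollover": { "max_age": "7d", "max_size": "25gb" } } },
      "delete": { "min_age": "30d", "actions": { "delete": {} } }
    }
  }
}'
```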