coreos / fedora-coreos-releng-automation

Repo for small tools/scripts/services that aid in the automation of Fedora CoreOS release engineering

Metrics and monitoring #5

Open jlebon opened 5 years ago

jlebon commented 5 years ago

We will soon have two services here: coreos-koji-tagger and config-bot. We need to figure out how we'll monitor these services and e.g. be alerted if they go down.

Right now coreos-koji-tagger is planned to run in the OpenShift infra. It would probably be easier to implement monitoring if we just ran config-bot there too.

See related discussions in https://github.com/coreos/fedora-coreos-releng-automation/pull/3#discussion_r301279231.

/cc @lucab since he has experience in that area (and I think was looking at this for the Cincinnati server as well?)

lucab commented 5 years ago

I think that coreos-koji-tagger, like the Cincinnati server, runs on Fedora infrastructure: https://infrastructure.fedoraproject.org/cgit/ansible.git/tree/playbooks/openshift-apps/coreos-koji-tagger.yml

To my knowledge, that OpenShift cluster does not (yet) have Prometheus + Alertmanager; monitoring is done via Nagios instead.
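
For illustration only, here is a minimal sketch of how a service like coreos-koji-tagger or config-bot could expose a liveness endpoint that an HTTP probe (for example from Nagios) could poll. Nothing here comes from the actual services: the `/healthz` path, port 8080, the `heartbeat()` helper, and the 300-second threshold are all assumptions made up for the example.

```python
import json
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

# Assumed threshold: report unhealthy if the main loop has been silent this long.
MAX_SILENCE_SECONDS = 300

_last_heartbeat = time.monotonic()


def heartbeat():
    """Hypothetical hook: the service's main loop calls this on every iteration."""
    global _last_heartbeat
    _last_heartbeat = time.monotonic()


def health_status(now=None):
    """Return (http_status, body) based on the age of the last heartbeat."""
    now = time.monotonic() if now is None else now
    age = now - _last_heartbeat
    status = 200 if age <= MAX_SILENCE_SECONDS else 503
    return status, {"healthy": status == 200, "seconds_since_heartbeat": round(age, 1)}


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Only the (assumed) /healthz path is served; anything else is a 404.
        if self.path != "/healthz":
            self.send_error(404)
            return
        status, body = health_status()
        payload = json.dumps(body).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, fmt, *args):
        # Silence per-request logging so probes do not flood the service log.
        pass


def start_health_server(port=8080):
    """Serve the health endpoint from a daemon thread next to the main loop."""
    server = HTTPServer(("0.0.0.0", port), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

With something like this in place, an external check only needs to request `/healthz` and alert on a non-200 response or a timeout, which would cover both a crashed pod and a main loop that is stuck.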