FrostyX / fedora-review-service

Fedora package reviews CI

Monitoring and health checks #26

Open FrostyX opened 1 year ago

FrostyX commented 1 year ago

I periodically log into OpenShift, open a shell inside the fedora-review-service container, and check whether the service is still running. This needs to be automated somehow. Maybe Nagios? What options do we have for Fedora Communishift?

mavjs commented 1 year ago

Hi,

I came across the "Fedora Review Service" while looking at a package review request ticket. (Also, it seems Fedora Communishift is no longer alive, according to https://fedoraproject.org/wiki/Infrastructure/Communishift?)

Anyhow, although I do not know what you do inside the shell of the container to check if the service is still running, I was wondering if a combination of such probes described here: https://docs.openshift.com/container-platform/4.13/applications/application-health.html could work for your use case, perhaps? (Not a kubernetes/openshift expert, however, I have seen examples of health checks in pods/containers, and thought this could work as well.)

FrostyX commented 1 year ago

Hello @mavjs,

It is possible I confused the OpenShift instance. Anyway, it is this one:

> I was wondering if a combination of such probes described here: https://docs.openshift.com/container-platform/4.13/applications/application-health.html could work for your use case, perhaps?

I hope so, this looks good. But we need to figure out which commands to run to verify that the service is healthy. Maybe something like:

  1. See if the fedora-review-process is running
  2. Check if Copr/Bugzilla/Pagure tokens are not expired
  3. Check if there are reasonably recent entries in the fedora-review-service.log file
  4. Maybe something else?
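
Items 1 and 3 from the list above could be sketched roughly like this. This is only a minimal sketch, not the project's actual check: the process name, the log path, and the six-hour threshold are assumptions and would need to match the real container. The token check (item 2) is left out because it depends on how each of Copr, Bugzilla, and Pagure reports token expiry.

```python
"""Hypothetical health check sketch for the fedora-review-service container."""
import os
import time

# Assumed values, not taken from the project; adjust to the real deployment.
PROCESS_NAME = "fedora-review-service"
LOG_PATH = "/var/log/fedora-review-service.log"
MAX_LOG_AGE_SECONDS = 6 * 60 * 60


def process_running(name, proc_dir="/proc"):
    """Return True if another process's command line contains `name`."""
    own_pid = str(os.getpid())
    try:
        entries = os.listdir(proc_dir)
    except OSError:
        return False  # no /proc available, so we cannot confirm anything
    for pid in entries:
        if not pid.isdigit() or pid == own_pid:
            continue
        try:
            with open(os.path.join(proc_dir, pid, "cmdline"), "rb") as f:
                cmdline = f.read().replace(b"\0", b" ").decode(errors="replace")
        except OSError:
            continue  # the process exited or is not readable
        if name in cmdline:
            return True
    return False


def log_is_recent(path, max_age_seconds, now=None):
    """Return True if the log file was modified within `max_age_seconds`."""
    now = time.time() if now is None else now
    try:
        mtime = os.path.getmtime(path)
    except OSError:
        return False  # a missing log counts as unhealthy
    return (now - mtime) <= max_age_seconds


def main():
    """Run all checks, print one line per check, return a shell exit code."""
    checks = {
        "process running": process_running(PROCESS_NAME),
        "log recently updated": log_is_recent(LOG_PATH, MAX_LOG_AGE_SECONDS),
    }
    for label, ok in checks.items():
        print(f"{'OK  ' if ok else 'FAIL'} {label}")
    return 0 if all(checks.values()) else 1
```

If the file ends with `sys.exit(main())` (after `import sys`), it could serve as the command of an exec-style probe, since those treat exit code 0 as success and anything else as failure.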

FrostyX commented 1 day ago

To get Nagios monitoring from Fedora Infra, we will need to migrate from CommuniShift to the Fedora production OpenShift instance.

https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/thread/ST4TOGK2OHLOKGVXE2JCSX4XS6NXQBDP/#ST4TOGK2OHLOKGVXE2JCSX4XS6NXQBDP

We want to do that regardless, but the move isn't trivial. We will have to move our YAML files to the Ansible repository, etc.

On an unrelated note, I at least configured Sentry monitoring, so that I get notified about new errors.