I-GUIDE / CI_Platform

iGUIDE CI Platform Deployment
Apache License 2.0
Enable Service Monitoring #5

Open fbaig opened 7 months ago

fbaig commented 7 months ago

Problem

There is no mechanism to detect whether any of the I-GUIDE services is down, inaccessible, or otherwise having issues. All detection is currently done manually, which is not optimal; it needs to be automated to mitigate failures and increase efficiency.

Potential Solution

Pull Requests ToDo ...

rkalyanapurdue commented 7 months ago

ToDo:

  1. Create a detailed system deployment diagram, identifying services that need monitoring.
  2. Deploy Prometheus in the Kubernetes cluster on Jetstream2 and configure it for Jupyter & CVMFS monitoring.
  3. Identify a solution for OpenStack monitoring.
  4. Demonstrate Grafana views showing services & CI Platform usage metrics.
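
Once Prometheus is deployed (item 2), detecting a service that is down could be sketched roughly as below. This is only an illustration: it relies on Prometheus's instant-query HTTP endpoint (`/api/v1/query`) and the built-in `up` metric, while the job names, addresses, and sample response are made-up assumptions, not values from the I-GUIDE deployment.

```python
import json
import urllib.parse
import urllib.request


def down_targets(query_response):
    """Return 'job/instance' for every target whose `up` sample is 0."""
    down = []
    for series in query_response["data"]["result"]:
        _timestamp, value = series["value"]  # value arrives as a string
        if float(value) == 0:
            labels = series["metric"]
            down.append(f'{labels.get("job", "?")}/{labels.get("instance", "?")}')
    return down


def query_up(prometheus_url):
    """Run the instant query `up` against a Prometheus server (needs network)."""
    params = urllib.parse.urlencode({"query": "up"})
    url = prometheus_url.rstrip("/") + "/api/v1/query?" + params
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


# Hypothetical response in the shape Prometheus returns for an instant query.
SAMPLE = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"job": "jupyterhub", "instance": "10.0.0.5:8081"},
             "value": [1700000000.0, "1"]},
            {"metric": {"job": "cvmfs", "instance": "10.0.0.6:9100"},
             "value": [1700000000.0, "0"]},
        ],
    },
}

print(down_targets(SAMPLE))
```

In practice `down_targets(query_up("http://<prometheus-host>:9090"))` would replace the sample, and alerting would more likely be handled by Prometheus alert rules or Grafana than by a script.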

yirugi commented 7 months ago

MEMO for 2/27 hacking session

Prometheus

JupyterHub metrics
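
JupyterHub serves Prometheus-format metrics at its `/hub/metrics` endpoint. As a rough sketch of what a scrape returns, the snippet below parses a small payload in that text format. The parser is deliberately simplified (it ignores timestamps and assumes no spaces inside label values), and the sample values are made up for illustration.

```python
def parse_metrics(text):
    """Parse Prometheus text-format lines into a {series: value} dict."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        series, value = line.split()[:2]
        samples[series] = float(value)
    return samples


# Made-up sample payload; the metric names are ones JupyterHub exposes.
SAMPLE = """\
# TYPE jupyterhub_running_servers gauge
jupyterhub_running_servers 3.0
# TYPE jupyterhub_total_users gauge
jupyterhub_total_users 12.0
"""

metrics = parse_metrics(SAMPLE)
print(metrics["jupyterhub_running_servers"], metrics["jupyterhub_total_users"])
```

Prometheus does this parsing itself when it scrapes the endpoint; the sketch is only meant to show what the data looks like before it reaches Grafana.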

Ansible for Prometheus

Prometheus config for external K8s

Not sure if this works. It seems we need to configure an account for Prometheus, and we may run into SSL issues: https://stackoverflow.com/questions/50587087/prometheus-outside-kubernetes-cluster

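kubernetes_sd_configs:
- api_server: https://<ip>:6443
  role: node
  # In-pod token path; outside the cluster, point this at a local copy of a service account token.
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  tls_config:
    insecure_skip_verify: true  # skips certificate verification; for testing only
  # Prometheus accepts only one auth method per block, so basic_auth is disabled here.
  # basic_auth:
  #   username: kube
  #   password: Superkube01
# relabel_configs is a scrape-job setting, a sibling of kubernetes_sd_configs.
# These labels exist only for role: endpoints; with role: node this rule keeps no targets.
relabel_configs:
- source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
  action: keep
  regex: default;kubernetes;https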

Grafana

Grafana Installation on Debian/Ubuntu

OpenStack monitor

CVMFS

CVMFS monitoring tool

Helm charts

yirugi commented 7 months ago

2/27 Hacking Report

Requirements:

What we have:

Progress:

Next Steps:

yirugi commented 7 months ago

Prometheus can connect to a remote k8s cluster if we set up a service account token for it: https://iximiuz.com/en/posts/kubernetes-api-call-simple-http-client/
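
To check a service account token before wiring it into Prometheus, a minimal sketch along these lines may help. It sends the token as a bearer token to the Kubernetes API's `/api/v1/nodes` path; the server address, token, and CA file are placeholders to be supplied, and the function names are hypothetical.

```python
import json
import ssl
import urllib.request


def build_api_request(api_server, token, path="/api/v1/nodes"):
    """Build a request that presents the service account token as a bearer token."""
    url = api_server.rstrip("/") + path
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def list_node_names(api_server, token, ca_file):
    """Call the API server and return node names (needs network and a valid token)."""
    # Verify the server against the cluster CA rather than disabling TLS checks.
    context = ssl.create_default_context(cafile=ca_file)
    request = build_api_request(api_server, token)
    with urllib.request.urlopen(request, context=context, timeout=10) as resp:
        body = json.load(resp)
    return [item["metadata"]["name"] for item in body["items"]]


# Offline demonstration with placeholder values; no request is sent here.
req = build_api_request("https://example.invalid:6443/", "PLACEHOLDER_TOKEN")
print(req.full_url)
```

If `list_node_names` succeeds from the Prometheus host, the same token file and CA file should be usable in the `kubernetes_sd_configs` block, provided the service account is allowed to list the resources being discovered.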

rkalyanapurdue commented 4 months ago

Consider using Zabbix and integrating it with OpenStack and Kubernetes.