charmed-lma / charm-k8s-prometheus

Kubernetes Operator for Prometheus
Apache License 2.0

Add remote client configuration using the http-api relation #3

Closed: exceptorr closed this 4 years ago

relaxdiego commented 4 years ago

Something weird is happening after running juju config prometheus advertised-port=9001 with this code. From Juju's perspective, everything SEEMS fine as you can see via the logs and juju status below:

application-prometheus: 01:38:00 DEBUG unit.prometheus/0.juju-log on_config_changed_handler: <ops.charm.ConfigChangedEvent object at 0x7fe6fe3e6da0>
application-prometheus: 01:38:00 DEBUG unit.prometheus/0.juju-log build_pod_spec
application-prometheus: 01:38:01 DEBUG unit.prometheus/0.juju-log Building Juju pod spec
application-prometheus: 01:38:01 DEBUG unit.prometheus/0.juju-log {'containers': [{'name': 'prometheus', 'imageDetails': {'imagePath': 'prom/prometheus:v2.18.1', 'username': '', 'password': ''}, 'ports': [{'containerPort': 9001, 'protocol': 'TCP'}], 'readinessProbe': {'httpGet': {'path': '/-/ready', 'port': 9001}, 'initialDelaySeconds': 10, 'timeoutSeconds': 30}, 'livenessProbe': {'httpGet': {'path': '/-/healthy', 'port': 9001}, 'initialDelaySeconds': 30, 'timeoutSeconds': 30}, 'files': [{'name': 'config', 'mountPath': '/etc/prometheus', 'files': {'prometheus.yml': "global:\n  external_labels: {}\n  scrape_interval: 15s\nscrape_configs:\n- job_name: prometheus\n  scrape_interval: 5s\n  static_configs:\n  - targets: ['localhost:9001']\n"}}]}]}
application-prometheus: 01:38:01 DEBUG unit.prometheus/0.juju-log Configuring pod
application-prometheus: 01:38:01 DEBUG unit.prometheus/0.juju-log Checking k8s pod readiness
application-prometheus: 01:38:01 DEBUG unit.prometheus/0.juju-log GET kubernetes.default.svc//api/v1/namespaces/lma/pods?labelSelector=juju-app=prometheus
application-prometheus: 01:38:01 DEBUG unit.prometheus/0.juju-log Received k8s pod status: <adapters.k8s.PodStatus object at 0x7fe6fe3988d0>
application-prometheus: 01:38:01 DEBUG unit.prometheus/0.juju-log Built unit status: <ops.model.ActiveStatus object at 0x7fe6fe398978>
Model  Controller  Cloud/Region        Version  SLA          Timestamp
lma    mk8s        microk8s/localhost  2.7.6    unsupported  01:41:32Z

App         Version  Status   Scale  Charm       Store  Rev  OS          Address         Notes
prometheus           waiting      1  prometheus  local    0  kubernetes  10.152.183.176

Unit           Workload  Agent  Address    Ports     Message
prometheus/0*  active    idle   10.1.3.34  9001/TCP  Unit is ready

BUT, when inspecting it from the k8s layer, this is what I get:

$ microk8s.kubectl -n lma get pods
NAME                    READY   STATUS    RESTARTS   AGE
prometheus-0            0/1     Running   0          28s
prometheus-operator-0   1/1     Running   0          20m

Notice that prometheus-0's READY column shows 0/1, which means the pod is not ready to receive requests. Running kubectl describe yields the following:

$ microk8s.kubectl -n lma describe pod prometheus-0
Name:         prometheus-0
Namespace:    lma
Priority:     0
Node:         dev-18-04-2/172.16.188.131
Start Time:   Wed, 20 May 2020 01:38:05 +0000
Labels:       controller-revision-hash=prometheus-645c6df44d
              juju-app=prometheus
              statefulset.kubernetes.io/pod-name=prometheus-0
Annotations:  apparmor.security.beta.kubernetes.io/pod: runtime/default
              juju.io/controller: e64c4d29-c9ae-4ed7-8c4f-9aaf55324e65
              juju.io/model: 067696a4-2e5f-4c8a-876e-9c067b84ad4f
              juju.io/unit: prometheus/0
              seccomp.security.beta.kubernetes.io/pod: docker/default
Status:       Running
IP:           10.1.3.34
IPs:
  IP:           10.1.3.34
Controlled By:  StatefulSet/prometheus
Init Containers:
  juju-pod-init:
    Container ID:  containerd://adcb0c7acc53b9717f807aa2ce35bd5a1d6aa619429c9d8592cacf0c64011e38
    Image:         jujusolutions/jujud-operator:2.7.6
    Image ID:      docker.io/jujusolutions/jujud-operator@sha256:866eccf1961604db3a4710142f77632519c6627da7ebdebd26c8c17c03df48e6
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/sh
    Args:
      -c
      export JUJU_DATA_DIR=/var/lib/juju
      export JUJU_TOOLS_DIR=$JUJU_DATA_DIR/tools

      mkdir -p $JUJU_TOOLS_DIR
      cp /opt/jujud $JUJU_TOOLS_DIR/jujud
      initCmd=$($JUJU_TOOLS_DIR/jujud help commands | grep caas-unit-init)
      if test -n "$initCmd"; then
      $JUJU_TOOLS_DIR/jujud caas-unit-init --debug --wait;
      else
      exit 0
      fi

    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Wed, 20 May 2020 01:38:06 +0000
      Finished:     Wed, 20 May 2020 01:38:09 +0000
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/lib/juju from juju-data-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-cwgnw (ro)
Containers:
  prometheus:
    Container ID:   containerd://9ee321f8f170810791010e475937f962118008fa1b773ba880114e7b711ff63e
    Image:          prom/prometheus:v2.18.1
    Image ID:       docker.io/prom/prometheus@sha256:5880ec936055fad18ccee798d2a63f64ed85bd28e8e0af17c6923a090b686c3d
    Port:           9001/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Wed, 20 May 2020 01:42:26 +0000
    Last State:     Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Wed, 20 May 2020 01:41:36 +0000
      Finished:     Wed, 20 May 2020 01:42:26 +0000
    Ready:          False
    Restart Count:  5
    Liveness:       http-get http://:9001/-/healthy delay=30s timeout=30s period=10s #success=1 #failure=3
    Readiness:      http-get http://:9001/-/ready delay=10s timeout=30s period=10s #success=1 #failure=3
    Environment:    <none>
    Mounts:
      /etc/prometheus from prometheus-config-config (rw)
      /prometheus from database-79099c77 (rw)
      /usr/bin/juju-run from juju-data-dir (rw,path="tools/jujud")
      /var/lib/juju from juju-data-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-cwgnw (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  database-79099c77:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  database-79099c77-prometheus-0
    ReadOnly:   false
  juju-data-dir:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  prometheus-config-config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      prometheus-config-config
    Optional:  false
  default-token-cwgnw:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-cwgnw
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                    From                  Message
  ----     ------     ----                   ----                  -------
  Normal   Scheduled  5m6s                   default-scheduler     Successfully assigned lma/prometheus-0 to dev-18-04-2
  Normal   Pulled     5m5s                   kubelet, dev-18-04-2  Container image "jujusolutions/jujud-operator:2.7.6" already present on machine
  Normal   Created    5m5s                   kubelet, dev-18-04-2  Created container juju-pod-init
  Normal   Started    5m5s                   kubelet, dev-18-04-2  Started container juju-pod-init
  Normal   Pulled     4m5s (x2 over 5m1s)    kubelet, dev-18-04-2  Container image "prom/prometheus:v2.18.1" already present on machine
  Normal   Created    4m5s (x2 over 5m1s)    kubelet, dev-18-04-2  Created container prometheus
  Normal   Started    4m5s (x2 over 5m1s)    kubelet, dev-18-04-2  Started container prometheus
  Normal   Killing    4m5s                   kubelet, dev-18-04-2  Container prometheus failed liveness probe, will be restarted
  Warning  Unhealthy  3m18s (x9 over 4m48s)  kubelet, dev-18-04-2  Readiness probe failed: Get "http://10.1.3.34:9001/-/ready": dial tcp 10.1.3.34:9001: connect: connection refused
  Warning  Unhealthy  5s (x17 over 4m25s)    kubelet, dev-18-04-2  Liveness probe failed: Get "http://10.1.3.34:9001/-/healthy": dial tcp 10.1.3.34:9001: connect: connection refused

Both the liveness and readiness probes are failing and, true enough, Prometheus is not accessible when I try to browse to it.
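
A quick way to double-check outside of kubelet is to hit the same readiness endpoint the probe uses. A minimal sketch, using the pod IP and port from the describe output above, and assuming you run it somewhere that can reach the pod network (e.g. the MicroK8s host):

# probe_check.py: hit the endpoint the kubelet readiness probe uses.
import urllib.request
import urllib.error

URL = 'http://10.1.3.34:9001/-/ready'  # pod IP and port from kubectl describe

try:
    with urllib.request.urlopen(URL, timeout=5) as resp:
        print(URL, '->', resp.status)  # Prometheus answers 200 once it is ready
except (urllib.error.URLError, OSError) as exc:
    print(URL, '->', exc)              # here: connection refused, same as the probes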

relaxdiego commented 4 years ago

Ah! I get it now. I changed advertised-port from 9090 to 9001, but we never actually told the prometheus process in the container to listen on that port, hence the failed liveness and readiness probes. I don't think anyone will ever need to change prometheus's listen address, so I'm just going to remove that option in a future PR. There are more important things to deal with lol.
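
For the record, if we ever did want this option to work, build_pod_spec would also have to pass the port to the Prometheus binary itself via its --web.listen-address flag. A rough sketch of the missing piece, where advertised_port is a hypothetical value read from charm config (this is not what the charm currently emits):

# Sketch only, not the charm's current code.
advertised_port = 9001  # hypothetical, e.g. read from charm config

container = {
    'name': 'prometheus',
    'imageDetails': {'imagePath': 'prom/prometheus:v2.18.1'},
    # The missing piece: without --web.listen-address the binary keeps its
    # default of 0.0.0.0:9090, so the probes poke an unbound port. Overriding
    # args replaces the image's default command line, so restate --config.file.
    'args': [
        '--config.file=/etc/prometheus/prometheus.yml',
        '--web.listen-address=0.0.0.0:{}'.format(advertised_port),
    ],
    'ports': [{'containerPort': advertised_port, 'protocol': 'TCP'}],
    'readinessProbe': {'httpGet': {'path': '/-/ready', 'port': advertised_port}},
    'livenessProbe': {'httpGet': {'path': '/-/healthy', 'port': advertised_port}},
}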

relaxdiego commented 4 years ago

So trying out this branch with:

juju config prometheus external-labels='{ "cluster": "datacenter1" }'

Results in a successful update of the prometheus config file:

$ microk8s.kubectl -n lma exec prometheus-0 -- cat /etc/prometheus/prometheus.yml
global:
  external_labels: {cluster: datacenter1}
  scrape_interval: 15s
scrape_configs:
- job_name: prometheus
  scrape_interval: 5s
  static_configs:
  - targets: ['localhost:9090']
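
The merge presumably boils down to deserializing the JSON string from config and injecting it under global before dumping the file. A rough illustration, not necessarily the charm's exact code, with juju_config standing in for the real config object:

import json
import yaml  # PyYAML

juju_config = {'external-labels': '{ "cluster": "datacenter1" }'}

prometheus_yml = {
    'global': {
        # json.loads turns the config string into {'cluster': 'datacenter1'}
        'external_labels': json.loads(juju_config['external-labels']),
        'scrape_interval': '15s',
    },
    'scrape_configs': [{
        'job_name': 'prometheus',
        'scrape_interval': '5s',
        'static_configs': [{'targets': ['localhost:9090']}],
    }],
}

# Emits a YAML document equivalent to the file shown above
print(yaml.safe_dump(prometheus_yml, default_flow_style=None))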

exceptorr commented 4 years ago

> Something weird is happening after running juju config prometheus advertised-port=9001 with this code. From Juju's perspective, everything SEEMS fine as you can see via the logs and juju status below:

I've also seen this, so I filed https://github.com/relaxdiego/charm-k8s-prometheus/issues/8. However, I am not sure we really need to change the Prometheus listening port at all. It would be nice to have, but I can't imagine a scenario where it would actually be required.

relaxdiego commented 4 years ago

No, we don't ever need to change the advertised port; let's avoid doing that. I'm going to push a PR later that hardcodes 9090 instead of exposing it as a configuration option. But first I will review this PR one last time and finish #4 before merging.
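
Concretely, that just means a fixed constant in place of the config lookup; a sketch, where PROMETHEUS_PORT is an assumed name rather than what the PR will necessarily use:

# Fixed constant instead of a config option (assumed name).
PROMETHEUS_PORT = 9090

ports = [{'containerPort': PROMETHEUS_PORT, 'protocol': 'TCP'}]
readiness_probe = {'httpGet': {'path': '/-/ready', 'port': PROMETHEUS_PORT}}
liveness_probe = {'httpGet': {'path': '/-/healthy', 'port': PROMETHEUS_PORT}}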