canonical / prometheus-k8s-operator

https://charmhub.io/prometheus-k8s
Apache License 2.0

prometheus_scrape causes log spam when related to a machine charm #422

Closed simondeziel closed 1 year ago

simondeziel commented 1 year ago

Bug Description

If a machine charm uses the MetricsEndpointProvider from charms.prometheus_k8s.v0.prometheus_scrape, the prometheus lib will keep complaining about having no container present in metadata.yaml.

To Reproduce

  1. deploy machine charm using MetricsEndpointProvider from charms.prometheus_k8s.v0.prometheus_scrape
  2. relate the machine charm with prometheus-k8s
  3. observe log spam on the deployed unit using the machine charm

Environment

$ juju version
3.0.3-genericlinux-arm64
$ juju status -m cos
Model  Controller  Cloud/Region            Version  SLA          Timestamp
cos    overlord    microk8s-cos/localhost  3.0.2    unsupported  22:12:55Z

App           Version  Status  Scale  Charm             Channel  Rev  Address         Exposed  Message
alertmanager  0.23.0   active      1  alertmanager-k8s  edge      38  10.152.183.20   no       
catalogue              active      1  catalogue-k8s     edge       6  10.152.183.211  no       
grafana       9.2.1    active      1  grafana-k8s       edge      59  10.152.183.118  no       
loki          2.4.1    active      1  loki-k8s          edge      49  10.152.183.135  no       
prometheus    2.33.5   active      1  prometheus-k8s    edge      92  10.152.183.89   no       
traefik                active      1  traefik-k8s       edge     100  172.17.33.1     no       

Unit             Workload  Agent  Address       Ports  Message
alertmanager/0*  active    idle   10.1.134.76          
catalogue/0*     active    idle   10.1.134.113         
grafana/0*       active    idle   10.1.134.101         
loki/0*          active    idle   10.1.134.81          
prometheus/0*    active    idle   10.1.134.79          
traefik/0*       active    idle   10.1.134.85          

Offer                            Application   Charm             Rev  Connected  Endpoint              Interface                Role
alertmanager-karma-dashboard     alertmanager  alertmanager-k8s  38   0/0        karma-dashboard       karma_dashboard          provider
grafana-dashboards               grafana       grafana-k8s       59   0/0        grafana-dashboard     grafana_dashboard        requirer
loki-logging                     loki          loki-k8s          49   0/0        logging               loki_push_api            provider
prometheus-receive-remote-write  prometheus    prometheus-k8s    92   0/0        receive-remote-write  prometheus_remote_write  provider
prometheus-scrape                prometheus    prometheus-k8s    92   1/1        metrics-endpoint      prometheus_scrape        requirer

Relevant log output

This gets repeated over and over:

unit-lxd-0: 22:03:31 WARNING unit.lxd/0.juju-log 0 containers are present in metadata.yaml and refresh_event was not specified. Defaulting to update_status. Metrics IP may not be set in a timely fashion.

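The warning comes from the provider's fallback behavior: when the charm declares no containers in metadata.yaml (the machine-charm case) and no refresh_event is passed, the lib falls back to refreshing on update-status. The sketch below is an illustration of that decision, not the library's actual source; the function name and return values are hypothetical.

```python
import logging

logger = logging.getLogger(__name__)


def pick_refresh_event(num_containers: int, refresh_event=None):
    """Illustrative sketch (not the lib's real code) of how the scrape
    provider chooses when to re-read the unit's IP."""
    if refresh_event is not None:
        # An explicit refresh_event silences the warning entirely.
        return refresh_event
    if num_containers == 0:
        # Machine-charm case: there is no pebble-ready event to hook,
        # so fall back to update-status and warn on every dispatch.
        logger.warning(
            "%d containers are present in metadata.yaml and refresh_event "
            "was not specified. Defaulting to update_status. Metrics IP "
            "may not be set in a timely fashion.",
            num_containers,
        )
        return "update-status"
    # A k8s charm with a container can refresh on its pebble-ready event.
    return "pebble-ready"


# A machine charm declares no containers, so the fallback fires:
print(pick_refresh_event(0))           # prints "update-status" (after warning)
print(pick_refresh_event(0, "start"))  # prints "start", no warning
```

Because update-status fires on a timer rather than on a meaningful lifecycle event, the warning repeats on every dispatch, which is the log spam reported above.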

Additional context

_No response_
sed-i commented 1 year ago

The warning is legit: in this case the user needs to be explicit about the refresh event. Without an explicit refresh event, prometheus won't be able to keep track of any changes to the unit's IP.

On one hand, it isn't ERROR because it will work fine; on the other hand, INFO/DEBUG isn't drawing sufficient attention to the potential problem.

simondeziel commented 1 year ago

Oh, that might explain why scrape_jobs gets reset pretty quickly. Any suggestion as to what kind of event I should be passing to refresh_event for a machine charm?

sed-i commented 1 year ago

Not sure. I suppose passing both start and upgrade-charm should cover it?

simondeziel commented 1 year ago

Thanks, that's much appreciated, let me try that out!

sed-i commented 1 year ago

Actually, from the docs it seems that if you use start you don't need upgrade-charm as well.

simondeziel commented 1 year ago

I appreciate your diligence! Unfortunately, that doesn't fix the problem of scrape_jobs being reset behind my back.

That said, setting refresh_event=self.on.start makes the log spam stop, so I'll close the bug, as I was apparently using the lib wrong.
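For reference, the fix amounts to passing an explicit refresh_event when constructing the provider in the machine charm's __init__. The sketch below uses stand-in classes so it is self-contained here; in a real charm you would import CharmBase from ops.charm and MetricsEndpointProvider from charms.prometheus_k8s.v0.prometheus_scrape, and the job config and port are purely illustrative.

```python
class FakeStartEvent:
    """Stand-in for self.on.start (an ops BoundEvent) in this sketch."""


class MetricsEndpointProviderSketch:
    """Mimics only the constructor shape relevant to this issue; the real
    class is charms.prometheus_k8s.v0.prometheus_scrape.MetricsEndpointProvider."""

    def __init__(self, charm, jobs=None, refresh_event=None):
        self.jobs = jobs or []
        # With refresh_event set, the lib refreshes the scrape target's IP
        # on that event instead of defaulting to update-status and warning.
        self.refresh_event = refresh_event


# Inside the charm's __init__ this would read:
#   MetricsEndpointProvider(self, jobs=..., refresh_event=self.on.start)
provider = MetricsEndpointProviderSketch(
    charm=None,  # would be `self` in the charm
    jobs=[{"static_configs": [{"targets": ["*:9100"]}]}],  # hypothetical job
    refresh_event=FakeStartEvent(),
)
print(provider.refresh_event is not None)  # prints True: warning suppressed
```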