Stackdriver / stackdriver-prometheus-sidecar

A sidecar for the Prometheus server that can send metrics to Stackdriver.
https://cloud.google.com/monitoring/kubernetes-engine/prometheus
Apache License 2.0

msg="target not found" for standard kube-state-metrics #279


forestoden commented 3 years ago

I am trying to set up the stackdriver-prometheus-sidecar to push a few CronJob/Job metrics from kube-state-metrics to Stackdriver. I'm running into an issue where, no matter what I do, every one of these metrics is logged as:

level=debug ts=2021-04-06T22:10:39.947Z caller=series_cache.go:369 component="Prometheus reader" msg="target not found" labels="{__name__=\"kube_cronjob_next_schedule_time\",container=\"kube-state-metrics\",cronjob=\"cronjob\",endpoint=\"http\",instance=\"10.8.6.2:8080\",job=\"kube-state-metrics\",namespace=\"production\",pod=\"kube-prometheus-stack-kube-state-metrics-bbf56d7f5-dss8c\",service=\"kube-prometheus-stack-kube-state-metrics\"}"

Here is my config for the sidecar:

  - args:
    - --stackdriver.project-id=<project>
    - --prometheus.wal-directory=/prometheus/wal
    - --stackdriver.kubernetes.location=us-central1
    - --stackdriver.kubernetes.cluster-name=<cluster>
    - --include=kube_cronjob_next_schedule_time{namespace="production"}
    - --log.level=debug
    image: gcr.io/stackdriver-prometheus/stackdriver-prometheus-sidecar:0.8.2

I am using the Prometheus Operator with Prometheus version 2.18. I tried a couple of other versions (up to 2.22) with no luck.

I am not seeing any metrics reach Stackdriver. I've tried adding --stackdriver.store-in-files-directory=/prometheus/sd and a file does get created, but nothing is written to it, so it doesn't seem like a permissions issue there.

I've also tried the --include flag in a number of different forms, with no luck.

I found #104, which highlights a similar log message, but I think that use case is a bit more complex than this one.

forestoden commented 3 years ago

I dug into the code a bit and determined what the issue is, but I'm not sure how it could be fixed given how the code works today.

The issue stems from the target lookup, i.e. getting a target from the cache. The code makes a call

    t, _ := targetMatch(ts, lset)

that attempts to "return the first target in the entry that matches all labels of the input set iff it has them set." Prometheus targets carry a namespace label, and for kube-state-metrics deployments that namespace is usually not the same as the namespace of the workloads it monitors. So targetMatch iterates over the list of targets matching the metric's job and instance labels, checks that all shared labels match, and fails on namespace because kube-state-metrics does not run in the same namespace as the workload.
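To make the failure mode concrete, here is a minimal, self-contained sketch of the matching rule described above (this is not the sidecar's actual code; the types, namespaces, and label values are made up for illustration):

    // Simplified sketch of the matching rule: a target matches a series
    // only if every label carried by both of them has the same value.
    package main

    import "fmt"

    // Label stands in for a Prometheus name/value label pair.
    type Label struct{ Name, Value string }

    // targetMatch returns the first target whose labels do not conflict
    // with the series' labels, mirroring the behavior described above.
    func targetMatch(targets [][]Label, series map[string]string) ([]Label, bool) {
    Outer:
        for _, target := range targets {
            for _, l := range target {
                if v, ok := series[l.Name]; ok && v != l.Value {
                    continue Outer // a shared label differs -> not a match
                }
            }
            return target, true
        }
        return nil, false
    }

    func main() {
        // kube-state-metrics target discovered in the "monitoring" namespace.
        targets := [][]Label{{
            {Name: "job", Value: "kube-state-metrics"},
            {Name: "instance", Value: "10.8.6.2:8080"},
            {Name: "namespace", Value: "monitoring"},
        }}
        // A series it exports about a workload in the "production" namespace.
        series := map[string]string{
            "job":       "kube-state-metrics",
            "instance":  "10.8.6.2:8080",
            "namespace": "production",
            "cronjob":   "cronjob",
        }
        _, found := targetMatch(targets, series)
        fmt.Println("target found:", found) // prints: target found: false
    }

Because the series carries the workload's namespace (production) while the target carries the namespace kube-state-metrics runs in, the shared namespace label differs, no target is found, and the series is dropped with the "target not found" message shown above.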

I have worked around this by deploying kube-state-metrics in my production namespace, as that covers my use case. This is almost certainly not viable everywhere: for example, with workloads spread across many namespaces you would have to deploy multiple copies of kube-state-metrics. Filtering the namespace label out of targetMatch seems hacky, so I'm hesitant to suggest that.

vmcalvo commented 3 years ago

I have had the same problem with this sidecar and kube-state-metrics. In my case the only solution I have found is to modify the ServiceMonitor generated by the chart (I am using https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack/templates ).

The ServiceMonitor that the chart generates for scraping kube-state-metrics hard-codes honorLabels to true: https://github.com/prometheus-community/helm-charts/blob/main/charts/kube-prometheus-stack/templates/exporters/kube-state-metrics/serviceMonitor.yaml

Changing it to false, the conflict on the namespace label produces two labels instead: the target's namespace is kept and the metric's own value is moved to exported_namespace, so the sidecar can find the target again.

I have not reviewed all the metrics, but I suppose some will exceed the 10-label limit because of this; in such cases a relabeling could perhaps be applied to delete the labels I do not need, along the lines of the sketch below.
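A hedged sketch of the kind of ServiceMonitor change being described (field names follow the Prometheus Operator CRD; the selector, port name, and the label being dropped are placeholders, not values taken from the chart):

    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: kube-state-metrics
    spec:
      selector:
        matchLabels:
          app.kubernetes.io/name: kube-state-metrics
      endpoints:
        - port: http
          # Let the target's namespace label win; the metric's own value
          # is kept under an exported_ prefix instead of overriding it.
          honorLabels: false
          metricRelabelings:
            # Placeholder: drop any labels you do not need so the series
            # stays under Stackdriver's label limit.
            - action: labeldrop
              regex: my_unneeded_label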

jinnovation commented 2 years ago

Building on @forestoden and @vmcalvo's findings, my recent comment in #229 might be relevant as well.