cloudfoundry / cf-k8s-networking

building a cloud foundry without gorouter....
Apache License 2.0

[BUG] Not all Istio component metrics are available via Prometheus #55

Closed: heycait closed this issue 4 years ago

heycait commented 4 years ago

Summary

A Prometheus server deployed to cf-system can't scrape all Istio component metrics.

Using the following NetworkPolicy:

---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-prometheus-scrape
  namespace: istio-system
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          cf-for-k8s.cloudfoundry.org/cf-system-ns: ""
      podSelector:
        matchLabels:
          what-am-i: prometheus

And with the following annotations added to all the Istio deployments (istio-citadel, istio-galley, istio-pilot, etc.):

prometheus.io/scrape: "true"
prometheus.io/path: "/metrics"
prometheus.io/port: "15014"
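Where these annotations live matters. The prometheus.io/* annotations are a convention rather than a Prometheus built-in, and the widely copied kubernetes-pods scrape job reads them from pod metadata, not from the Deployment object. Below is a minimal sketch of a strategic-merge patch that puts them on the pod template. It assumes the server uses that pod-role job, and the file name is hypothetical:

```yaml
# galley-scrape-patch.yaml (hypothetical file name)
# Strategic-merge patch: the annotations go under spec.template so they are
# copied onto every pod, where pod-role service discovery can read them.
spec:
  template:
    metadata:
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/path: "/metrics"
        prometheus.io/port: "15014"
```

It could be applied with something like kubectl -n istio-system patch deployment istio-galley --patch "$(cat galley-scrape-patch.yaml)". Changing the pod template triggers a rollout of that deployment.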

Only metrics for citadel, pilot, and sidecar-injector are exposed. Metrics for galley, telemetry, and policy do not work.


Reproduction Steps

I wanted to check whether Istio component metrics were scrapeable via the Prometheus UI after adding the appropriate annotations.

Logs

The failing components (galley, telemetry, and policy) show the error "server returned HTTP status 503 Service Unavailable" when Prometheus tries to scrape them.

Expected behavior

The up{kubernetes_namespace="istio-system"} query in the Prometheus UI shows a successful scrape (a value of 1) for all Istio components.
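The same expectation can also be written as a Prometheus alerting rule, so a failing scrape shows up without someone checking the UI by hand. The sketch below assumes the kubernetes_namespace label from the query above. The file name, group name, alert name, and 5m window are illustrative choices, not anything from this project:

```yaml
# istio-scrape-rules.yaml (hypothetical): load it via rule_files in prometheus.yml.
groups:
  - name: istio-scrape-health
    rules:
      - alert: IstioComponentScrapeFailing
        # up is 0 for any target whose last scrape failed (e.g. an HTTP 503).
        expr: up{kubernetes_namespace="istio-system"} == 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Prometheus cannot scrape {{ $labels.instance }} in istio-system"
```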

Additional context

I based the Prometheus annotations on these files, which mention port 15014:

cf-gitbot commented 4 years ago

We have created an issue in Pivotal Tracker to manage this:

https://www.pivotaltracker.com/story/show/173757860

The labels on this GitHub issue will be updated when the story is started.

jenspinney commented 4 years ago

Thanks for reporting this @heycait! We'll need to explore a little to understand why metrics aren't working on those particular components.

KauzClay commented 4 years ago

Hey @heycait,

We are in the process of upgrading to Istio 1.6.x, which no longer has separate pods for galley, telemetry, and policy. It just has the single istiod pod.

We still need to validate metrics from this new pod, but in the meantime, we wouldn't worry about getting metrics from galley, telemetry, or policy.

kauana commented 4 years ago

Hello @heycait 👋,

We upgraded Istio to 1.6.4, went through your reproduction steps, and did not see any metrics for galley, telemetry, and policy. This is because these components were all merged into a single component called istiod.

Here is a screenshot of what the up{kubernetes_namespace="istio-system"} query looks like in Istio 1.6.4:

[image: prometheus]