envoyproxy / gateway

Manages Envoy Proxy as a Standalone or Kubernetes-based Application Gateway
https://gateway.envoyproxy.io
Apache License 2.0

Enriching the Prometheus metric labels emitted by Envoy from xRoute labels #2488

Open ardikabs opened 7 months ago

ardikabs commented 7 months ago

Description: Although the metrics emitted by the Envoy instance look good, they carry very few labels, for example:

envoy_cluster_upstream_rq_time_bucket{envoy_cluster_name="httproute/<HTTPRoute_Namespace>/<HTTPRoute_Name>/rule/0",le="0.5"} 11
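As the sample shows, the only route-identifying label is envoy_cluster_name, which packs the route kind, namespace, name and rule index into one string. Below is a minimal Python sketch of recovering those parts from a scraped sample. The `<kind>/<namespace>/<name>/rule/<index>` layout is assumed from the sample in this issue (it is not a documented contract and may change between Envoy Gateway versions), the helper names are hypothetical, and the parser only handles simple label values without escaped quotes.

```python
from __future__ import annotations

import re

# One sample line of the Prometheus text format: name{label="value",...} value
# Simplified: label values containing escaped quotes or '}' are not supported.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(?P<key>[a-zA-Z_][a-zA-Z0-9_]*)="(?P<val>[^"]*)"')


def parse_sample(line: str) -> tuple[str, dict[str, str], float]:
    """Split one exposition line into (metric name, labels, value)."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a labelled sample: {line!r}")
    labels = {m.group("key"): m.group("val") for m in LABEL_RE.finditer(match.group("labels"))}
    return match.group("name"), labels, float(match.group("value"))


def route_from_cluster(cluster_name: str) -> dict[str, str]:
    """Decode '<kind>/<namespace>/<name>/rule/<index>' (layout assumed from this issue)."""
    parts = cluster_name.split("/")
    if len(parts) != 5 or parts[3] != "rule":
        raise ValueError(f"unexpected cluster name: {cluster_name!r}")
    kind, namespace, name, _, rule = parts
    return {"kind": kind, "namespace": namespace, "name": name, "rule": rule}


name, labels, value = parse_sample(
    'envoy_cluster_upstream_rq_time_bucket'
    '{envoy_cluster_name="httproute/envoy-gateway-system/example/rule/0",le="0.5"} 11'
)
print(route_from_cluster(labels["envoy_cluster_name"]))
# {'kind': 'httproute', 'namespace': 'envoy-gateway-system', 'name': 'example', 'rule': '0'}
```

This recovers which route a series belongs to, but not who owns it: the ownership labels live on the HTTPRoute object, not in the metric.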

I'm therefore wondering whether it would be feasible to add more valuable information, such as the labels of xRoute resources, directly to the metrics.

For example, suppose the following HTTPRoute manifest is applied:

apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: example
  namespace: envoy-gateway-system
  labels:
    host: www.example.com
    tribe: virtual-product
    squad: game-voucher
spec:
  hostnames:
  - www.example.com
  parentRefs:
  - group: gateway.networking.k8s.io
    kind: Gateway
    name: default-gateway
    namespace: envoy-gateway-system
  rules:
  - backendRefs:
    - group: ""
      kind: Service
      name: echoserver
      namespace: default
      port: 80
    matches:
    - path:
        type: PathPrefix
        value: /

The metrics emitted by Envoy would then look like this:

envoy_cluster_upstream_rq_time_bucket{envoy_cluster_name="httproute/envoy-gateway-system/example/rule/0",host="www.example.com",tribe="virtual-product",squad="game-voucher",le="0.5"} 11
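A rough sketch of what the proposed enrichment could do to a single sample's label set, written in Python purely for illustration; Envoy Gateway has no such function, and `enrich_labels` and `sanitize_label_name` are hypothetical. It assumes the behaviour is opt-in through an explicit allowlist of label keys, that existing metric labels such as `le` are never overwritten, and that Kubernetes label keys like `app.kubernetes.io/name`, whose `.`, `/` and `-` characters are not allowed in classic Prometheus label names, are rewritten with underscores.

```python
from __future__ import annotations

import re

# Classic Prometheus label names must match [a-zA-Z_][a-zA-Z0-9_]*.
_INVALID_CHARS = re.compile(r"[^a-zA-Z0-9_]")


def sanitize_label_name(key: str) -> str:
    """Turn a Kubernetes label key into a valid Prometheus label name,
    e.g. 'app.kubernetes.io/name' -> 'app_kubernetes_io_name'."""
    name = _INVALID_CHARS.sub("_", key)
    if not name or name[0].isdigit():
        name = "_" + name
    return name


def enrich_labels(
    sample_labels: dict[str, str],
    route_labels: dict[str, str],
    allowed_keys: set[str] | None = None,
) -> dict[str, str]:
    """Hypothetical enrichment: copy allow-listed xRoute labels onto a sample.

    - Off by default: with no allowlist the sample labels are returned unchanged.
    - Existing metric labels (e.g. 'le', 'envoy_cluster_name') always win.
    - If two route keys sanitize to the same name, the first in sorted order wins.
    """
    merged = dict(sample_labels)
    if not allowed_keys:
        return merged
    for key in sorted(route_labels):
        if key in allowed_keys:
            merged.setdefault(sanitize_label_name(key), route_labels[key])
    return merged
```

For the manifest above, calling `enrich_labels` with the sample's labels, the route's `metadata.labels` and `allowed_keys={"host", "tribe", "squad"}` would add `host`, `tribe` and `squad` next to `envoy_cluster_name` and `le`, matching the desired metric line. Keeping it allowlist-based limits label cardinality to keys an operator chose deliberately.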


arkodg commented 7 months ago

Hey @ardikabs, can you share a little more info on the intent? Does this allow specific teams/apps to focus on and filter their relevant metrics?

ardikabs commented 7 months ago

Hi @arkodg, here are the objectives:

  1. As you pointed out, ownership: specific routes belong to specific teams. Without the necessary labels, each team has to filter manually by the names of the routes it owns. Adding these labels would cover two scenarios: monitoring and alerting.
  2. Our existing setup runs different hosts on the same Envoy, but there is currently no intuitive way to tell which metric corresponds to which host.
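To illustrate objective 1, here is a small Python sketch over in-memory samples, each a `(labels, value)` pair: the current workaround, where a team keeps its own list of HTTPRoute names and matches them against `envoy_cluster_name`, versus selecting directly on a hypothetical `squad` label. The helper names and sample data are invented for illustration, and the cluster-name layout is assumed from the sample earlier in this issue; in practice this would be a PromQL selector or a `sum by (squad) (...)` aggregation rather than Python.

```python
from collections import defaultdict

# Each sample is a (labels, value) pair, e.g. ({"envoy_cluster_name": ..., "squad": ...}, 11.0).


def filter_by_route_names(samples, namespace, route_names):
    """Workaround without ownership labels: match a hand-maintained list of
    HTTPRoute names against envoy_cluster_name."""
    prefixes = tuple(f"httproute/{namespace}/{name}/rule/" for name in route_names)
    return [
        (labels, value)
        for labels, value in samples
        if labels.get("envoy_cluster_name", "").startswith(prefixes)
    ]


def filter_by_label(samples, key, wanted):
    """With enriched labels: select a team's samples by a single label."""
    return [(labels, value) for labels, value in samples if labels.get(key) == wanted]


def sum_by(samples, key):
    """Aggregate values per label value, similar in spirit to PromQL `sum by (key)`.
    Samples without the label are grouped under the empty string."""
    totals = defaultdict(float)
    for labels, value in samples:
        totals[labels.get(key, "")] += value
    return dict(totals)
```

The point of the comparison is maintenance: the route-name list has to be kept in sync by hand as routes are added or renamed, while a label selector keeps working as long as the routes themselves are labelled.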

That said, please let me know whether this can already be done today.

arkodg commented 7 months ago

Re 1: before we decide to copy input labels to the output, would default labels such as httproute=<name> help?

Re 2: have you tried enabling enableVirtualHostStats (https://gateway.envoyproxy.io/v0.6.0/api/extension_types/#proxymetrics)? These stats are per virtual host, and EG has a 1:1 mapping between virtual host and hostname. It would also be good to highlight this in our dashboard (https://gateway.envoyproxy.io/latest/user/grafana-integration/). cc @zirain

zirain commented 7 months ago

EG can provide a mechanism to support this, but IMO it should be off by default.

ardikabs commented 7 months ago

@arkodg

Re 1: before we decide to copy input labels to the output, would default labels such as httproute=<name> help?

For my use case, I'd say that doesn't help. What about an opt-in option that lets users add extra labels to the metrics, for example taken from the xRoute labels themselves?

Re 2: have you tried enabling enableVirtualHostStats (https://gateway.envoyproxy.io/v0.6.0/api/extension_types/#proxymetrics)? These stats are per virtual host, and EG has a 1:1 mapping between virtual host and hostname. It would also be good to highlight this in our dashboard.

I tried this, but it doesn't quite match my need: I expect every metric associated with an HTTPRoute to include these additional details, since the HTTPRoute spec also specifies the hostname. Please correct me if I have misunderstood.

zetaab commented 6 months ago

And actually, enableVirtualHostStats does not include metrics such as the duration buckets.

github-actions[bot] commented 5 months ago

This issue has been automatically marked as stale because it has not had activity in the last 30 days.