GoogleCloudPlatform / prometheus-engine

Google Cloud Managed Service for Prometheus libraries and manifests.
https://g.co/cloud/managedprometheus
Apache License 2.0

[bug] Unable to specify endpoints with the same port name #479

Open Zebradil opened 1 year ago

Zebradil commented 1 year ago

I am converting a ServiceMonitor resource rendered from a Helm chart into a PodMonitoring resource, following the GMP documentation.

The initial ServiceMonitor resource

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  # Removed
  #annotations:
  labels:
    app.kubernetes.io/component: keycloak
    app.kubernetes.io/instance: keycloak
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: keycloak
    helm.sh/chart: keycloak-15.1.1
  name: keycloak
  namespace: keycloak
spec:
  endpoints:
  - interval: 30s
    path: /metrics
    port: http
  - interval: 30s
    path: /realms/master/metrics
    port: http
  namespaceSelector:
    matchNames:
    - keycloak
  selector:
    matchLabels:
      app.kubernetes.io/component: metrics
      app.kubernetes.io/instance: keycloak
      app.kubernetes.io/name: keycloak
```

The resulting PodMonitoring resource

```yaml
apiVersion: monitoring.googleapis.com/v1
kind: PodMonitoring
metadata:
  # Removed
  #annotations:
  labels:
    app.kubernetes.io/component: keycloak
    app.kubernetes.io/instance: keycloak
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: keycloak
    helm.sh/chart: keycloak-15.1.1
  name: keycloak
  namespace: keycloak
spec:
  endpoints:
  - interval: 30s
    path: /metrics
    port: http
  - interval: 30s
    path: /realms/master/metrics
    port: http
  selector:
    matchLabels:
      app.kubernetes.io/instance: keycloak
      app.kubernetes.io/name: keycloak
```

The difference

```diff
--- servicemonitor-keycloak.yaml	2023-05-19 13:43:45.708860839 +0200
+++ podmonitoring-keycloak.yaml	2023-05-19 13:47:15.243825855 +0200
@@ -1,31 +1,27 @@
-apiVersion: monitoring.coreos.com/v1
-kind: ServiceMonitor
+apiVersion: monitoring.googleapis.com/v1
+kind: PodMonitoring
 metadata:
   # Removed
   #annotations:
   labels:
     app.kubernetes.io/component: keycloak
     app.kubernetes.io/instance: keycloak
     app.kubernetes.io/managed-by: Helm
     app.kubernetes.io/name: keycloak
     helm.sh/chart: keycloak-15.1.1
   name: keycloak
   namespace: keycloak
 spec:
   endpoints:
   - interval: 30s
     path: /metrics
     port: http
   - interval: 30s
     path: /realms/master/metrics
     port: http
-  namespaceSelector:
-    matchNames:
-    - keycloak
   selector:
     matchLabels:
-      app.kubernetes.io/component: metrics
       app.kubernetes.io/instance: keycloak
       app.kubernetes.io/name: keycloak
```

After applying the PodMonitoring resource to my environment, I see the following errors in the collector pods:

```
prometheus ts=2023-05-19T11:36:08.511Z caller=main.go:1198 level=info msg="Loading configuration file" filename=/prometheus/config_out/config.yaml
prometheus ts=2023-05-19T11:36:08.514Z caller=main.go:929 level=error msg="Error reloading config" err="couldn't load configuration (--config.file=\"/prometheus/config_out/config.yaml\"): parsing YAML file /prometheus/config_out/config.yaml: found multiple scrape configs with job name \"PodMonitoring/keycloak/keycloak/http\""
config-reloader level=error ts=2023-05-19T11:36:18.51061949Z caller=runutil.go:101 msg="function failed. Retrying in next tick" err="trigger reload: reload request failed: Post \"http://localhost:19090/-/reload\": context deadline exceeded"
config-reloader level=error ts=2023-05-19T11:36:18.514124586Z caller=reloader.go:382 msg="Failed to trigger reload. Retrying." err="trigger reload: reload request failed: Post \"http://localhost:19090/-/reload\": context deadline exceeded"
config-reloader level=error ts=2023-05-19T11:36:18.517247779Z caller=runutil.go:101 msg="function failed. Retrying in next tick" err="trigger reload: received non-200 response: 500 Internal Server Error; have you set `--web.enable-lifecycle` Prometheus flag?"
```

The most interesting here seems to be this error:

found multiple scrape configs with job name "PodMonitoring/keycloak/keycloak/http"

I assume the job-name generation logic doesn't take endpoint paths into account, so no two endpoints may share the same port name. Is that correct?

As a workaround, I'll create one PodMonitoring resource per endpoint, as sketched below.
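For illustration, a minimal sketch of that split into two resources (the names keycloak-metrics and keycloak-realms are made up):

```yaml
apiVersion: monitoring.googleapis.com/v1
kind: PodMonitoring
metadata:
  name: keycloak-metrics   # hypothetical name
  namespace: keycloak
spec:
  selector:
    matchLabels:
      app.kubernetes.io/instance: keycloak
      app.kubernetes.io/name: keycloak
  endpoints:
  - interval: 30s
    path: /metrics
    port: http
---
apiVersion: monitoring.googleapis.com/v1
kind: PodMonitoring
metadata:
  name: keycloak-realms    # hypothetical name
  namespace: keycloak
spec:
  selector:
    matchLabels:
      app.kubernetes.io/instance: keycloak
      app.kubernetes.io/name: keycloak
  endpoints:
  - interval: 30s
    path: /realms/master/metrics
    port: http
```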

Zebradil commented 1 year ago

I dug around a bit and found out that the upstream Prometheus Operator appends the endpoint's index to the job_name: https://github.com/prometheus-operator/prometheus-operator/blob/b5327e7e66e818b8b924332191d4888d45d270e4/pkg/prometheus/promcfg.go#L1075

Maybe it makes sense to do the same?
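For illustration, a rough sketch of how the generated scrape configs could then be disambiguated, assuming the index were simply appended to the current PodMonitoring/&lt;namespace&gt;/&lt;name&gt;/&lt;port&gt; scheme (these job names are hypothetical, not the operator's current output):

```yaml
scrape_configs:
# One scrape config per endpoint, distinguished by an appended index.
- job_name: PodMonitoring/keycloak/keycloak/http/0
  metrics_path: /metrics
  scrape_interval: 30s
- job_name: PodMonitoring/keycloak/keycloak/http/1
  metrics_path: /realms/master/metrics
  scrape_interval: 30s
```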

pintohutch commented 1 year ago

Hi @Zebradil - thanks for reporting.

Can you detail your use case here a bit? Are you doing closed-box monitoring, e.g. something like blackbox-exporter in Prometheus?

Or is your target the source of different-flavored metrics, serving them on different paths of its metrics HTTP port?

If it's the former, I'm tempted to mark this as a duplicate to #91, where we're exploring the best way to support closed-box monitoring in managed collection.

A current workaround for this is, as you said, creating a separate PodMonitoring per path, as mentioned here. You could also deploy a standalone GMP collector and manually configure scraping, as mentioned here.

We have to be careful here as changing the job_name to append the endpoint index can break existing users' dashboards that rely on the job label matching the name of their PodMonitoring. As such, if we wanted to pursue your change, we'd likely want to bump the API version of the CRD as it can be considered breaking.

Does that make sense?

cc @bwplotka

pintohutch commented 1 year ago

Actually, this may be safe, since the operator uses relabeling rules to set the job label to the PodMonitoring's name. Taking another look...

pintohutch commented 1 year ago

Yea - so the biggest risk with relabeling both metric paths to the same job is potential collisions if the same metric name and label set exists at both paths, which isn't great.

The same problem could arise if we, say, injected the endpoint index, or path as a separate label, e.g. endpoint_index:0, or endpoint_path:/realms/master/metrics.

Otherwise, we'd need to change the job label, which gets back to the breaking API change I mentioned above.
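For illustration only: if the endpoint_index route were taken, the label could be attached with a relabeling rule roughly like the following (this is an assumption, not the operator's actual generated config, and the index value is made up):

```yaml
relabel_configs:
# Statically attach the endpoint's index within the PodMonitoring spec
# as a target label on every series scraped from this endpoint.
- target_label: endpoint_index
  replacement: "0"
```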

Zebradil commented 1 year ago

Hi @pintohutch, thank you for looking into that.

> Can you detail your use case here a bit?

I'm installing a Keycloak application from this helm chart. The helm chart supports Prometheus metrics and includes a ServiceMonitor resource, as shown in the first comment. It is not the first time I have needed to convert a ServiceMonitor to a PodMonitoring, so I do it with a small script that follows the conversion process described in the GCP docs. But this time the conversion produced an unusable PodMonitoring resource due to duplicated job names.

I checked how the upstream Prometheus Operator handles such a case. There, scrape job names include indexes (e.g. ServiceMonitor/ns1/name1/0, ServiceMonitor/ns1/name1/1; see the implementation).

In this project, because of the relabeling, the generated job names don't affect the final metrics. I assumed that including indexes in the job_name wouldn't affect anything. You can check my PR #480 for more details.
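For context, the relabeling in question is roughly of this shape, pinning the job label to the PodMonitoring name regardless of the scrape config's job_name (a sketch; the operator's exact generated rules may differ):

```yaml
relabel_configs:
# Force the job label to the PodMonitoring's name (here "keycloak"),
# so the generated job_name never surfaces in the stored metrics.
- target_label: job
  replacement: keycloak
```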

> Yea - so the biggest risk with relabeling both metric paths to the same job is potential collisions if the same metric name and label set exists at both paths, which isn't great.

This is not great, agreed. I had missed that upstream the job label is not relabeled by default.

> The same problem could arise if we, say, injected the endpoint index, or path as a separate label, e.g. endpoint_index:0, or endpoint_path:/realms/master/metrics.

I don't see what the downside of adding an endpoint_index label would be.
An endpoint_path label probably isn't a good choice, because an endpoint can emit different metrics depending on query parameters.

To summarize: the approach to naming jobs that I proposed would solve my issue, but at the same time it opens up the possibility of conflicts between metric names and labels emitted by different endpoints, and makes such conflicts harder to notice. The requirement of unique endpoint ports per PodMonitoring, on the other hand, guarantees there are no naming conflicts. Adding a new endpoint_index label seems like a good solution (until I understand why it is not 🙂).

Zebradil commented 1 year ago

BTW, it'd be nice to have validation of the PodMonitoring custom resource for such a case. I spent some time looking around before I noticed the errors in the collector's logs. Ideally, a PodMonitoring resource whose endpoints share the same port should not be accepted.

bwplotka commented 1 year ago

+1 for validation, great point.

And it's great to see another use case for this. Indeed, some apps/exporters might require multiple scrapes on different paths (although this is rare).

I think we need a small proposal for the path forward - we will look into it and share it here. Options include more expressive job names (e.g. like the Prometheus Operator's) or more dynamic configuration of the job name. Perhaps this could also solve https://github.com/GoogleCloudPlatform/prometheus-engine/issues/91 in the mid term.

bwplotka commented 1 year ago

Is there a way for you to split this into separate PodMonitoring resources for now?

Zebradil commented 1 year ago

Hi @bwplotka,

Yes, at the moment I just manually create two PodMonitoring resources with different names. This is not ideal, because it increases the effort of keeping the GMP adaptation in sync with the upstream helm chart, but I'm not blocked and it's good enough.

I'm looking forward to the outcome of your discussions.

pintohutch commented 1 year ago

Cool - we should definitely add a validating webhook check for this case, thanks for raising!

> I don't see what the downside of adding an endpoint_index label would be.

This is probably rare and maybe not much of a concern, but I was thinking about how to handle the case where the target already emits time series with an endpoint_index label.

Assuming we'd inject the label via relabeling, and since we leave honor_labels=false, Prometheus would rename the target's own label to exported_endpoint_index and treat our endpoint_index label as the canonical one.

In that case, if that label is used at all in dashboarding or alerting, it could cause issues.

Though we may be able to caveat that and not overwrite it if it's already present, e.g.:

```yaml
# Prime the endpoint_index, assuming we can add it safely.
- target_label: __tmp_endpoint_index
  replacement: "0" # <- determined from the endpoint's index in the PodMonitoring
# Overwrite it with any existing label, but only if one is actually present.
- source_labels: [endpoint_index]
  regex: (.+)
  target_label: __tmp_endpoint_index
# Final relabeling.
- source_labels: [__tmp_endpoint_index]
  target_label: endpoint_index
```

But I'm unsure whether that causes more problems than it solves 😅.

m3adow commented 1 year ago

Just stumbled upon this issue while migrating a RabbitMQ ServiceMonitor to a PodMonitoring. Has there been any progress since the end of May? For reference, this is the ServiceMonitor's spec.endpoints:

```yaml
  - interval: 15s
    port: prometheus
    scheme: http
    scrapeTimeout: 1s
  - interval: 15s
    params:
      family:
      - queue_coarse_metrics
      - queue_metrics
    path: /metrics/detailed
    port: prometheus
    scheme: http
    scrapeTimeout: 1s
```

GMP rightfully complains about identically named job_name keys.

Is the best workaround still to create different PodMonitorings?

pintohutch commented 1 year ago

Hi @m3adow - yes indeed the best workaround for now is to use different PodMonitorings.

We haven't had a lot of recent demand for this feature so we've deprioritized it for now. If you have some use cases you'd like to promote here, please let us know.
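For completeness, a sketch of that split for the RabbitMQ endpoints above. Resource names and the selector are made up, and support for the params field on the PodMonitoring endpoint should be verified against the API reference for your GMP version:

```yaml
apiVersion: monitoring.googleapis.com/v1
kind: PodMonitoring
metadata:
  name: rabbitmq-metrics            # hypothetical name
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: rabbitmq   # hypothetical selector
  endpoints:
  - port: prometheus
    scheme: http
    interval: 15s
    timeout: 1s
---
apiVersion: monitoring.googleapis.com/v1
kind: PodMonitoring
metadata:
  name: rabbitmq-detailed-metrics   # hypothetical name
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: rabbitmq   # hypothetical selector
  endpoints:
  - port: prometheus
    scheme: http
    path: /metrics/detailed
    params:                         # assumed to be supported by the PodMonitoring endpoint
      family:
      - queue_coarse_metrics
      - queue_metrics
    interval: 15s
    timeout: 1s
```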