gocrane / crane

Crane is a FinOps Platform for Cloud Resource Analytics and Economics in Kubernetes clusters. The goal is not only to help users to manage cloud cost easier but also ensure the quality of applications.
https://gocrane.io
Apache License 2.0
1.85k stars 378 forks

Crane crashes when prometheus-adapter-config only configures rules and externalRules #829

Closed aheizi closed 1 year ago

aheizi commented 1 year ago

Describe the bug

When Crane dynamically loads prometheus-adapter-config and the config contains only rules and externalRules (that is, no resourceRules section), Crane crashes with a nil pointer dereference in ParsingResourceRules.

Configuration information:

rules:
- metricsQuery: sum(rate(<<.Series>>{<<.LabelMatchers>>}[30s])) by (<<.GroupBy>>)
  name:
    as: ${1}_qps
    matches: (.*)_total
  resources:
    overrides:
      namespace:
        resource: namespace
      pod:
        resource: pod
  seriesQuery: '{__name__=~"^http_requests.*_total$",container!="POD",namespace!="",pod!=""}'
externalRules:
- metricsQuery: avg(rate(<<.Series>>{<<.LabelMatchers>>}[30s])) by (name)
  resources:
    namespaced: false
  seriesQuery: 'http_requests_total'

Error message:

I0629 22:30:41.116099       1 config_fetcher.go:42] Got prometheus adapter configmap crane-system/prometheus-adapter
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x161a4d9]

goroutine 474 [running]:
github.com/gocrane/crane/pkg/prometheus-adapter.ParsingResourceRules(...)
    /go/src/github.com/gocrane/crane/pkg/prometheus-adapter/expression.go:65
github.com/gocrane/crane/pkg/prometheus-adapter.FlushRules({{0xc000cc8800, 0x1, 0x1}, 0x0, {0xc00000c960, 0x4, 0x4}}, {0x2361640, 0xc0000c6ff0})
    /go/src/github.com/gocrane/crane/pkg/prometheus-adapter/config_fetcher.go:138 +0xb9
github.com/gocrane/crane/pkg/prometheus-adapter.(*PrometheusAdapterConfigFetcher).Reconcile(0xc0007d4000, {0x2339878, 0xc000cdb8c0}, {{{0xc000430510, 0xc}, {0xc000b7c7e0, 0x12}}})
    /go/src/github.com/gocrane/crane/pkg/prometheus-adapter/config_fetcher.go:60 +0x525
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0xc0002246e0, {0x2339878, 0xc000cdb860}, {{{0xc000430510, 0x1e5b0a0}, {0xc000b7c7e0, 0xc0009c9a40}}})
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.10.2/pkg/internal/controller/controller.go:114 +0x222
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc0002246e0, {0x23397d0, 0xc000922000}, {0x1dc7f80, 0xc000d0a600})
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.10.2/pkg/internal/controller/controller.go:311 +0x2f2
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc0002246e0, {0x23397d0, 0xc000922000})
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.10.2/pkg/internal/controller/controller.go:266 +0x205
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.10.2/pkg/internal/controller/controller.go:227 +0x85
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.10.2/pkg/internal/controller/controller.go:223 +0x354

Reproduce steps

  1. Prometheus adapter config
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-adapter
  namespace: crane-system
  labels:
    helm.sh/chart: prometheus-adapter-4.2.0
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/component: metrics
    app.kubernetes.io/part-of: prometheus-adapter
    app.kubernetes.io/name: prometheus-adapter
    app.kubernetes.io/instance: prometheus-adapter
    app.kubernetes.io/version: "v0.10.0"
data:
  config.yaml: |
    rules:
    - metricsQuery: sum(rate(<<.Series>>{<<.LabelMatchers>>}[30s])) by (<<.GroupBy>>)
      name:
        as: ${1}_qps
        matches: (.*)_total
      resources:
        overrides:
          namespace:
            resource: namespace
          pod:
            resource: pod
      seriesQuery: '{__name__=~"^http_requests.*_total$",container!="POD",namespace!="",pod!=""}'
    externalRules:
    - metricsQuery: avg(rate(<<.Series>>{<<.LabelMatchers>>}[30s])) by (name)
      resources:
        namespaced: false
      seriesQuery: 'http_requests_total'
  2. Crane start args
- --prometheus-adapter-configmap-namespace=crane-system
- --prometheus-adapter-configmap-name=prometheus-adapter
- --prometheus-adapter-configmap-key=config.yaml
- --prometheus-adapter-extension-labels=region="cn1"

Expected behavior

When only rules and externalRules are configured, the metrics rules should still be refreshed normally, without a panic.
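
To illustrate the expected behavior, here is a minimal sketch of a nil guard around the resource rules. It is not Crane's actual code and not the fix from any PR: the types and the functions `parseResourceRules` and `flushRules` are simplified, hypothetical stand-ins. It assumes the resourceRules section is held as a pointer that stays nil when the section is absent, which would be consistent with the `0x0` in the `FlushRules` frame of the trace above.

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for the adapter config types.
// The real types live in prometheus-adapter and Crane and have more fields.
type DiscoveryRule struct {
	SeriesQuery  string
	MetricsQuery string
}

type ResourceRule struct {
	ContainerQuery string
}

type ResourceRules struct {
	CPU    ResourceRule
	Memory ResourceRule
}

type MetricsDiscoveryConfig struct {
	Rules         []DiscoveryRule
	ResourceRules *ResourceRules // assumed to be nil when resourceRules is not configured
	ExternalRules []DiscoveryRule
}

// parseResourceRules returns the resource queries, or nil when the
// section is missing, instead of dereferencing a nil pointer.
func parseResourceRules(rr *ResourceRules) []string {
	if rr == nil {
		return nil
	}
	return []string{rr.CPU.ContainerQuery, rr.Memory.ContainerQuery}
}

// flushRules reports how many entries each section would contribute.
func flushRules(cfg MetricsDiscoveryConfig) (resource, custom, external int) {
	resource = len(parseResourceRules(cfg.ResourceRules))
	custom = len(cfg.Rules)
	external = len(cfg.ExternalRules)
	return resource, custom, external
}

func main() {
	// Mirrors the reported config: rules and externalRules only.
	cfg := MetricsDiscoveryConfig{
		Rules:         []DiscoveryRule{{SeriesQuery: "http_requests_total"}},
		ExternalRules: []DiscoveryRule{{SeriesQuery: "http_requests_total"}},
	}
	r, c, e := flushRules(cfg)
	fmt.Printf("resource=%d custom=%d external=%d\n", r, c, e)
}
```

With a guard like this, a config without resourceRules simply contributes zero resource entries while rules and externalRules are still processed.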

aheizi commented 1 year ago

This issue has already been solved by #726.