apache / skywalking

APM, Application Performance Monitoring System
https://skywalking.apache.org/
Apache License 2.0

k8s service collection error [Feature] #8038

Closed. 844700118 closed this issue 2 years ago

844700118 commented 2 years ago

Search before asking

Description

1. The "cluster" and "node" sub-modules under the k8s module of the dashboard show data normally, but the "service" sub-module displays no data. It may be a problem with the OpenTelemetry Collector configuration, but I cannot tell where the problem is. Asking for help.
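
For reference, the k8s dashboards are driven by the OAP's OpenTelemetry (OpenCensus) receiver rules, so besides the collector configuration the OAP side also has to enable the matching rule. A minimal sketch of the relevant application.yml fragment, assuming the bundled rule files are named k8s-cluster, k8s-node and k8s-service as in the 8.7.0 distribution:

receiver-otel:
  selector: ${SW_OTEL_RECEIVER:default}
  default:
    enabledHandlers: ${SW_OTEL_RECEIVER_ENABLED_HANDLERS:"oc"}
    # the "service" panel only gets data when the k8s-service rule is listed here
    enabledOcRules: ${SW_OTEL_RECEIVER_ENABLED_OC_RULES:"k8s-cluster,k8s-node,k8s-service"}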

2. OAP server error log: [root@k8s-master ~/apache-skywalking-apm-bin-es7]# tail -f logs/skywalking-oap-server.log

......
2021-10-27 19:00:32,988 - io.kubernetes.client.informer.cache.ReflectorRunnable - 79 [controller-reflector-io.kubernetes.client.openapi.models.V1Pod-1] INFO  [] - class io.kubernetes.client.openapi.models.V1Pod#Start listing and watching...
2021-10-27 19:00:32,988 - io.kubernetes.client.informer.cache.ReflectorRunnable - 79 [controller-reflector-io.kubernetes.client.openapi.models.V1Service-1] INFO  [] - class io.kubernetes.client.openapi.models.V1Service#Start listing and watching...
2021-10-27 19:00:33,988 - io.kubernetes.client.informer.cache.ReflectorRunnable - 79 [controller-reflector-io.kubernetes.client.openapi.models.V1Pod-1] INFO  [] - class io.kubernetes.client.openapi.models.V1Pod#Start listing and watching...
2021-10-27 19:00:33,988 - io.kubernetes.client.informer.cache.ReflectorRunnable - 79 [controller-reflector-io.kubernetes.client.openapi.models.V1Service-1] INFO  [] - class io.kubernetes.client.openapi.models.V1Service#Start listing and watching...
2021-10-27 19:00:34,463 - org.apache.skywalking.oap.meter.analyzer.dsl.Expression - 88 [grpcServerPool-1-thread-17] ERROR [] - failed to run "(100 - ((node_memory_SwapFree_bytes * 100) / node_memory_SwapTotal_bytes)).tag({tags -> tags.node_identifier_host_name = 'vm::' + tags.node_identifier_host_name}).service(['node_identifier_host_name'])"
java.lang.IllegalArgumentException: null
        at com.google.common.base.Preconditions.checkArgument(Preconditions.java:128) ~[guava-28.1-jre.jar:?]
        at org.apache.skywalking.oap.meter.analyzer.dsl.SampleFamily.build(SampleFamily.java:78) ~[meter-analyzer-8.7.0.jar:8.7.0]
        at org.apache.skywalking.oap.meter.analyzer.dsl.SampleFamily.newValue(SampleFamily.java:487) ~[meter-analyzer-8.7.0.jar:8.7.0]
        at org.apache.skywalking.oap.meter.analyzer.dsl.SampleFamily.div(SampleFamily.java:193) ~[meter-analyzer-8.7.0.jar:8.7.0]
        at org.apache.skywalking.oap.meter.analyzer.dsl.SampleFamily$div$9.call(Unknown Source) ~[?:?]
        at Script1.run(Script1.groovy:1) ~[?:?]
        at org.apache.skywalking.oap.meter.analyzer.dsl.Expression.run(Expression.java:77) ~[meter-analyzer-8.7.0.jar:8.7.0]
        at org.apache.skywalking.oap.meter.analyzer.Analyzer.analyse(Analyzer.java:115) ~[meter-analyzer-8.7.0.jar:8.7.0]
        at org.apache.skywalking.oap.meter.analyzer.MetricConvert.toMeter(MetricConvert.java:73) ~[meter-analyzer-8.7.0.jar:8.7.0]
        at org.apache.skywalking.oap.meter.analyzer.prometheus.PrometheusMetricConverter.toMeter(PrometheusMetricConverter.java:84) ~[meter-analyzer-8.7.0.jar:8.7.0]
        at org.apache.skywalking.oap.server.receiver.otel.oc.OCMetricHandler$1.lambda$onNext$6(OCMetricHandler.java:79) ~[otel-receiver-plugin-8.7.0.jar:8.7.0]
        at java.util.ArrayList.forEach(ArrayList.java:1259) [?:1.8.0_262]
        at org.apache.skywalking.oap.server.receiver.otel.oc.OCMetricHandler$1.onNext(OCMetricHandler.java:79) [otel-receiver-plugin-8.7.0.jar:8.7.0]
        at org.apache.skywalking.oap.server.receiver.otel.oc.OCMetricHandler$1.onNext(OCMetricHandler.java:61) [otel-receiver-plugin-8.7.0.jar:8.7.0]
        at io.grpc.stub.ServerCalls$StreamingServerCallHandler$StreamingServerCallListener.onMessage(ServerCalls.java:249) [grpc-stub-1.32.1.jar:1.32.1]
        at io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.messagesAvailableInternal(ServerCallImpl.java:309) [grpc-core-1.32.1.jar:1.32.1]
        at io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.messagesAvailable(ServerCallImpl.java:292) [grpc-core-1.32.1.jar:1.32.1]
        at io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1MessagesAvailable.runInContext(ServerImpl.java:782) [grpc-core-1.32.1.jar:1.32.1]
        at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37) [grpc-core-1.32.1.jar:1.32.1]
        at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123) [grpc-core-1.32.1.jar:1.32.1]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_262]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_262]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_262]
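
The exception above is thrown by the vm rule rather than the k8s rules: the expression divides node_memory_SwapFree_bytes by node_memory_SwapTotal_bytes, and the meter analyzer rejects the operation when one side contributes no samples, which usually means the scraped node exporter exposed no swap metrics. A rough sketch of what such a MAL rule looks like, modelled on the failing expression (the exact content of the bundled otel-oc-rules/vm.yaml may differ):

# hypothetical excerpt, reconstructed from the expression in the log above
expSuffix: tag({tags -> tags.node_identifier_host_name = 'vm::' + tags.node_identifier_host_name}).service(['node_identifier_host_name'])
metricPrefix: meter_vm
metricsRules:
  - name: memory_swap_percentage
    # appears to fail with the IllegalArgumentException above when the Swap* samples are empty
    exp: 100 - ((node_memory_SwapFree_bytes * 100) / node_memory_SwapTotal_bytes)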

3. k8s metrics monitoring (kube-state-metrics) looks normal: [root@master131 ~]# kubectl logs -f -n kube-system kube-state-metrics-0

I1027 10:01:11.984341       1 main.go:106] Using default resources
I1027 10:01:12.128159       1 main.go:118] Using all namespace
I1027 10:01:12.128166       1 main.go:139] metric allow-denylisting: Excluding the following lists that were on denylist: 
W1027 10:01:12.128948       1 client_config.go:615] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I1027 10:01:12.212866       1 main.go:241] Testing communication with server
I1027 10:01:12.303482       1 main.go:246] Running with Kubernetes cluster version: v1.20. git version: v1.20.2. git tree state: clean. commit: faecb196815e248d3ecfb03c680a4507229c2a56. platform: linux/amd64
I1027 10:01:12.303518       1 main.go:248] Communication with server successful
I1027 10:01:12.303837       1 main.go:204] Starting metrics server: [::]:8080
I1027 10:01:12.303864       1 metrics_handler.go:102] Autosharding enabled with pod=kube-state-metrics-0 pod_namespace=kube-system
I1027 10:01:12.303886       1 metrics_handler.go:103] Auto detecting sharding settings.
I1027 10:01:12.303881       1 main.go:193] Starting kube-state-metrics self metrics server: [::]:8081
I1027 10:01:12.304116       1 main.go:64] level=info msg="TLS is disabled." http2=false
I1027 10:01:12.304203       1 main.go:64] level=info msg="TLS is disabled." http2=false
I1027 10:01:12.363206       1 builder.go:190] Active resources: certificatesigningrequests,configmaps,cronjobs,daemonsets,deployments,endpoints,horizontalpodautoscalers,ingresses,jobs,leases,limitranges,mutatingwebhookconfigurations,namespaces,networkpolicies,nodes,persistentvolumeclaims,persistentvolumes,poddisruptionbudgets,pods,replicasets,replicationcontrollers,resourcequotas,secrets,services,statefulsets,storageclasses,validatingwebhookconfigurations,volumeattachments
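
Since the kube-state-metrics scrape job in the collector config (point 5 below) keeps only endpoints whose Service carries the label app.kubernetes.io/name=kube-state-metrics, it is worth confirming that the Service in front of kube-state-metrics-0 actually has that label; otherwise the kube_* series that the k8s rules rely on never reach the OAP. A hypothetical example of a Service that the keep relabel rule would match (name and ports are assumptions):

apiVersion: v1
kind: Service
metadata:
  name: kube-state-metrics
  namespace: kube-system
  labels:
    # this is the label the 'keep' relabel action in the collector config matches on
    app.kubernetes.io/name: kube-state-metrics
spec:
  ports:
  - name: http-metrics
    port: 8080
    targetPort: 8080
  selector:
    app.kubernetes.io/name: kube-state-metrics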

4. The OpenTelemetry Collector is collecting data normally: [root@master131 ~]# kubectl logs -f otel-collector-7bb5b98564-stvdg

2021-10-27T11:34:43.650Z        info    service/collector.go:262        Starting otelcol...     {"Version": "v0.29.0", "NumCPU": 28}
2021-10-27T11:34:43.657Z        info    service/collector.go:322        Using memory ballast    {"MiBs": 683}
2021-10-27T11:34:43.657Z        info    service/collector.go:170        Setting up own telemetry...
2021-10-27T11:34:43.659Z        info    service/telemetry.go:99 Serving Prometheus metrics      {"address": ":8888", "level": 0, "service.instance.id": "9903e31e-d72f-4222-a2a8-32c94a0836db"}
2021-10-27T11:34:43.659Z        info    service/collector.go:205        Loading configuration...
2021-10-27T11:34:43.662Z        info    service/collector.go:221        Applying configuration...
2021-10-27T11:34:43.662Z        info    builder/exporters_builder.go:274        Exporter was built.     {"kind": "exporter", "exporter": "opencensus"}
2021-10-27T11:34:43.662Z        info    builder/exporters_builder.go:274        Exporter was built.     {"kind": "exporter", "exporter": "logging"}
2021-10-27T11:34:43.662Z        info    builder/pipelines_builder.go:204        Pipeline was built.     {"pipeline_name": "metrics", "pipeline_datatype": "metrics"}
2021-10-27T11:34:43.662Z        info    builder/receivers_builder.go:230        Receiver was built.     {"kind": "receiver", "name": "prometheus", "datatype": "metrics"}
2021-10-27T11:34:43.662Z        info    service/service.go:137  Starting extensions...
2021-10-27T11:34:43.662Z        info    builder/extensions_builder.go:53        Extension is starting...        {"kind": "extension", "name": "health_check"}
2021-10-27T11:34:43.662Z        info    healthcheckextension/healthcheckextension.go:41 Starting health_check extension {"kind": "extension", "name": "health_check", "config": {"Port":0,"TCPAddr":{"Endpoint":"0.0.0.0:13133"}}}
2021-10-27T11:34:43.662Z        info    builder/extensions_builder.go:59        Extension started.      {"kind": "extension", "name": "health_check"}
2021-10-27T11:34:43.662Z        info    builder/extensions_builder.go:53        Extension is starting...        {"kind": "extension", "name": "zpages"}
2021-10-27T11:34:43.662Z        info    zpagesextension/zpagesextension.go:42   Register Host's zPages  {"kind": "extension", "name": "zpages"}
2021-10-27T11:34:43.662Z        info    zpagesextension/zpagesextension.go:55   Starting zPages extension       {"kind": "extension", "name": "zpages", "config": {"TCPAddr":{"Endpoint":"localhost:55679"}}}
2021-10-27T11:34:43.662Z        info    builder/extensions_builder.go:59        Extension started.      {"kind": "extension", "name": "zpages"}
2021-10-27T11:34:43.662Z        info    service/service.go:182  Starting exporters...
2021-10-27T11:34:43.662Z        info    builder/exporters_builder.go:92 Exporter is starting... {"kind": "exporter", "name": "opencensus"}
2021-10-27T11:34:43.662Z        info    builder/exporters_builder.go:97 Exporter started.       {"kind": "exporter", "name": "opencensus"}
2021-10-27T11:34:43.662Z        info    builder/exporters_builder.go:92 Exporter is starting... {"kind": "exporter", "name": "logging"}
2021-10-27T11:34:43.662Z        info    builder/exporters_builder.go:97 Exporter started.       {"kind": "exporter", "name": "logging"}
2021-10-27T11:34:43.662Z        info    service/service.go:187  Starting processors...
2021-10-27T11:34:43.662Z        info    builder/pipelines_builder.go:51 Pipeline is starting... {"pipeline_name": "metrics", "pipeline_datatype": "metrics"}
2021-10-27T11:34:43.662Z        info    builder/pipelines_builder.go:62 Pipeline is started.    {"pipeline_name": "metrics", "pipeline_datatype": "metrics"}
2021-10-27T11:34:43.662Z        info    service/service.go:192  Starting receivers...
2021-10-27T11:34:43.662Z        info    builder/receivers_builder.go:70 Receiver is starting... {"kind": "receiver", "name": "prometheus"}
2021-10-27T11:34:43.663Z        info    kubernetes/kubernetes.go:282    Using pod service account via in-cluster config {"kind": "receiver", "name": "prometheus", "level": "info", "discovery": "kubernetes"}
2021-10-27T11:34:43.679Z        info    kubernetes/kubernetes.go:282    Using pod service account via in-cluster config {"kind": "receiver", "name": "prometheus", "level": "info", "discovery": "kubernetes"}
2021-10-27T11:34:43.680Z        info    discovery/manager.go:195        Starting provider       {"kind": "receiver", "name": "prometheus", "level": "debug", "provider": "static/0", "subs": "[jvm-node-exporter]"}
2021-10-27T11:34:43.680Z        info    discovery/manager.go:195        Starting provider       {"kind": "receiver", "name": "prometheus", "level": "debug", "provider": "kubernetes/1", "subs": "[kubernetes-cadvisor]"}
2021-10-27T11:34:43.680Z        info    discovery/manager.go:195        Starting provider       {"kind": "receiver", "name": "prometheus", "level": "debug", "provider": "kubernetes/2", "subs": "[kube-state-metrics]"}
2021-10-27T11:34:43.680Z        info    builder/receivers_builder.go:75 Receiver started.       {"kind": "receiver", "name": "prometheus"}
2021-10-27T11:34:43.680Z        info    discovery/manager.go:213        Discoverer channel closed       {"kind": "receiver", "name": "prometheus", "level": "debug", "provider": "static/0"}
2021-10-27T11:34:43.680Z        info    healthcheck/handler.go:129      Health Check state change       {"kind": "extension", "name": "health_check", "status": "ready"}
2021-10-27T11:34:43.680Z        info    service/collector.go:182        Everything is ready. Begin running and processing data.
2021-10-27T11:34:50.493Z        INFO    loggingexporter/logging_exporter.go:56  MetricsExporter {"#metrics": 170}
2021-10-27T11:34:50.493Z        INFO    loggingexporter/logging_exporter.go:56  MetricsExporter {"#metrics": 170}
2021-10-27T11:34:50.708Z        INFO    loggingexporter/logging_exporter.go:56  MetricsExporter {"#metrics": 70}
2021-10-27T11:34:51.930Z        INFO    loggingexporter/logging_exporter.go:56  MetricsExporter {"#metrics": 46}
2021-10-27T11:34:52.944Z        INFO    loggingexporter/logging_exporter.go:56  MetricsExporter {"#metrics": 70}

5. I am not sure whether the OpenTelemetry Collector configuration is correct: [root@master131 ~]# vi ./otel-collector-config.yaml

apiVersion: v1
kind: ConfigMap
metadata:
  name: otel-collector-conf
  labels:
    app: opentelemetry
    component: otel-collector-conf
  namespace: default
data:
  otel-collector-config: |
    #1. Receivers: data sources to scrape
    receivers:
      prometheus:
        config:
          global:
            scrape_interval: 5s
            evaluation_interval: 5s
          scrape_configs:
            # Collect JVM / node exporter metrics
            - job_name: 'jvm-node-exporter'
              static_configs:
                - targets: ['192.168.1.131:9110']
            # Collect k8s cAdvisor metrics
            - job_name: 'kubernetes-cadvisor'
              scheme: https
              tls_config:
                ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
              bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
              kubernetes_sd_configs:
              - role: node
              relabel_configs:
              - action: labelmap
                regex: __meta_kubernetes_node_label_(.+)
              - source_labels: []       # relabel the cluster name 
                target_label: cluster
                replacement: k8s-131
              - target_label: __address__
                replacement: kubernetes.default.svc:443
              - source_labels: [__meta_kubernetes_node_name]
                regex: (.+)
                target_label: __metrics_path__
                replacement: /api/v1/nodes/$${1}/proxy/metrics/cadvisor
              - source_labels: [instance]   # relabel the node name 
                separator: ;
                regex: (.+)
                target_label: node
                replacement: $$1
                action: replace
            - job_name: kube-state-metrics
              kubernetes_sd_configs:
              - role: endpoints
              relabel_configs:
              - source_labels: [__meta_kubernetes_service_label_app_kubernetes_io_name]
                regex: kube-state-metrics
                replacement: $$1
                action: keep
              - action: labelmap
                regex: __meta_kubernetes_service_label_(.+)
              - source_labels: []  # relabel the cluster name 
                target_label: cluster
                replacement: k8s-131
    #2. Processors: preprocessing applied before the data is exported
    processors:
      batch:
    # Extensions: health check and debugging pages
    extensions:
      health_check: {}
      zpages: {}
    #3. Exporters: where the data is sent (SkyWalking OAP via OpenCensus)
    exporters:
      opencensus:
        endpoint: "192.168.1.214:11800"
        insecure: true
      logging:
        logLevel: info
    service:
      extensions: [health_check, zpages]
      pipelines:
        metrics:
          receivers: [prometheus]
          processors: [batch]
          exporters: [opencensus,logging]

---

apiVersion: v1
kind: Service
metadata:
  name: otel-collector
  labels:
    app: opentelemetry
    component: otel-collector
  namespace: default
spec:
  type: NodePort
  ports:
  - name: metrics 
    port: 8888
    targetPort: 8888
    nodePort: 58888
  selector:
    component: otel-collector

---

apiVersion: apps/v1
kind: Deployment
metadata:
  name: otel-collector
  labels:
    app: opentelemetry
    component: otel-collector
  namespace: default
spec:
  selector:
    matchLabels:
      app: opentelemetry
      component: otel-collector
  minReadySeconds: 5
  progressDeadlineSeconds: 120
  replicas: 1 
  template:
    metadata:
      labels:
        app: opentelemetry
        component: otel-collector
    spec:
      serviceAccountName: prometheus
      containers:
      - command:
          - "/otelcol"
          - "--config=/conf/otel-collector-config.yaml"
          - "--log-level=info"
          - "--mem-ballast-size-mib=683"
        image: otel/opentelemetry-collector:0.29.0
        name: otel-collector
        resources:
          limits:
            cpu: 1
            memory: 2Gi
          requests:
            cpu: 200m
            memory: 400Mi
        ports:
        - containerPort: 55679 # ZPages endpoint
        - containerPort: 55680 # legacy OTLP/gRPC endpoint (unused here)
        - containerPort: 4317  # OpenTelemetry receiver
        - containerPort: 8888  # querying metrics
        volumeMounts:
        - name: otel-collector-config-vol
          mountPath: /conf
      volumes:
        - configMap:
            name: otel-collector-conf
            items:
              - key: otel-collector-config
                path: otel-collector-config.yaml
          name: otel-collector-config-vol
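
If it is unclear whether the kube_* and container_* series ever leave the collector, one option is to raise the logging exporter's verbosity; with the image version above, a debug level should print individual metric names and labels instead of only the counts seen in point 4. A minimal sketch, changing only the exporters fragment of the ConfigMap:

    exporters:
      opencensus:
        endpoint: "192.168.1.214:11800"
        insecure: true
      logging:
        # 'debug' dumps each metric with its labels, making it easy to confirm that
        # kube-state-metrics and cAdvisor series are really scraped and exported
        logLevel: debug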

Use case

No response

Related issues

No response

Are you willing to submit a PR?

Code of Conduct

wu-sheng commented 2 years ago

Why submit a duplicated issue? https://github.com/apache/skywalking/discussions/8034