apache / hertzbeat

Apache HertzBeat(incubating) is a real-time monitoring system with agentless, performance cluster, prometheus-compatible, custom monitoring and status page building capabilities.
https://hertzbeat.apache.org/
Apache License 2.0

[BUG] monitoring k8s error #2096

Open MrFrankBolt opened 5 months ago

MrFrankBolt commented 5 months ago

Is there an existing issue for this?

Current Behavior

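I generated the Kubernetes authentication token as described in the documentation and then added the cluster as a monitor, but it stays in the down state. The backend log shows the error reproduced under Debug logs.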

Expected Behavior

Kubernetes should be monitored normally and the corresponding metrics collected.

Steps To Reproduce

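  1. I am using the 1.6.0 image; I was previously on 1.4.4 and upgraded by following the documentation.
  2. The only configuration changes are to the data source; everything else is left at the defaults, and the k8s collection YAML is also the default.
  3. HertzBeat is started with Docker.
  4. Checking the k8s monitoring result shows the error described above.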

Environment

HertzBeat version(s):v1.6.0

Debug logs

2024-06-19 17:01:09.893 [339555282159616-kubernetes-nodes-9839] ERROR org.apache.hertzbeat.collector.dispatch.WorkerPool Line:48 - Thread Name 339555282159616-kubernetes-nodes-9839 : Character K is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
java.lang.NumberFormatException: Character K is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
    at java.base/java.math.BigDecimal.<init>(BigDecimal.java:582)
    at java.base/java.math.BigDecimal.<init>(BigDecimal.java:467)
    at java.base/java.math.BigDecimal.<init>(BigDecimal.java:896)
    at org.apache.hertzbeat.collector.dispatch.unit.impl.DataSizeConvert.convert(DataSizeConvert.java:46)
    at org.apache.hertzbeat.collector.dispatch.MetricsCollect.calculateFields(MetricsCollect.java:310)
    at org.apache.hertzbeat.collector.dispatch.MetricsCollect.run(MetricsCollect.java:176)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:833)

Anything else?

No response

LiuTianyou commented 5 months ago

Hello, I'm very sorry. An error occurred during the unit conversion of the data collected by the k8s monitor. You can temporarily use this template to monitor k8s. We will fix this issue soon.

# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# The monitoring type category:service-application service monitoring db-database monitoring custom-custom monitoring os-operating system monitoring
category: cn
# The monitoring type eg: linux windows tomcat mysql aws...
app: kubernetes
# The monitoring i18n name
name:
  zh-CN: Kubernetes
  en-US: Kubernetes
# The description and help of this monitoring type
help:
  zh-CN: HertzBeat 通过查询 Kubernetes ApiServer api 来对 kubernetes 的通用性能指标(nodes、namespaces、pods、services)进行采集监控。<br><span class='help_module_span'>注意⚠️:为了监控 Kubernetes 中的信息,则需要获取到可访问 Api Server 的授权 TOKEN,让采集请求获取到对应的信息,<a class='help_module_content' href='https://hertzbeat.apache.org/zh-cn/docs/help/kubernetes'>点击查看获取步骤</a>。</span>
  en-US: HertzBeat monitoring Kubernetes general metrics such as nodes, namespaces and pods through querying data from Kubernetes ApiServer api. <br><span class='help_module_span'>Note⚠️:In order to monitor the information of Kubernetes, Hertzbeat need to obtain the authorized TOKEN that can access Api Server. <a class='help_module_content' href='https://hertzbeat.apache.org/docs/help/kubernetes'>Click here to view the specific steps.</a></span>
  zh-TW: HertzBeat 通過查詢 Kubernetes ApiServer api 來對 kubernetes 的通用性能指標(nodes、namespaces、pods、services)進行采集監控。<br><span class='help_module_span'>注意⚠️:爲了監控 Kubernetes 中的信息,則需要獲取到可訪問 Api Server 的授權 TOKEN,讓采集請求獲取到對應的信息,<a class='help_module_content' href='https://hertzbeat.apache.org/zh-cn/docs/help/kubernetes'>點擊查看獲取步驟</a>。</span>
helpLink:
  zh-CN: https://hertzbeat.apache.org/zh-cn/docs/help/kubernetes
  en-US: https://hertzbeat.apache.org/docs/help/kubernetes
# Input params define for monitoring(render web ui by the definition)
params:
  # field-param field key
  - field: host
    # name-param field display i18n name
    name:
      zh-CN: 目标Host
      en-US: Target Host
    # type-param field type(most mapping the html input type)
    type: host
    # required-true or false
    required: true
  # field-param field key
  - field: port
    # name-param field display i18n name
    name:
      zh-CN: ApiServer端口
      en-US: ApiServer Port
    # type-param field type(most mapping the html input type)
    type: number
    # when type is number, range is required
    range: '[0,65535]'
    # required-true or false
    required: true
    # default value
    defaultValue: 6443
  # field-param field key
  - field: authType
    # name-param field display i18n name
    name:
      zh-CN: 认证方式
      en-US: Auth Type
    # type-param field type(radio mapping the html radio tag)
    type: radio
    # required-true or false
    required: true
    # when type is radio checkbox, use option to show optional values {name1:value1,name2:value2}
    options:
      - label: Bearer Token
        value: Bearer Token
    defaultValue: Bearer Token
  - field: token
    name:
      zh-CN: 认证Token
      en-US: Access Token
    type: text
    required: true
# collect metrics config list
metrics:
  # metrics - nodes
  - name: nodes
    # metrics scheduling priority(0->127)->(high->low), metrics with the same priority will be scheduled in parallel
    # priority 0's metrics is availability metrics, it will be scheduled first, only availability metrics collect success will the scheduling continue
    priority: 0
    # collect metrics content
    fields:
      # field-metric name, type-metric type(0-number,1-string), unit-metric unit('%','ms','MB'), label-whether it is a metrics label field
      - field: node_name
        type: 1
        i18n:
          zh-CN: 节点名称
          en-US: Node Name
      - field: is_ready
        type: 1
        i18n:
          zh-CN: 节点就绪状态
          en-US: Node Ready Status
      - field: capacity_cpu
        type: 0
        i18n:
          zh-CN: CPU 容量
          en-US: CPU Capacity
      - field: allocatable_cpu
        type: 0
        i18n:
          zh-CN: 可分配 CPU
          en-US: Allocatable CPU
      - field: capacity_memory
        type: 1
        i18n:
          zh-CN: 内存容量
          en-US: Memory Capacity
      - field: allocatable_memory
        type: 1
        i18n:
          zh-CN: 可分配内存
          en-US: Allocatable Memory
      - field: creation_time
        type: 1
        i18n:
          zh-CN: 创建时间
          en-US: Creation Time
    # (optional)metrics field alias name, it is used as an alias field to map and convert the collected data and metrics field
    aliasFields:
      - $.metadata.name
      - $.status.conditions[?(@.type=='Ready')].status
      - $.status.capacity.cpu
      - $.status.capacity.memory
      - $.status.allocatable.cpu
      - $.status.allocatable.memory
      - $.metadata.creationTimestamp
    # (optional)mapping and conversion expressions, use these and aliasField above to calculate metrics value
    # eg: cores=core1+core2, usage=usage, waitTime=allTime-runningTime
    calculates:
      - node_name=$.metadata.name
      - is_ready=$.status.conditions[?(@.type=='Ready')].status
      - capacity_cpu=$.status.capacity.cpu
      - allocatable_cpu=$.status.allocatable.cpu
      - capacity_memory=$.status.capacity.memory
      - allocatable_memory=$.status.allocatable.memory
      - creation_time=$.metadata.creationTimestamp
    # (optional)field unit mapping and conversion expressions, origin unit -> final unit

    protocol: http
    http:
      host: ^_^host^_^
      port: ^_^port^_^
      url: /api/v1/nodes
      method: GET
      ssl: true
      authorization:
        type: ^_^authType^_^
        bearerTokenToken: ^_^token^_^
      parseType: jsonPath
      parseScript: '$.items.*'

  - name: namespaces
    priority: 1
    fields:
      - field: namespace
        type: 1
        i18n:
          zh-CN: 命名空间
          en-US: Namespace
      - field: status
        type: 1
        i18n:
          zh-CN: 状态
          en-US: Status
      - field: creation_time
        type: 1
        i18n:
          zh-CN: 创建时间
          en-US: Creation Time
    aliasFields:
      - $.metadata.name
      - $.status.phase
      - $.metadata.creationTimestamp
    calculates:
      - namespace=$.metadata.name
      - status=$.status.phase
      - creation_time=$.metadata.creationTimestamp
    protocol: http
    http:
      host: ^_^host^_^
      port: ^_^port^_^
      url: /api/v1/namespaces
      method: GET
      ssl: true
      authorization:
        type: ^_^authType^_^
        bearerTokenToken: ^_^token^_^
      parseType: jsonPath
      parseScript: '$.items.*'

  - name: pods
    priority: 1
    fields:
      - field: pod
        type: 1
        i18n:
          zh-CN: Pod名称
          en-US: Pod Name
      - field: namespace
        type: 1
        i18n:
          zh-CN: 命名空间
          en-US: Namespace
      - field: status
        type: 1
        i18n:
          zh-CN: 状态
          en-US: Status
      - field: restart
        type: 1
        i18n:
          zh-CN: 重启次数
          en-US: Restart Count
      - field: host_ip
        type: 1
        i18n:
          zh-CN: 主机IP
          en-US: Host IP
      - field: pod_ip
        type: 1
        i18n:
          zh-CN: Pod IP
          en-US: Pod IP
      - field: creation_time
        type: 1
        i18n:
          zh-CN: 创建时间
          en-US: Creation Time
      - field: start_time
        type: 1
        i18n:
          zh-CN: 启动时间
          en-US: Start Time
    aliasFields:
      - $.metadata.name
      - $.metadata.namespace
      - $.status.phase
      - $.spec.restartPolicy
      - $.status.hostIP
      - $.status.podIP
      - $.metadata.creationTimestamp
      - $.status.startTime
    calculates:
      - pod=$.metadata.name
      - namespace=$.metadata.namespace
      - status=$.status.phase
      - restart=$.spec.restartPolicy
      - host_ip=$.status.hostIP
      - pod_ip=$.status.podIP
      - creation_time=$.metadata.creationTimestamp
      - start_time=$.status.startTime
    protocol: http
    http:
      host: ^_^host^_^
      port: ^_^port^_^
      url: /api/v1/pods
      method: GET
      ssl: true
      authorization:
        type: ^_^authType^_^
        bearerTokenToken: ^_^token^_^
      parseType: jsonPath
      parseScript: '$.items.*'

  - name: services
    priority: 1
    fields:
      - field: service
        type: 1
        i18n:
          zh-CN: 服务
          en-US: Service
      - field: namespace
        type: 1
        i18n:
          zh-CN: 命名空间
          en-US: Namespace
      - field: type
        type: 1
        i18n:
          zh-CN: 类型
          en-US: Type
      - field: cluster_ip
        type: 1
        i18n:
          zh-CN: 集群IP
          en-US: Cluster IP
      - field: selector
        type: 1
        i18n:
          zh-CN: 选择器
          en-US: Selector
      - field: creation_time
        type: 1
        i18n:
          zh-CN: 创建时间
          en-US: Creation Time
    aliasFields:
      - $.metadata.name
      - $.metadata.namespace
      - $.spec.type
      - $.spec.clusterIP
      - $.spec.selector
      - $.metadata.creationTimestamp
    calculates:
      - service=$.metadata.name
      - namespace=$.metadata.namespace
      - type=$.spec.type
      - cluster_ip=$.spec.clusterIP
      - selector=$.spec.selector
      - creation_time=$.metadata.creationTimestamp
    protocol: http
    http:
      host: ^_^host^_^
      port: ^_^port^_^
      url: /api/v1/services
      method: GET
      ssl: true
      authorization:
        type: ^_^authType^_^
        bearerTokenToken: ^_^token^_^
      parseType: jsonPath
      parseScript: '$.items.*'

ncuhe commented 2 months ago

# Change the type of the two fields below to 1
      - field: capacity_memory
        type: 1
        i18n:
          zh-CN: 内存容量
          en-US: Memory Capacity
      - field: allocatable_memory
        type: 1
        i18n:
          zh-CN: 可分配内存
          en-US: Allocatable Memory
# Comment out the units node
    # units:
    #   - capacity_memory=Ki->Mi
    #   - allocatable_memory=Ki->Mi
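
For readers who want to see why the stack trace above complains about the character K, here is a minimal sketch in Python. It assumes the node memory fields arrive as suffixed strings such as `16393212Ki`, which is what the `Ki->Mi` units mapping suggests. It is not HertzBeat's `DataSizeConvert` code; `parse_quantity_bytes`, `ki_to_mi` and `naive_parse_fails` are hypothetical helpers written only for illustration. Python's `Decimal` rejects the raw string in the same spirit as Java's `BigDecimal`, and a suffix-aware parse avoids the failure.

```python
from decimal import Decimal, InvalidOperation

# Binary suffixes as powers of 1024 (assumed input format, e.g. "16393212Ki").
_BINARY_SUFFIXES = {"Ki": 1, "Mi": 2, "Gi": 3, "Ti": 4}


def parse_quantity_bytes(quantity: str) -> Decimal:
    """Convert a suffixed quantity string to a byte count (hypothetical helper)."""
    text = quantity.strip()
    for suffix, power in _BINARY_SUFFIXES.items():
        if text.endswith(suffix):
            # Strip the suffix before parsing the numeric part.
            return Decimal(text[: -len(suffix)]) * (Decimal(1024) ** power)
    # No recognised suffix: treat the value as plain bytes.
    return Decimal(text)


def ki_to_mi(quantity: str) -> Decimal:
    """Express a quantity in Mi, the target unit of the Ki->Mi mapping."""
    return parse_quantity_bytes(quantity) / (Decimal(1024) ** 2)


def naive_parse_fails(quantity: str) -> bool:
    """Return True when the raw string cannot be parsed as a decimal number."""
    try:
        Decimal(quantity)
    except InvalidOperation:
        return True
    return False
```

With these helpers, `naive_parse_fails("16393212Ki")` is true while `ki_to_mi("2048Ki")` evaluates to 2. The workaround in this thread takes the other route: declaring the two memory fields as strings (`type: 1`) and dropping the `units` node means no numeric conversion is attempted at all.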