ccfos / nightingale

An all-in-one observability solution which aims to combine the advantages of Prometheus and Grafana. It manages alert rules and visualizes metrics, logs, traces in a beautiful web UI.
https://flashcat.cloud/docs/
Apache License 2.0
9.42k stars 1.38k forks

Machine list page does not display metric information #1996

Closed ddl1228 closed 2 months ago

ddl1228 commented 2 months ago

Question and Steps to reproduce

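On the machine list page, the host running Anolis OS (龙蜥) 8.5.0-18.0.3 shows no metric information, while the host running CentOS 7.9.2009 does. See the image below: the top row is the CentOS host and the bottom row is the Anolis host. image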

Relevant logs and configurations

The two machines are identical apart from the operating system.

Version

Nightingale version: v7.0.0-beta.11
Categraf version: v0.3.70
OS 1 version: Anolis 8.5.0-18.0.3
OS 2 version: CentOS 7.9.2009

UlricQin commented 2 months ago

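Are the categraf configurations identical on both machines? If the heartbeat cannot get through, there should in theory be error logs. Please paste the complete categraf log.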

ddl1228 commented 2 months ago

The categraf configurations are identical. The complete categraf log is as follows:

Jun 14 16:25:36 hq_sz_n9e_master_01 systemd[1]: Started "Categraf".
Jun 14 16:25:36 hq_sz_n9e_master_01 categraf[411567]: 2024/06/14 16:25:36 main.go:149: I! runner.binarydir: /usr/local/categraf
Jun 14 16:25:36 hq_sz_n9e_master_01 categraf[411567]: 2024/06/14 16:25:36 main.go:150: I! runner.hostname: hq_sz_n9e_master_01
Jun 14 16:25:36 hq_sz_n9e_master_01 categraf[411567]: 2024/06/14 16:25:36 main.go:151: I! runner.fd_limits: (soft=65536, hard=65536)
Jun 14 16:25:36 hq_sz_n9e_master_01 categraf[411567]: 2024/06/14 16:25:36 main.go:152: I! runner.vm_limits: (soft=unlimited, hard=unlimited)
Jun 14 16:25:36 hq_sz_n9e_master_01 categraf[411567]: 2024/06/14 16:25:36 provider_manager.go:60: I! use input provider: [local]
Jun 14 16:25:36 hq_sz_n9e_master_01 categraf[411567]: 2024/06/14 16:25:36 prometheus_agent.go:19: I! prometheus scraping disabled!
Jun 14 16:25:36 hq_sz_n9e_master_01 categraf[411567]: 2024/06/14 16:25:36 ibex_agent.go:19: I! ibex agent disabled!
Jun 14 16:25:36 hq_sz_n9e_master_01 categraf[411567]: 2024/06/14 16:25:36 agent.go:38: I! agent starting
Jun 14 16:25:36 hq_sz_n9e_master_01 categraf[411567]: 2024/06/14 16:25:36 metrics_agent.go:242: E! input: local.amd_rocm_smi not supported
Jun 14 16:25:36 hq_sz_n9e_master_01 categraf[411567]: 2024/06/14 16:25:36 metrics_agent.go:242: E! input: local.arp_packet not supported
Jun 14 16:25:36 hq_sz_n9e_master_01 categraf[411567]: 2024/06/14 16:25:36 bind.go:65: DEBUG: 0
Jun 14 16:25:36 hq_sz_n9e_master_01 categraf[411567]: 2024/06/14 16:25:36 metrics_agent.go:317: I! input: local.conntrack started
Jun 14 16:25:36 hq_sz_n9e_master_01 categraf[411567]: 2024/06/14 16:25:36 metrics_agent.go:317: I! input: local.cpu started
Jun 14 16:25:36 hq_sz_n9e_master_01 categraf[411567]: 2024/06/14 16:25:36 metrics_agent.go:242: E! input: local.dcgm not supported
Jun 14 16:25:36 hq_sz_n9e_master_01 categraf[411567]: 2024/06/14 16:25:36 metrics_agent.go:317: I! input: local.disk started
Jun 14 16:25:36 hq_sz_n9e_master_01 categraf[411567]: 2024/06/14 16:25:36 metrics_agent.go:317: I! input: local.diskio started
Jun 14 16:25:36 hq_sz_n9e_master_01 categraf[411567]: 2024/06/14 16:25:36 metrics_agent.go:317: I! input: local.ethtool started
Jun 14 16:25:36 hq_sz_n9e_master_01 categraf[411567]: 2024/06/14 16:25:36 metrics_agent.go:317: I! input: local.greenplum started
Jun 14 16:25:36 hq_sz_n9e_master_01 categraf[411567]: 2024/06/14 16:25:36 metrics_agent.go:317: I! input: local.influxdb started
Jun 14 16:25:36 hq_sz_n9e_master_01 categraf[411567]: 2024/06/14 16:25:36 metrics_agent.go:317: I! input: local.ipvs started
Jun 14 16:25:36 hq_sz_n9e_master_01 categraf[411567]: 2024/06/14 16:25:36 metrics_agent.go:242: E! input: local.jolokia_agent_kafka not supported
Jun 14 16:25:36 hq_sz_n9e_master_01 categraf[411567]: 2024/06/14 16:25:36 metrics_agent.go:242: E! input: local.jolokia_agent_misc not supported
Jun 14 16:25:36 hq_sz_n9e_master_01 categraf[411567]: 2024/06/14 16:25:36 metrics_agent.go:317: I! input: local.kernel started
Jun 14 16:25:36 hq_sz_n9e_master_01 categraf[411567]: 2024/06/14 16:25:36 metrics_agent.go:317: I! input: local.kernel_vmstat started
Jun 14 16:25:36 hq_sz_n9e_master_01 categraf[411567]: 2024/06/14 16:25:36 metrics_agent.go:317: I! input: local.linux_sysctl_fs started
Jun 14 16:25:36 hq_sz_n9e_master_01 categraf[411567]: 2024/06/14 16:25:36 metrics_agent.go:317: I! input: local.mem started
Jun 14 16:25:36 hq_sz_n9e_master_01 categraf[411567]: 2024/06/14 16:25:36 metrics_agent.go:317: I! input: local.mysql started
Jun 14 16:25:36 hq_sz_n9e_master_01 categraf[411567]: 2024/06/14 16:25:36 metrics_agent.go:317: I! input: local.net started
Jun 14 16:25:36 hq_sz_n9e_master_01 categraf[411567]: 2024/06/14 16:25:36 metrics_agent.go:317: I! input: local.netstat started
Jun 14 16:25:36 hq_sz_n9e_master_01 categraf[411567]: 2024/06/14 16:25:36 metrics_agent.go:317: I! input: local.nfsclient started
Jun 14 16:25:36 hq_sz_n9e_master_01 categraf[411567]: 2024/06/14 16:25:36 metrics_agent.go:317: I! input: local.processes started
Jun 14 16:25:36 hq_sz_n9e_master_01 categraf[411567]: 2024/06/14 16:25:36 metrics_agent.go:317: I! input: local.self_metrics started
Jun 14 16:25:36 hq_sz_n9e_master_01 categraf[411567]: 2024/06/14 16:25:36 metrics_agent.go:317: I! input: local.sockstat started
Jun 14 16:25:36 hq_sz_n9e_master_01 categraf[411567]: 2024/06/14 16:25:36 metrics_agent.go:317: I! input: local.system started
Jun 14 16:25:36 hq_sz_n9e_master_01 categraf[411567]: 2024/06/14 16:25:36 agent.go:46: I! [*agent.MetricsAgent] started
Jun 14 16:25:36 hq_sz_n9e_master_01 categraf[411567]: 2024/06/14 16:25:36 agent.go:49: I! agent started

Also, although the machine list shows no metric information for this host, its metric values can be found through instant query. image

UlricQin commented 2 months ago

It looks like the heartbeat `enable` setting has been turned off.

ddl1228 commented 2 months ago

I double-checked: the value of enable is true, and there is a heartbeat, as shown below. image

UlricQin commented 2 months ago

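That update time does not prove the heartbeat is working. It turns green as long as any metrics are being reported. In this situation, either the categraf version is too old or heartbeat is not enabled.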

ddl1228 commented 2 months ago

The categraf version is the latest, and heartbeat is enabled. image

UlricQin commented 2 months ago

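You said the categraf configuration is the same on both machines, so is the heartbeat address set to 127.0.0.1 on both? Does that mean this sz machine also runs an n9e process?

I can't think of any other possibility. Without access to the environment, I have no way to verify the accuracy of the information you have given.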

kongfei605 commented 2 months ago

Could you check whether the IP and port in the url are correct?
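
To make that check concrete, the sketch below splits a heartbeat URL into host and port and flags loopback addresses, since a loopback address means the agent reports to whatever n9e process runs on the same machine. `describe_heartbeat_url` is a hypothetical helper, and the example URL is an assumed value rather than one taken from this thread.

```python
import ipaddress
from urllib.parse import urlsplit


def describe_heartbeat_url(url: str) -> dict:
    """Extract host and port from a URL and report whether it is loopback."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    # Fall back to the scheme's default port when none is given.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        is_loopback = ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal; treat only the conventional name as loopback.
        is_loopback = host == "localhost"
    return {"host": host, "port": port, "loopback": is_loopback}


if __name__ == "__main__":
    # Assumed example value for illustration.
    print(describe_heartbeat_url("http://127.0.0.1:17000/v1/n9e/heartbeat"))
```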

ddl1228 commented 2 months ago

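> You said the categraf configuration is the same on both machines, so is the heartbeat address set to 127.0.0.1 on both? Does that mean this sz machine also runs an n9e process?
>
> I can't think of any other possibility. Without access to the environment, I have no way to verify the accuracy of the information you have given.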

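Yes. Each of the two machines has its own separate deployment of n9e, prometheus, redis, and categraf, but they share the same mysql. The web UI is currently connected to hq_fs_n9e_master_01.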

UlricQin commented 2 months ago

You are not using the n9e-edge edge mode, only n9e. In that case, all of the n9e instances must use the same redis.
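
The effect of separate versus shared redis can be illustrated with a toy model. This is not n9e's actual code: it assumes, as the answer above implies, that heartbeat data lands in the redis of the n9e instance that receives it, and it uses plain dictionaries in place of redis servers. The class, the function, and the `cpu_util` values are hypothetical placeholders.

```python
class N9eInstance:
    """Toy stand-in for one n9e process backed by a redis-like store."""

    def __init__(self, name: str, redis: dict):
        self.name = name
        self.redis = redis  # a plain dict standing in for a redis server

    def receive_heartbeat(self, host: str, meta: dict) -> None:
        # Heartbeat data goes into this instance's own store.
        self.redis[host] = meta

    def machine_list_row(self, host: str):
        # The UI can only show what this instance's store contains.
        return self.redis.get(host)


def simulate(shared_redis: bool) -> dict:
    fs_redis: dict = {}
    sz_redis = fs_redis if shared_redis else {}
    fs = N9eInstance("hq_fs_n9e_master_01", fs_redis)
    sz = N9eInstance("hq_sz_n9e_master_01", sz_redis)
    # Each agent heartbeats to the n9e on its own machine (127.0.0.1).
    fs.receive_heartbeat(fs.name, {"cpu_util": 12.5})  # placeholder value
    sz.receive_heartbeat(sz.name, {"cpu_util": 8.0})   # placeholder value
    # The web UI is connected to the fs instance only.
    return {host: fs.machine_list_row(host) for host in (fs.name, sz.name)}


if __name__ == "__main__":
    print("separate redis:", simulate(shared_redis=False))
    print("shared redis:  ", simulate(shared_redis=True))
```

With separate stores, the fs instance has no entry for the sz host, which matches the empty row in the machine list. With a shared store, both rows are populated.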

ddl1228 commented 2 months ago

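OK. However, we deployed things the same way when we were on V5 and had no problems. So the latest version requires all instances to use the same redis, is that right?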