ccfos / nightingale

An all-in-one observability solution which aims to combine the advantages of Prometheus and Grafana. It manages alert rules and visualizes metrics, logs, traces in a beautiful web UI.
https://flashcat.cloud/docs/
Apache License 2.0
9.64k stars 1.41k forks source link

dmesg reports disk partition errors, but the metric categraf collects is always 0 #1980

Closed chengyc111 closed 2 weeks ago

chengyc111 commented 4 months ago

dmesg reports errors on a disk partition, but the metric categraf collects for it always has the value 0. The version is v0.45.

disk_device_error{busigroup="cas8_110", device="sdb1", fstype="ext4", ident="10.1.5.242", mode="rw", path="/vms"}

[70622188.249188] bash (27137): drop_caches: 3
[70699486.424944] EXT4-fs (sdb1): error count since last fsck: 43
[70699486.424947] EXT4-fs (sdb1): initial error at time 1715886248: ext4_journal_check_start:61
[70699486.424948] EXT4-fs (sdb1): last error at time 1716944767: ext4_mb_release_inode_pa:3871
[70791761.110650] EXT4-fs (sdb1): error count since last fsck: 43
[70791761.110678] EXT4-fs (sdb1): initial error at time 1715886248: ext4_journal_check_start:61
[70791761.110680] EXT4-fs (sdb1): last error at time 1716944767: ext4_mb_release_inode_pa:3871
[70802188.198796] bash (154063): drop_caches: 3
[70884035.800260] EXT4-fs (sdb1): error count since last fsck: 43
[70884035.800264] EXT4-fs (sdb1): initial error at time 1715886248: ext4_journal_check_start:61
[70884035.800266] EXT4-fs (sdb1): last error at time 1716944767: ext4_mb_release_inode_pa:3871
[70976310.481975] EXT4-fs (sdb1): error count since last fsck: 43
[70976310.481998] EXT4-fs (sdb1): initial error at time 1715886248: ext4_journal_check_start:61
[70976310.482000] EXT4-fs (sdb1): last error at time 1716944767: ext4_mb_release_inode_pa:3871
[70978588.215689] bash (65926): drop_caches: 3
[71068585.167582] EXT4-fs (sdb1): error count since last fsck: 43
[71068585.167585] EXT4-fs (sdb1): initial error at time 1715886248: ext4_journal_check_start:61
[71068585.167586] EXT4-fs (sdb1): last error at time 1716944767: ext4_mb_release_inode_pa:3871
[root@b10-01-CVK01 vms]# touch aaa
touch: cannot touch ‘aaa’: Read-only file system
[root@b10-01-CVK01 vms]# df -hT /dev/sdb1
Filesystem Type Size Used Avail Use% Mounted on
/dev/sdb1 ext4 13T 941G 12T 8% /vms
[root@b10-01-CVK01 vms]# timed out waiting for input: auto-logout

kongfei605 commented 4 months ago

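The way device error works is that it runs a du-style usage check on the corresponding partition; if that check returns an error, device error is set to 1.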
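To make that description concrete, here is a minimal sketch of the logic in Python (not categraf's actual Go code). It assumes the usage check amounts to a `statvfs` call on the mount path; the helper name `device_error` is hypothetical and only for illustration.

```python
import os


def device_error(path: str) -> int:
    """Return 1 if querying filesystem usage for `path` fails, else 0.

    Assumption: the collector's usage check behaves like statvfs(2).
    This is an illustrative helper, not categraf's real function.
    """
    try:
        # Ask the kernel for usage statistics of the filesystem at `path`.
        os.statvfs(path)
    except OSError:
        # The usage query itself failed (for example, the path is missing
        # or the device returns an I/O error): flag the device.
        return 1
    # The query succeeded, so no error is reported.
    return 0


if __name__ == "__main__":
    # "/vms" is the mount point from the report above.
    print(device_error("/vms"))
```

Under this assumption, the metric only becomes 1 when the usage query itself fails. In the report above, `df -hT /dev/sdb1` still returns usage figures even though `touch` fails with "Read-only file system", which would be consistent with the value staying 0.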

UlricQin commented 2 weeks ago

Closing this issue because it has been open for a long time. If the problem persists, please open an issue in the categraf GitHub repo: github.com/flashcatcloud/categraf