Art3mK / Zabbix-LSI-RAID-Monitoring


RAID info is outdated #8

Closed: JayStarrco closed this issue 5 years ago

JayStarrco commented 5 years ago

I have set up and configured the Zabbix template as instructed in https://zabbix.org/wiki/Templates/Intel_LSI_RAID, and it appears to be running correctly. I have set up crontab to run both raid_trapper_check.pl and raid_discovery.pl every 5 minutes. Everything seemed to be working, but Zabbix is now alerting that the RAID info is outdated. This started about 10 minutes after I got it up and running, and almost 2 hours later the alert is still active. As far as I can see, everything is working, and I can't find a reason for this.

The files in /tmp are written every 5 minutes, which shows the cron job is working. When I run the scripts manually, they also complete without error. I even went as far as running tcpdump, and the data is sent and acknowledged by the server without error. I am simply puzzled as to why the alert is still firing.

At 1:10:

    -rw-r--r-- 1 root root 587 Oct 3 13:10 raid-discovery-zsend-data.tmp
    -rw-r--r-- 1 root root 963 Oct 3 13:10 raid-discovery-zsend-trapper-data.tmp

Again at 1:45:

    -rw-r--r-- 1 root root 587 Oct 3 13:45 raid-discovery-zsend-data.tmp
    -rw-r--r-- 1 root root 963 Oct 3 13:45 raid-discovery-zsend-trapper-data.tmp

cat /tmp/raid-discovery-zsend-trapper-data.tmp

cat /tmp/raid-discovery-zsend-data.tmp

    /..*.%..........{"request":"sender data","data":[{"host":"kvm1.mgmt.starrco.net","key":"hw.raid.bbu[0,\"state_of_charge\"]","value":"87"},{"host":"kvm1.mgmt.starrco.net","key":"hw.raid.bbu[0,\"state_of_charge\"]","value":"87"},{"host":"kvm1.mgmt.starrco.net","key":"hw.raid.bbu[0,\"state_of_charge\"]","value":"13"},{"host":"kvm1.mgmt.starrco.net","key":"hw.raid.logical_disk[0,0,\"vd_state\"]","value":"0"},{"host":"kvm1.mgmt.starrco.net","key":"hw.raid.physical_disk[0,32,0,\"media_errors\"]","value":"0"},{"host":"kvm1.mgmt.starrco.net","key":"hw.raid.physical_disk[0,32,0,\"predictive_errors\"]","value":"0"},{"host":"kvm1.mgmt.starrco.net","key":"hw.raid.physical_disk[0,32,0,\"firmware_state\"]","value":"0"},{"host":"kvm1.mgmt.starrco.net","key":"hw.raid.physical_disk[0,32,1,\"media_errors\"]","value":"0"},{"host":"kvm1.mgmt.starrco.net","key":"hw.raid.physical_disk[0,32,1,\"predictive_errors\"]","value":"0"},{"host":"kvm1.mgmt.starrco.net","key":"hw.raid.physical_disk[0,32,1,\"firmware_state\"]","value":"0"},{"host":"kvm1.mgmt.starrco.net","key":"hw.raid.physical_disk[0,32,2,\"media_errors\"]","value":"0"},{"host":"kvm1.mgmt.starrco.net","key":"hw.raid.physical_disk[0,32,2,\"predictive_errors\"]","value":"0"},{"host":"kvm1.mgmt.starrco.net","key":"hw.raid.physical_disk[0,32,2,\"firmware_state\"]","value":"0"},{"host":"kvm1.mgmt.starrco.net","key":"hw.raid.physical_disk[0,32,3,\"media_errors\"]","value":"0"},{"host":"kvm1.mgmt.starrco.
    13:40:02.573161 IP 10.10.10.20.59264 > 10.10.10.19.zabbix-trapper: Flags [P.], seq 1453:1941, ack 1, win 229, options [nop,nop,TS val 797952810 ecr 3324354028], length 488

    .$../.i.ZBXD........{"response":"success","info":"processed: 19; failed: 0; total: 19; seconds spent: 0.000798"}
    13:38:23.017552 IP 10.10.10.19.zabbix-trapper > 10.10.10.20.59254: Flags [F.], seq 106, ack 1942, win 272, options [nop,nop,TS val 3324254478 ecr 797927921], length 0
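The server's reply in this capture can be checked mechanically rather than by eye. Below is a minimal Python sketch, assuming the `info` string keeps the `name: value; ...` layout shown in the capture; `parse_trapper_info` and `all_accepted` are hypothetical helpers for illustration, not part of this repo or of Zabbix.

```python
import json


def parse_trapper_info(info: str) -> dict:
    """Split a trapper 'info' string such as
    "processed: 19; failed: 0; total: 19; seconds spent: 0.000798"
    into a dict of numbers (layout assumed from the capture)."""
    stats = {}
    for part in info.split(";"):
        name, _, value = part.partition(":")
        name, value = name.strip(), value.strip()
        if name:
            # Counters are integers; "seconds spent" is a float.
            stats[name] = float(value) if "." in value else int(value)
    return stats


def all_accepted(stats: dict) -> bool:
    """True when the server reports no failed values and processed == total."""
    return stats.get("failed", 1) == 0 and stats.get("processed") == stats.get("total")


# JSON body copied from the capture, with the binary ZBXD header bytes left off.
reply = json.loads(
    '{"response":"success","info":"processed: 19; failed: 0; total: 19; seconds spent: 0.000798"}'
)
stats = parse_trapper_info(reply["info"])
print(reply["response"], stats, all_accepted(stats))
```

A clean result only confirms that every value that was actually sent got accepted; it says nothing about items that were never sent at all.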

Art3mK commented 5 years ago

I bet you'll get more help on the Zabbix forum than here; I haven't used Zabbix for more than 3 years :)

That said, the "RAID info is outdated" alert relies on the bbu_state item, which for some reason is missing from the trapper data file output you provided.

Check whether the script can find that value in your controller's output:

    ./raid-check.pl --adapter 0 --mode bbu --item bbu_state

Also make sure you're using the latest scripts and template from this repo, not the ones from the Zabbix wiki or forum.
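To check the trapper data file for the bbu_state item without scanning it by eye, something like the following Python sketch could be used. It assumes each line ends with an item key followed by a quoted value, as in `hw.raid.bbu[0,"state_of_charge"] "87"`, and that the BBU state key carries `"bbu_state"` as one of its parameters. Both the line format and the helper names (`parse_trapper_lines`, `has_item`) are assumptions for illustration, not taken from the repo's scripts.

```python
import re

# Assumed line format: [optional host] hw.raid.<item>[params] "value"
# The key is the run of non-space characters from "hw.raid." to the last "]".
LINE_RE = re.compile(r'(hw\.raid\.\S+\])\s+"([^"]*)"\s*$')


def parse_trapper_lines(lines):
    """Yield (key, value) pairs from trapper data lines; non-matching lines are skipped."""
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            yield m.group(1), m.group(2)


def has_item(lines, item):
    """True if any key has the quoted item name (e.g. "bbu_state") among its parameters."""
    needle = f'"{item}"'
    return any(needle in key for key, _ in parse_trapper_lines(lines))


sample = [
    'hw.raid.bbu[0,"state_of_charge"] "87"',
    'hw.raid.logical_disk[0,0,"vd_state"] "0"',
]
print(has_item(sample, "bbu_state"))        # False
print(has_item(sample, "state_of_charge"))  # True
```

Against the real file, pass an open file handle instead of `sample`, e.g. inside `with open("/tmp/raid-discovery-zsend-trapper-data.tmp") as f:` call `has_item(f, "bbu_state")`.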

Art3mK commented 5 years ago

In particular, this looks strange:

cat /tmp/raid-discovery-zsend-trapper-data.tmp

    hw.raid.bbu[0,"state_of_charge"] "87"
    hw.raid.bbu[0,"state_of_charge"] "87"
    hw.raid.bbu[0,"state_of_charge"] "13"

Why does the output contain the same key three times, with different values? Check that you are using the latest scripts from this repo.
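The duplicated key can also be spotted programmatically. Below is a minimal Python sketch, assuming the same `key "value"` line format as the excerpt above; `conflicting_keys` is a hypothetical helper that reports every key emitted more than once with differing values.

```python
from collections import defaultdict


def conflicting_keys(lines):
    """Return {key: [values...]} for keys that appear more than once with differing values.

    Assumes each line is `key "value"`; the value is the last whitespace-separated token.
    """
    seen = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        key, value = line.rsplit(None, 1)
        seen[key].append(value.strip('"'))
    return {k: v for k, v in seen.items() if len(v) > 1 and len(set(v)) > 1}


lines = [
    'hw.raid.bbu[0,"state_of_charge"] "87"',
    'hw.raid.bbu[0,"state_of_charge"] "87"',
    'hw.raid.bbu[0,"state_of_charge"] "13"',
    'hw.raid.logical_disk[0,0,"vd_state"] "0"',
]
print(conflicting_keys(lines))
# {'hw.raid.bbu[0,"state_of_charge"]': ['87', '87', '13']}
```

An empty dict means no key was emitted with conflicting values.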

Art3mK commented 5 years ago

@JayStarrco, did you get your issue sorted out?

Art3mK commented 5 years ago

¯\_(ツ)_/¯