Closed: idokaplan closed this issue 6 years ago.
Sure. Will check it out.
How would you propose to use it? What values should trigger an alarm? Do you have any output example?
I would like to monitor medium errors, because disks that report them are predicted to fail.
I would like to get an alarm if there are medium errors (mediumErrors > 0).
For example, DeviceID 32 has 1 medium error, and DeviceID 16 has 1 medium error.
c:\Program Files\Adaptec\maxView Storage Manager>arcconf getlogs 1 device tabular
Controllers found: 1
Controller log
Controller ID.................................... 0
Type............................................. 0
Time............................................. 1486214512
version ........................................ 3
tableFull ...................................... false
driveErrorEntry
smartError ..................................... false
vendorID ....................................... SEAGATE
serialNumber ................................... XXXXX
wwn ............................................ XXXXX
deviceID ....................................... 32
productID ...................................... XXXX
numParityErrors ................................ 0
linkFailures ................................... 0
hwErrors ....................................... 0
abortedCmds .................................... 0
mediumErrors ................................... 1
smartWarning ................................... 0
driveErrorEntry
smartError ..................................... false
vendorID ....................................... SEAGATE
serialNumber ................................... XXXXX
wwn ............................................ XXXXX
deviceID ....................................... 16
productID ...................................... XXXXX
numParityErrors ................................ 0
linkFailures ................................... 0
hwErrors ....................................... 0
abortedCmds .................................... 1
mediumErrors ................................... 1
smartWarning ................................... 0
Thanks! Ido
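The request above boils down to parsing the tabular log into entries and raising an alarm when mediumErrors > 0. Below is a minimal Python sketch, assuming the output format shown above: a driveErrorEntry line followed by "name ..... value" lines. SAMPLE is a trimmed copy of that output, and parse_drive_errors and medium_error_alarms are hypothetical helpers, not functions of raid_arcconf_zabbix_lld.py.

```python
import re

# Trimmed copy of the "arcconf getlogs 1 device tabular" output quoted above
# (serial numbers and some fields omitted for brevity).
SAMPLE = """\
driveErrorEntry
smartError ..................................... false
vendorID ....................................... SEAGATE
deviceID ....................................... 32
abortedCmds .................................... 0
mediumErrors ................................... 1
driveErrorEntry
smartError ..................................... false
vendorID ....................................... SEAGATE
deviceID ....................................... 16
abortedCmds .................................... 1
mediumErrors ................................... 1
"""

# "name ..... value": a word, optional spaces, a run of dots, then the value.
FIELD_RE = re.compile(r"^\s*(\w+)\s*\.{2,}\s*(.*?)\s*$")


def parse_drive_errors(text):
    """Split the tabular log into one dict per driveErrorEntry block."""
    entries = []
    current = None
    for line in text.splitlines():
        if line.strip() == "driveErrorEntry":
            current = {}
            entries.append(current)
            continue
        match = FIELD_RE.match(line)
        # Lines before the first entry (controller header) are ignored.
        if match and current is not None:
            current[match.group(1)] = match.group(2)
    return entries


def medium_error_alarms(entries):
    """Return (deviceID, mediumErrors) for every entry with mediumErrors > 0."""
    return [
        (e.get("deviceID"), int(e["mediumErrors"]))
        for e in entries
        if int(e.get("mediumErrors", "0")) > 0
    ]


if __name__ == "__main__":
    for device_id, count in medium_error_alarms(parse_drive_errors(SAMPLE)):
        print(f"ALARM: deviceID {device_id} has {count} medium error(s)")
```

Run against the sample, it prints one ALARM line for deviceID 32 and one for deviceID 16.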
(Sent by email in reply to Yuriy Smetana's comment of Sat, Feb 4, 2017 at 10:30 AM.)
Sorry. Had no time today. Will check it a bit later. :(
Hi,
Did you have a chance to check it?
Thanks! Ido
Yuriy? :(
Started working on it. Sorry for the delay.
Thank you for the follow up. Can I be rude and ask when it will be ready? :)
(Sent by email in reply to Yuriy Smetana's comment of Mon, Feb 20, 2017 at 11:45 AM.)
I don't know how to implement it. :) I need your advice, though.
It is a text log, and it can contain many entries. Let's say Zabbix reads 5 entries. What should it do with them? Parse each entry into separate parameters (abortedCmds, mediumErrors, smartWarning)? But those are parameters of a log entry, not of a device. OK, we can assign each log entry to the corresponding physical device. But the same device can have two entries with "mediumErrors=1" and one entry with "mediumErrors=0". Does that mean you currently have a problem? Should you clear the log after the problem is reported? How should it be stored in Zabbix?
Do you have any ideas? ;)
Yes, I clear the log manually after I have replaced the defective disk (not after the problem is reported), so there should not be two entries for the same device.
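If several entries for one device did occur, one way to fold them into a single per-device value is to keep the worst count. This is a sketch under the assumption stated in the reply above (the log is cleared only after the disk is replaced, so any non-zero count still in the log is an open problem); ENTRIES and worst_per_device are hypothetical, and taking the maximum is one possible policy, not something the thread settled on.

```python
from collections import defaultdict

# Hypothetical parsed entries (string values, as a text parser would produce).
# Device 32 appears three times to show how duplicates are folded together.
ENTRIES = [
    {"deviceID": "32", "mediumErrors": "1"},
    {"deviceID": "32", "mediumErrors": "0"},
    {"deviceID": "32", "mediumErrors": "1"},
    {"deviceID": "16", "mediumErrors": "0"},
]


def worst_per_device(entries, field="mediumErrors"):
    """Map deviceID -> the highest value of `field` seen in any of its entries.

    Policy assumption: the log is only cleared after the faulty disk is
    replaced, so a non-zero value anywhere in the log still counts.
    """
    worst = defaultdict(int)
    for entry in entries:
        device = entry["deviceID"]
        worst[device] = max(worst[device], int(entry.get(field, "0")))
    return dict(worst)


if __name__ == "__main__":
    print(worst_per_device(ENTRIES))  # {'32': 1, '16': 0}
```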
If we want to keep the same concept you used, we could do this, for example:
raid_arcconf_zabbix_lld.py smart -1 lld
{"data": [{"{#OBJ_TYPE}": "smart", "{#OBJ_ID}": 1}, {"{#OBJ_TYPE}": "smart", "{#OBJ_ID}": 4}, {"{#OBJ_TYPE}": "smart", "{#OBJ_ID}": 14}, {"{#OBJ_TYPE}": "smart", "{#OBJ_ID}": 0}]}
raid.arcconf[smart,{#OBJ_ID},mediumErrors]
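A sketch of what the proposed smart discovery could return, in the {"data": [...]} low-level discovery format shown in the proposal, together with a lookup for an item key such as raid.arcconf[smart,{#OBJ_ID},mediumErrors]. Assumptions: {#OBJ_ID} is taken from each entry's deviceID, and smart_lld and smart_value are hypothetical helpers; the real script's command-line handling is not reproduced here.

```python
import json

# Hypothetical parsed log entries (string values, as parsed from arcconf output).
ENTRIES = [
    {"deviceID": "32", "mediumErrors": "1", "smartWarning": "0"},
    {"deviceID": "16", "mediumErrors": "1", "smartWarning": "0"},
]


def smart_lld(entries):
    """Build Zabbix low-level discovery JSON with one object per log entry.

    Assumption: {#OBJ_ID} is the entry's deviceID, so the item key
    raid.arcconf[smart,{#OBJ_ID},mediumErrors] refers to one drive.
    """
    data = [
        {"{#OBJ_TYPE}": "smart", "{#OBJ_ID}": int(e["deviceID"])}
        for e in entries
    ]
    return json.dumps({"data": data})


def smart_value(entries, obj_id, field):
    """Return the raw value behind raid.arcconf[smart,<obj_id>,<field>]."""
    for entry in entries:
        if int(entry["deviceID"]) == obj_id:
            return entry.get(field)
    return None  # no log entry for that device


if __name__ == "__main__":
    print(smart_lld(ENTRIES))
    print(smart_value(ENTRIES, 32, "mediumErrors"))  # prints 1
```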
What do you think?
Yuriy? :(
OBJ_ID should be the ID of an object of type OBJ_TYPE, i.e. OBJ_ID 5 is SMART object #5, not the SMART data of device #5. Never mind, we can deal with that.
But every SMART event would create a new Zabbix item, and we could end up with hundreds of them. SMART events do not have IDs, only sequential positions, and the order of the events can easily change...
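The ordering concern can be illustrated with two snapshots of the log: keying entries by their position shifts the keys as soon as an entry is cleared, while keying by a field inside the entry (deviceID here; wwn or serialNumber would be alternatives) does not. This is a hypothetical illustration with made-up snapshots; whether deviceID stays stable when a disk is replaced has not been verified.

```python
# Hypothetical log snapshots before and after one entry is cleared.
BEFORE = [
    {"deviceID": "32", "mediumErrors": "1"},
    {"deviceID": "16", "mediumErrors": "1"},
]
AFTER = [
    {"deviceID": "16", "mediumErrors": "1"},  # device 32's entry was cleared
]


def by_position(entries):
    """Key entries by their sequential position in the log (unstable)."""
    return {index: entry["deviceID"] for index, entry in enumerate(entries)}


def by_device(entries):
    """Key entries by a field of the entry itself (stable across clears)."""
    return {entry["deviceID"]: entry["mediumErrors"] for entry in entries}


if __name__ == "__main__":
    # Device 16 moves from position 1 to position 0, so anything bound to
    # "position 1" would silently lose or change its data.
    print(by_position(BEFORE), by_position(AFTER))  # {0: '32', 1: '16'} {0: '16'}
    print(by_device(BEFORE), by_device(AFTER))  # {'32': '1', '16': '1'} {'16': '1'}
```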
What if we just create one Zabbix item (call it Events) that contains all events (as plain text) from all devices (we can't get the events of a particular device), plus certain item-markers such as Events-smartError, Events-mediumErrors, etc.? If any of the markers has an error value (from any device), we trigger a notification. I will try to put the device vendor/serial in the notification to make the faulty drive easier to find.
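One possible reading of the "Events plus item-markers" idea, sketched in Python: a single plain-text Events value plus one 0/1 marker per field (Events-smartError, Events-mediumErrors), each carrying the vendor/serial of the offending drives for the notification text. The helper names, the error rule ("true" or any non-zero number), and the SN-A/SN-B serials are placeholders, not part of the script.

```python
# Hypothetical parsed entries; serialNumber values are placeholders.
ENTRIES = [
    {"deviceID": "32", "vendorID": "SEAGATE", "serialNumber": "SN-A",
     "smartError": "false", "mediumErrors": "1"},
    {"deviceID": "16", "vendorID": "SEAGATE", "serialNumber": "SN-B",
     "smartError": "false", "mediumErrors": "0"},
]

# Fields turned into marker items; the names are illustrative.
MARKER_FIELDS = ("smartError", "mediumErrors")


def is_error(value):
    """Treat 'true' or any non-zero number as an error value."""
    if value.lower() in ("true", "false"):
        return value.lower() == "true"
    return int(value) > 0


def events_text(entries):
    """The single 'Events' item: every entry rendered as one plain-text line."""
    return "\n".join(
        " ".join(f"{key}={value}" for key, value in entry.items())
        for entry in entries
    )


def marker(entries, field):
    """An 'Events-<field>' marker: 1 if any device reports an error, else 0,
    plus vendor/serial of the offending drives for the notification text."""
    bad = [f"{e['vendorID']}/{e['serialNumber']}" for e in entries
           if is_error(e.get(field, "0"))]
    return (1 if bad else 0), ", ".join(bad)


if __name__ == "__main__":
    print(events_text(ENTRIES))
    for field in MARKER_FIELDS:
        print(f"Events-{field}", *marker(ENTRIES, field))
```

A trigger would then only need to fire when a marker equals 1; how these values reach Zabbix is left open here.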
What do you think?
There are not supposed to be hundreds of events; there will be only a few, and only temporarily (until the disk is replaced and the log is cleared).
I'm sorry, but I don't know what item-markers are. Can you please explain?
Yuriy? :(
Hello, I am very sorry for being silent. Unfortunately, I no longer have access to such RAID controllers, so I cannot test new features. If you have any proposals, please change the code and make a pull request. Thank you!
Hi,
Very nice template! Is there any chance to add support for the disk errors output (getlogs 1 device tabular)?
Thanks! Ido