Linuxfabrik / monitoring-plugins

220+ check plugins for Icinga and other Nagios-compatible monitoring applications. Each plugin is a standalone command line tool (written in Python) that provides a specific type of check.
https://linuxfabrik.ch
The Unlicense

redfish-drives state #652

Open d3berry opened 1 year ago

d3berry commented 1 year ago

Hi, trying to use redfish-drives to monitor disks from Nagios. The disks are OK, but the system state is WARNING (inlet temp over threshold), so the plugin returns a warning and generates an email.

"Checked storage on 1 member. There are warnings."

Is there a way to only check the disks themselves, i.e. ignore any unrelated errors?

d3berry commented 1 year ago

Checked storage on 1 member. There are warnings.

Member: Dell Inc. PowerEdge R640, Processors: 2x Intel(R) Xeon(R) Silver 4216 CPU @ 2.10GHz (64 logical), BIOS: 2.8.2, Power: On, LED: Lit, SKU: 9DJR673, SerNo: CNIVC0009E0792 [WARNING]

Disk                        ! Type ! Proto ! Manufacturer ! Model                          ! SerialNumber       ! Size     ! LifeLeft % ! State
----------------------------+------+-------+--------------+--------------------------------+--------------------+----------+------------+-------
PCIe SSD in Slot 9 in Bay 1 ! SSD  ! PCIe  ! SAMSUNG      ! Dell Ent NVMe AGN MU U.2 1.6TB ! S61ENE0N800405     ! 1.5TiB   ! 100        ! [OK]
PCIe SSD in Slot 8 in Bay 1 ! SSD  ! PCIe  ! SAMSUNG      ! Dell Ent NVMe AGN MU U.2 1.6TB ! S61ENE0N800841     ! 1.5TiB   ! 95         ! [OK]
PCIe SSD in Slot 4 in Bay 1 ! SSD  ! PCIe  ! INTEL        ! INTEL SSDPE2KX040T8            ! PHLJ114503204P0DGN ! 3.6TiB   ! 99         ! [OK]
PCIe SSD in Slot 5 in Bay 1 ! SSD  ! PCIe  ! INTEL        ! INTEL SSDPE2KX040T8            ! PHLJ114501L74P0DGN ! 3.6TiB   ! 99         ! [OK]
PCIe SSD in Slot 6 in Bay 1 ! SSD  ! PCIe  ! INTEL        ! INTEL SSDPE2KX040T8            ! PHLJ114502RR4P0DGN ! 3.6TiB   ! 99         ! [OK]
PCIe SSD in Slot 7 in Bay 1 ! SSD  ! PCIe  ! INTEL        ! INTEL SSDPE2KX040T8            ! PHLJ1144011Y4P0DGN ! 3.6TiB   ! 99         ! [OK]
SSD 0                       ! SSD  ! SATA  ! MICRON       ! MTFDDAV480TDS                  ! 202629662A88       ! 447.1GiB ! 100        ! [OK]
SSD 1                       ! SSD  ! SATA  ! MICRON       ! MTFDDAV480TDS                  ! 202629662A78       ! 447.1GiB ! 100        ! [OK]

ID                  ! Name                                                    ! Description                                             ! Drives ! State
--------------------+---------------------------------------------------------+---------------------------------------------------------+--------+-------
RAID.Integrated.1-1 ! PERC H740P Mini                                         ! PERC H740P Mini                                         ! 0      ! [OK]
CPU.1               ! CPU.1                                                   ! CPU.1                                                   ! 6      ! [OK]
AHCI.Embedded.1-1   ! C620 Series Chipset Family SSATA Controller [AHCI mode] ! C620 Series Chipset Family SSATA Controller [AHCI mode] ! 0      ! [OK]
AHCI.Embedded.2-1   ! C620 Series Chipset Family SATA Controller [AHCI mode]  ! C620 Series Chipset Family SATA Controller [AHCI mode]  ! 0      ! [OK]
AHCI.Slot.1-1       ! BOSS-S1                                                 ! BOSS-S1                                                 ! 2      ! [OK]

markuslf commented 1 year ago

Hm, makes sense... would be better to split this up:

d3berry commented 1 year ago

Yes, currently it returns the worst of systems_state, drive_data_state and storage_data_state:

$ less redfish-drives3

    state = STATE_OK
    ...
        systems_state = lib.redfish3.get_state(systems)
        state = lib.base3.get_worst(state, systems_state)
    ...
                drive_data_state = lib.redfish3.get_state(drive_data)
                state = lib.base3.get_worst(state, drive_data_state)
    ...
            storage_data_state = lib.redfish3.get_state(storage_data)
            state = lib.base3.get_worst(state, storage_data_state)
    ...
    elif state == STATE_WARN:
        msg = 'Checked storage on {} {}. There are warnings.\n\n'.format(
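To make the effect concrete, here is a minimal Python sketch of the aggregation in the excerpt above. It assumes the standard Nagios exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and a get_worst() that ranks CRITICAL over WARNING over UNKNOWN over OK. Both helpers are simplified stand-ins written for illustration, not the actual code of lib.base3.

```python
# Standard Nagios plugin exit codes.
STATE_OK, STATE_WARN, STATE_CRIT, STATE_UNKNOWN = 0, 1, 2, 3

# Assumed precedence (worst last): OK < UNKNOWN < WARN < CRIT.
_RANK = {STATE_OK: 0, STATE_UNKNOWN: 1, STATE_WARN: 2, STATE_CRIT: 3}


def get_worst(state1, state2):
    """Return the worse of two plugin states (stand-in for lib.base3.get_worst)."""
    return max(state1, state2, key=lambda s: _RANK[s])


def overall_state(systems_state, drive_states, storage_states):
    """Fold every sub-state into one result, as in the excerpt above."""
    state = STATE_OK
    state = get_worst(state, systems_state)
    for s in drive_states:
        state = get_worst(state, s)
    for s in storage_states:
        state = get_worst(state, s)
    return state


# The case from this issue: chassis WARNING (inlet temperature),
# 8 drives and 5 storage controllers all OK.
print(overall_state(STATE_WARN, [STATE_OK] * 8, [STATE_OK] * 5))  # 1 (WARNING)
```

Because systems_state is folded in alongside the drive and controller states, a single chassis-level WARNING makes the whole check WARNING even though every drive row is [OK], which matches the notification reported at the top of this issue.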

Perhaps remove the systems_state check:

$ cd git/monitoring-plugins/check-plugins/redfish-drives
$ cp redfish-drives3 redfish-drives3db
$ vi redfish-drives3db
        #state = lib.base3.get_worst(state, systems_state)
$ ./redfish-drives3db --username xxx --password xxx --url xxx
Everything is ok, checked storage on 1 member.
...
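Instead of maintaining a patched copy such as redfish-drives3db, the same behavior could be made opt-in. The sketch below assumes a hypothetical --ignore-system-state switch (it is not an existing option of redfish-drives). When the switch is set, systems_state is left out of the fold, while drive and storage controller states still count. get_worst() is the same simplified stand-in as before, not the library's code.

```python
import argparse

# Standard Nagios plugin exit codes.
STATE_OK, STATE_WARN, STATE_CRIT, STATE_UNKNOWN = 0, 1, 2, 3

# Assumed precedence (worst last): OK < UNKNOWN < WARN < CRIT.
_RANK = {STATE_OK: 0, STATE_UNKNOWN: 1, STATE_WARN: 2, STATE_CRIT: 3}


def get_worst(state1, state2):
    """Simplified stand-in for lib.base3.get_worst()."""
    return max(state1, state2, key=lambda s: _RANK[s])


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description='redfish-drives (sketch)')
    # HYPOTHETICAL switch, not an option of the real plugin.
    parser.add_argument(
        '--ignore-system-state',
        action='store_true',
        default=False,
        help='Ignore the overall system health (fans, temperatures, ...); '
             'only drives and storage controllers determine the result.',
    )
    return parser.parse_args(argv)


def compute_state(args, systems_state, drive_states, storage_states):
    """Fold sub-states into one, optionally skipping the system health."""
    state = STATE_OK
    if not args.ignore_system_state:
        state = get_worst(state, systems_state)
    for s in list(drive_states) + list(storage_states):
        state = get_worst(state, s)
    return state


# Chassis WARNING (inlet temperature), all drives and controllers OK.
drives, storage = [STATE_OK] * 8, [STATE_OK] * 5
print(compute_state(parse_args([]), STATE_WARN, drives, storage))  # 1 (WARNING)
print(compute_state(parse_args(['--ignore-system-state']), STATE_WARN, drives, storage))  # 0 (OK)
```

In this sketch the switch only affects the exit state, so the "Member: ... [WARNING]" line could still be printed for information while Nagios stops alerting on it. A failing drive or controller still raises the state as before.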
markuslf commented 1 year ago

Redfish Mockup Server: https://www.redhat.com/sysadmin/redfish-manage-servers-automatically