bb-Ricardo / check_redfish

A monitoring/inventory plugin to check components and health status of systems which support Redfish. It will also create an inventory of all components of a system.
MIT License

Object id 'x' for 'Fan' already used on Dell R6515 #18

Closed matejzero closed 4 years ago

matejzero commented 4 years ago

Hello,

I'm testing your check on a new Dell R6515 and when running the script with fan checks, I get:

Object id '0' for 'Fan' already used
Object id '1' for 'Fan' already used
Object id '2' for 'Fan' already used
Object id '3' for 'Fan' already used
Object id '4' for 'Fan' already used
Object id '5' for 'Fan' already used
[OK]: All fans (12) are in good condition and fan redundancy status is: Enabled|'Fan_System_Board_Fan1A'=6720;; 'Fan_System_Board_Fan1B'=7200;; 'Fan_System_Board_Fan2A'=6720;; 'Fan_System_Board_Fan2B'=7200;; 'Fan_System_Board_Fan3A'=6840;; 'Fan_System_Board_Fan3B'=7200;; 'Fan_System_Board_Fan4A'=6720;; 'Fan_System_Board_Fan4B'=7200;; 'Fan_System_Board_Fan5A'=6720;; 'Fan_System_Board_Fan5B'=7320;; 'Fan_System_Board_Fan6A'=6720;; 'Fan_System_Board_Fan6B'=7320;;

I think the problem is because 1A and 1B fans both have the same MemberId value in the output.

{
...
    "Fans": [{
            "@odata.id": "/redfish/v1/Chassis/System.Embedded.1/Thermal#/Fans/0",
            "@odata.type": "#Thermal.v1_5_0.Fan",
            "Assembly": {
                "@odata.id": "/redfish/v1/Chassis/System.Embedded.1/Assembly"
            },
            "FanName": "System Board Fan1A",
            "LowerThresholdCritical": 600,
            "LowerThresholdFatal": 600,
            "LowerThresholdNonCritical": 960,
            "MaxReadingRange": "None",
            "MemberId": "0",
            "MinReadingRange": 600,
            "Name": "System Board Fan1A",
            "PhysicalContext": "SystemBoard",
            "Reading": 6720,
            "ReadingUnits": "RPM",
            "Redundancy": [],
            "Redundancy@odata.count": 0,
            "RelatedItem": [{
                "@odata.id": "/redfish/v1/Chassis/System.Embedded.1"
            }],
            "RelatedItem@odata.count": 1,
            "SensorNumber": 56,
            "Status": {
                "Health": "OK",
                "State": "Enabled"
            },
            "UpperThresholdCritical": "None",
            "UpperThresholdFatal": "None",
            "UpperThresholdNonCritical": "None"
        },
        {
            "@odata.id": "/redfish/v1/Chassis/System.Embedded.1/Thermal#/Fans/0",
            "@odata.type": "#Thermal.v1_5_0.Fan",
            "Assembly": {
                "@odata.id": "/redfish/v1/Chassis/System.Embedded.1/Assembly"
            },
            "FanName": "System Board Fan1B",
            "LowerThresholdCritical": 600,
            "LowerThresholdFatal": 600,
            "LowerThresholdNonCritical": 960,
            "MaxReadingRange": "None",
            "MemberId": "0",
            "MinReadingRange": 600,
            "Name": "System Board Fan1B",
            "PhysicalContext": "SystemBoard",
            "Reading": 7200,
            "ReadingUnits": "RPM",
            "Redundancy": [],
            "Redundancy@odata.count": 0,
            "RelatedItem": [{
                "@odata.id": "/redfish/v1/Chassis/System.Embedded.1"
            }],
            "RelatedItem@odata.count": 1,
            "SensorNumber": 57,
            "Status": {
                "Health": "OK",
                "State": "Enabled"
            },
            "UpperThresholdCritical": "None",
            "UpperThresholdFatal": "None",
            "UpperThresholdNonCritical": "None"
        },
        {
            "@odata.id": "/redfish/v1/Chassis/System.Embedded.1/Thermal#/Fans/1",
            "@odata.type": "#Thermal.v1_5_0.Fan",
            "Assembly": {
                "@odata.id": "/redfish/v1/Chassis/System.Embedded.1/Assembly"
            },
            "FanName": "System Board Fan2A",
            "LowerThresholdCritical": 600,
            "LowerThresholdFatal": 600,
            "LowerThresholdNonCritical": 960,
            "MaxReadingRange": "None",
            "MemberId": "1",
            "MinReadingRange": 600,
            "Name": "System Board Fan2A",
            "PhysicalContext": "SystemBoard",
            "Reading": 6720,
            "ReadingUnits": "RPM",
            "Redundancy": [],
            "Redundancy@odata.count": 0,
            "RelatedItem": [{
                "@odata.id": "/redfish/v1/Chassis/System.Embedded.1"
            }],
            "RelatedItem@odata.count": 1,
            "SensorNumber": 58,
            "Status": {
                "Health": "OK",
                "State": "Enabled"
            },
            "UpperThresholdCritical": "None",
            "UpperThresholdFatal": "None",
            "UpperThresholdNonCritical": "None"
        }
    ]
...
}

The only differences between the A and B entries are the FanName, Name and SensorNumber values. I'm running the latest iDRAC (4.10.10.10). There is no such problem on Lenovo servers, as each fan has its own MemberId.
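
The collision can be reproduced offline with a minimal sketch. It assumes a trimmed-down copy of the `Fans` array above; the helper names `duplicate_member_ids` and `unique_fan_ids` are hypothetical and are not taken from the plugin. This is not the plugin's code or its eventual fix, only an illustration of why `MemberId` alone cannot serve as a key on this system and how falling back to `SensorNumber` (or `Name`) would separate the A and B fans.

```python
from collections import Counter

# Trimmed copy of the Fans array from the Thermal response above
# (only the fields relevant to identification are kept).
fans = [
    {"MemberId": "0", "Name": "System Board Fan1A", "SensorNumber": 56},
    {"MemberId": "0", "Name": "System Board Fan1B", "SensorNumber": 57},
    {"MemberId": "1", "Name": "System Board Fan2A", "SensorNumber": 58},
]


def duplicate_member_ids(fan_list):
    """Return the MemberId values that occur more than once."""
    counts = Counter(fan.get("MemberId") for fan in fan_list)
    return sorted(mid for mid, n in counts.items() if n > 1 and mid is not None)


def unique_fan_ids(fan_list):
    """Hypothetical workaround: extend colliding ids with SensorNumber or Name."""
    dupes = set(duplicate_member_ids(fan_list))
    ids = []
    for fan in fan_list:
        member_id = fan.get("MemberId")
        if member_id in dupes:
            # Assumption: SensorNumber (or Name) is unique per fan, as in the dump above.
            fallback = fan.get("SensorNumber", fan.get("Name"))
            ids.append(f"{member_id}.{fallback}")
        else:
            ids.append(member_id)
    return ids


print(duplicate_member_ids(fans))
print(unique_fan_ids(fans))
```

With the three entries above, only MemberId "0" is reported as duplicated, and the two Fan1 entries end up with distinct ids while Fan2A keeps its original one.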

bb-Ricardo commented 4 years ago

Thank you very much for the bug report. Will have a look at it on Monday.

But it still seems a strange concept to me to use the same identifier for two different fans.

matejzero commented 4 years ago

I agree! I hope to find some time on Monday to check the Redfish specs and maybe open a case with Dell about this. If I understand correctly, you successfully monitor multiple Dell servers, so this might be a model-specific bug.

bb-Ricardo commented 4 years ago

I don’t have all the models on hand to test the plugin and rely on mock ups.

I got a mock up of a Dell 7515 with the same iDRAC version and I haven’t seen this issue there. Will also double check it again.

I would also be happy to receive mock ups of systems which are not in the list of supported servers.

This is usually what I use to get them: https://github.com/DMTF/Redfish-Mockup-Creator

matejzero commented 4 years ago

OK.

If you wish, I can get you mockups for the SR630, SR650, Dell 6515, 640, 740 and a few old HPE servers (gen9, gen8). I would prefer support for Lenovo and Dell, since this is what we mostly use :)

matejzero commented 4 years ago

As far as Lenovo servers go, I see you test with firmware 2.12, whereas the latest is 3.60 (I've been running it in production for about 3 months, I think).

Noticeable changes:

3.60: 
  - Added the support of Redfish 1.8.0 and new properties support.
  - Added the support to report Raid health

3.00:
  - Added the Redfish support of telemetry service with metric reports and SSE.
  - Added the Redfish support of 2019.1 schema and registries.
  - Added the Redfish support of firmware update with push method and enhanced the firmware update messages.
  - Added the Redfish support to get the PSU firmware inventory.
  - Added the Redfish support of IO adapter settings with Bios schema.
  - Added the Redfish support of Enclosure "Chassis" object on blade and dense systems.

I don't have time to look up the Redfish schema changes for the 2019.1 and 1.8.0 releases, but I think that was where the URLs for fans changed.

bb-Ricardo commented 4 years ago

I would highly appreciate the two Lenovo ones and the Dell 6515 if you don’t mind. Just send them to my email address.

I also would like to start a library of mockups to be able to test against it.

What do you think about it? Can you share your thoughts on issue #5?

matejzero commented 4 years ago

I'll generate them later today or tomorrow and send them your way.

I like the idea of having tests for various vendors and BMC versions, so you can easily check new releases for compatibility.

matejzero commented 4 years ago

I dumped the Redfish mockups, but it's going to take some time to go through them and clean them up.

bb-Ricardo commented 4 years ago

Can you check out "next-release" and see if the error persists?

matejzero commented 4 years ago

It's all good now!

# ./check_redfish.py  -H ...... --fan
[OK]: All fans (12) are in good condition and fan redundancy status is: Enabled|'Fan_System_Board_Fan1A'=9480;; 'Fan_System_Board_Fan1B'=10200;; 'Fan_System_Board_Fan2A'=9480;; 'Fan_System_Board_Fan2B'=10200;; 'Fan_System_Board_Fan3A'=9600;; 'Fan_System_Board_Fan3B'=10320;; 'Fan_System_Board_Fan4A'=9600;; 'Fan_System_Board_Fan4B'=10320;; 'Fan_System_Board_Fan5A'=9600;; 'Fan_System_Board_Fan5B'=10320;; 'Fan_System_Board_Fan6A'=9600;; 'Fan_System_Board_Fan6B'=10320;;
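
One way to double-check output like this is to parse the performance data after the `|` and confirm that every fan shows up under its own label. The sketch below assumes the `'label'=value;;` layout seen in the output above and uses a shortened sample line; `parse_perfdata` is a hypothetical helper, not part of check_redfish.

```python
import re

# Shortened sample of the plugin output above (only the first four fans).
sample = (
    "[OK]: All fans (12) are in good condition and fan redundancy status is: Enabled|"
    "'Fan_System_Board_Fan1A'=9480;; 'Fan_System_Board_Fan1B'=10200;; "
    "'Fan_System_Board_Fan2A'=9480;; 'Fan_System_Board_Fan2B'=10200;;"
)


def parse_perfdata(line):
    """Return (label, value) pairs from the part of the line after '|'."""
    _, _, perfdata = line.partition("|")
    pairs = re.findall(r"'([^']+)'=(\d+)", perfdata)
    return [(label, int(value)) for label, value in pairs]


readings = parse_perfdata(sample)
labels = [label for label, _ in readings]

# Every fan should appear exactly once; a repeated label would hint at an id collision.
print(len(labels) == len(set(labels)))
print(readings[0])
```

Run against the full line, the same check should yield twelve distinct labels, one per fan.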