Closed dan-m-joh closed 3 years ago
Hi Dan,
I am pleased to hear that you like the plugin.
It is absolutely possible to add a CMOS battery check if the data is exposed via Redfish. I would just add it to the --info check, if that's OK for you.
I have a Mockup of a Dell R7515 which includes "System Board CMOS Battery".
Great!
If you add it to --info (no problem for me, as good as anyplace), will it change the "status" (to Warning or Critical) if it is "not present" (but it should be) or if it is not "Nominal"?
Dan
If the attribute is not present, it won't be checked. There also won't be any warning, because a lot of systems don't have this attribute.
Would that be a problem?
No. If the attribute is not present, then there should be no warning (or output), but if it is present but "not OK", then there should be a warning. Is that possible?
Yes of course, that's the whole idea 😄 Found this bit.
{
"@odata.context": "/redfish/v1/$metadata#DellSensor.DellSensor",
"@odata.id": "/redfish/v1/Dell/Systems/System.Embedded.1/DellSensor/iDRAC.Embedded.1_0x23_SystemBoardCMOSBattery",
"@odata.type": "#DellSensor.v1_0_0.DellSensor",
"CurrentState": "Good",
"ElementName": "System Board CMOS Battery",
"EnabledState": "Enabled",
"HealthState": "OK",
"Id": "iDRAC.Embedded.1_0x23_SystemBoardCMOSBattery",
"Links": {
"ComputerSystem": {
"@odata.id": "/redfish/v1/Systems/System.Embedded.1"
}
},
"SensorType": "Other"
},
I would just add the status of all sensors, and if a HealthState is != OK it would report the issue.
Yes, that would be great. And maybe CurrentState != Good ?
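To make the rule discussed above concrete, here is a minimal sketch of how a list of DellSensor entries could be evaluated. It is not the plugin's actual code: the function name `evaluate_sensors` and the choice of WARNING as the severity are assumptions for illustration. The message layout mirrors the `Sensor "<name>": <HealthState> (<EnabledState>/<CurrentState>)` lines quoted elsewhere in this thread.

```python
def evaluate_sensors(sensors):
    """Return (status, message) pairs for a list of DellSensor dicts.

    Hypothetical helper: a sensor counts as OK only when HealthState is "OK"
    and CurrentState is "Good". Anything else is flagged (severity assumed).
    """
    results = []
    for sensor in sensors:
        name = sensor.get("ElementName", "unknown")
        health = sensor.get("HealthState")
        current = sensor.get("CurrentState")
        enabled = sensor.get("EnabledState")
        status = "OK" if health == "OK" and current == "Good" else "WARNING"
        results.append((status, f'Sensor "{name}": {health} ({enabled}/{current})'))
    return results


# Trimmed copy of the CMOS battery entry from the mockup.
cmos_sensor = {
    "ElementName": "System Board CMOS Battery",
    "CurrentState": "Good",
    "EnabledState": "Enabled",
    "HealthState": "OK",
}

for status, message in evaluate_sensors([cmos_sensor]):
    print(f"[{status}]: {message}")
```

A sensor that is absent from the collection is never visited, so it produces neither a warning nor any output, which matches the behaviour agreed on above.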
Can you check out next-release and see if this helps you? Also try --detailed to see all sensors.
I have tested your "next-release" version and found this issue with the info-check when you only have DIMM SLOT A1/B1 installed.
$ ./check_redfish.py -H xxxxxxxxx-idrac -u yyyyyyy -p zzzzzzz --info
[CRITICAL]: Sensor "DIMM SLOT A10": Unknown (Enabled/Unknown)
[CRITICAL]: Sensor "DIMM SLOT A11": Unknown (Enabled/Unknown)
[CRITICAL]: Sensor "DIMM SLOT A12": Unknown (Enabled/Unknown)
[CRITICAL]: Sensor "DIMM SLOT A2": Unknown (Enabled/Unknown)
[CRITICAL]: Sensor "DIMM SLOT A3": Unknown (Enabled/Unknown)
[CRITICAL]: Sensor "DIMM SLOT A4": Unknown (Enabled/Unknown)
[CRITICAL]: Sensor "DIMM SLOT A5": Unknown (Enabled/Unknown)
[CRITICAL]: Sensor "DIMM SLOT A6": Unknown (Enabled/Unknown)
[CRITICAL]: Sensor "DIMM SLOT A7": Unknown (Enabled/Unknown)
[CRITICAL]: Sensor "DIMM SLOT A8": Unknown (Enabled/Unknown)
[CRITICAL]: Sensor "DIMM SLOT A9": Unknown (Enabled/Unknown)
[CRITICAL]: Sensor "DIMM SLOT B10": Unknown (Enabled/Unknown)
[CRITICAL]: Sensor "DIMM SLOT B11": Unknown (Enabled/Unknown)
[CRITICAL]: Sensor "DIMM SLOT B12": Unknown (Enabled/Unknown)
[CRITICAL]: Sensor "DIMM SLOT B2": Unknown (Enabled/Unknown)
[CRITICAL]: Sensor "DIMM SLOT B3": Unknown (Enabled/Unknown)
[CRITICAL]: Sensor "DIMM SLOT B4": Unknown (Enabled/Unknown)
[CRITICAL]: Sensor "DIMM SLOT B5": Unknown (Enabled/Unknown)
[CRITICAL]: Sensor "DIMM SLOT B6": Unknown (Enabled/Unknown)
[CRITICAL]: Sensor "DIMM SLOT B7": Unknown (Enabled/Unknown)
[CRITICAL]: Sensor "DIMM SLOT B8": Unknown (Enabled/Unknown)
[CRITICAL]: Sensor "DIMM SLOT B9": Unknown (Enabled/Unknown)
$ ./check_redfish.py -H xxxxxxxxx-idrac -u yyyyyyy -p zzzzzzz --memory
[OK]: All memory modules (Total 34GB) are in good condition
Running the info check from the master branch returns the correct info (excluding CMOS):
$ ./check_redfish.py -H xxxxxxxxx-idrac -u yyyyyyy -p zzzzzzz --info
[OK]: Type: None None (CPU: 2, MEM: 34GB) - BIOS: 2.3.10 - Serial: CNIVC009A50625 - Power: On - Name: xxxxxxxxxx
Looking at
$ ./check_redfish.py -H xxxxxxxxx-idrac -u yyyyyyy -p zzzzzzz --info --detailed
I can see the following row:
[OK]: Sensor "PERC1 ROMB Battery": OK (Enabled/Good)
Is that the CMOS battery? I can provide you with a Mockup for this DELL model (R740) if you need/want.
Regards, Dan
Can you send me the verbose output of ./check_redfish.py -H xxxxxxxxx-idrac -u yyyyyyy -p zzzzzzz --info --detailed -v ?
I will send you the output in a PM.
Dan
Thank you. This seems a bit difficult to distinguish. Is it an actual problem or default behaviour?
{'@odata.context': '/redfish/v1/$metadata#DellSensor.DellSensor',
'@odata.id': '/redfish/v1/Dell/Systems/System.Embedded.1/DellSensor/iDRAC.Embedded.1_0x23_DIMMSLOTB9',
'@odata.type': '#DellSensor.v1_0_0.DellSensor',
'CurrentState': 'Unknown',
'ElementName': 'DIMM SLOT B9',
'EnabledState': 'Enabled',
'HealthState': 'Unknown',
'Id': 'iDRAC.Embedded.1_0x23_DIMMSLOTB9',
'Links': {'ComputerSystem': {'@odata.id': '/redfish/v1/Systems/System.Embedded.1'}},
'SensorType': 'Other'},
In my opinion the EnabledState should be Disabled if the component is not installed.
What do you think? Any other suggestion for a solution?
Yes, I agree: if a component is not installed, it should be in the disabled state (and I cannot see anything in iDRAC to change it to disabled). My DELL servers are all on iDRAC-9 (4.10.10.10). There is an update to v4.22.0.0, but it will take some time until I can get that installed. The only thing I can think of as a workaround is to "ignore" the DIMM slots that are in status "Unknown" (either CurrentState or HealthState, or both). In the meantime I will open a case with DELL and hear what they have to say about it...
So for now should we ignore DIMMs in "Unknown" state? Or DIMMs in general, as we have the --mem request?
Probably the best thing (for the moment) is to ignore the DIMMs in the --info output.
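As an illustration of the workaround proposed here, the following sketch drops DIMM slot sensors before they are evaluated. It is an assumption about how such a filter could look, not the change that was committed: the helper name `skip_dimm_sensors` is hypothetical, and matching on the "DIMM SLOT" prefix of ElementName is inferred from the sensor names shown in the output above.

```python
def skip_dimm_sensors(sensors):
    """Return the sensors whose ElementName does not start with "DIMM SLOT".

    Hypothetical filter: memory problems are left to the dedicated memory
    check, so DIMM slot sensors are dropped from the sensor list entirely.
    """
    kept = []
    for sensor in sensors:
        name = (sensor.get("ElementName") or "").upper()
        if name.startswith("DIMM SLOT"):
            continue  # ignored regardless of HealthState / CurrentState
        kept.append(sensor)
    return kept


sensors = [
    {"ElementName": "DIMM SLOT B9", "HealthState": "Unknown", "CurrentState": "Unknown"},
    {"ElementName": "PERC1 ROMB Battery", "HealthState": "OK", "CurrentState": "Good"},
]

remaining = skip_dimm_sensors(sensors)
print([s["ElementName"] for s in remaining])
```

Filtering by name rather than by "Unknown" state keeps a genuinely unknown state on other sensors visible.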
I have opened a SR with DELL but have yet to get any feedback.
changed in last commit. Can you check if this works for you?
Now the --info check does not report the "missing" DIMMs.
But I do not see any sensor information for the CMOS "node":
{
"@odata.context": "/redfish/v1/$metadata#DellSensor.DellSensor",
"@odata.id": "/redfish/v1/Dell/Systems/System.Embedded.1/DellSensor/iDRAC.Embedded.1_0x23_SystemBoardCMOSBattery",
"@odata.type": "#DellSensor.v1_0_0.DellSensor",
"CurrentState": "Good",
"ElementName": "System Board CMOS Battery",
"EnabledState": "Enabled",
"HealthState": "OK",
"Id": "iDRAC.Embedded.1_0x23_SystemBoardCMOSBattery",
"Links": {
"ComputerSystem": {
"@odata.id": "/redfish/v1/Systems/System.Embedded.1"
}
},
"SensorType": "Other"
},
What does --detailed report?
$ ./check_redfish.py -H xxxxxxxxxx-idrac -u yyyyyyyyyyyy -p zzzzzzzzzzz --info --detail
[OK]: Type: None None (CPU: 2, MEM: 34GB) - BIOS: 2.3.10 - Serial: XXXXXXX - Power: On - Name: xxxxxxxxxx
[OK]: Sensor "CPU1 FIVR PG": OK (Enabled/Good)
[OK]: Sensor "CPU1 MEM012 VDDQ PG": OK (Enabled/Good)
[OK]: Sensor "CPU1 MEM012 VPP PG": OK (Enabled/Good)
[OK]: Sensor "CPU1 MEM012 VTT PG": OK (Enabled/Good)
[OK]: Sensor "CPU1 MEM345 VDDQ PG": OK (Enabled/Good)
[OK]: Sensor "CPU1 MEM345 VPP PG": OK (Enabled/Good)
[OK]: Sensor "CPU1 MEM345 VTT PG": OK (Enabled/Good)
[OK]: Sensor "CPU1 Status": OK (Enabled/Good)
[OK]: Sensor "CPU1 VCCIO PG": OK (Enabled/Good)
[OK]: Sensor "CPU1 VCORE PG": OK (Enabled/Good)
[OK]: Sensor "CPU1 VSA PG": OK (Enabled/Good)
[OK]: Sensor "CPU2 FIVR PG": OK (Enabled/Good)
[OK]: Sensor "CPU2 MEM012 VDDQ PG": OK (Enabled/Good)
[OK]: Sensor "CPU2 MEM012 VPP PG": OK (Enabled/Good)
[OK]: Sensor "CPU2 MEM012 VTT PG": OK (Enabled/Good)
[OK]: Sensor "CPU2 MEM345 VDDQ PG": OK (Enabled/Good)
[OK]: Sensor "CPU2 MEM345 VPP PG": OK (Enabled/Good)
[OK]: Sensor "CPU2 MEM345 VTT PG": OK (Enabled/Good)
[OK]: Sensor "CPU2 Status": OK (Enabled/Good)
[OK]: Sensor "CPU2 VCCIO PG": OK (Enabled/Good)
[OK]: Sensor "CPU2 VCORE PG": OK (Enabled/Good)
[OK]: Sensor "CPU2 VSA PG": OK (Enabled/Good)
[OK]: Sensor "PERC1 ROMB Battery": OK (Enabled/Good)
[OK]: Sensor "System Board 1.8V SW PG": OK (Enabled/Good)
[OK]: Sensor "System Board 2.5V SW PG": OK (Enabled/Good)
[OK]: Sensor "System Board 3.3V B PG": OK (Enabled/Good)
same host?
and with the previous version you can see the CMOS Battery?
Yes, that is the same host. No, in the previous version it was also missing.
But then this is a different host than the output you sent me.
If you use -v on the host, do you see CMOS in the output?
No, I have been using the same host as an "example host" the whole time. The JSON output in previous comment was not from "-v" output, but a snippet from the Mockup for this host. In the "-v" output there is no mention of CMOS.
$ ./check_redfish.py -H xxxxxxxxxxxx -u yyyyyyyyy -p zzzzzzz --info --detail -v 2>&1 | grep -i cmos
$
And you have a mockup of this server which includes the String CMOS?
Yes, of course. I will send it to you in a PM.
This is very strange.
I have installed check_redfish (git clone --single-branch --branch next-release https://github.com/bb-Ricardo/check_redfish.git) on a new "virgin" host and I still get a "shortened" version of the --info check.
I also cannot see any trace of "CMOS" using the -v option, even though it is there in the MockUp.
Do you have any idea on how to further debug this?
Hello Ricardo,
I have updated the BIOS on one of "my" DELL servers and now the DIMMs are reported slightly differently:
{
"@odata.context": "/redfish/v1/$metadata#DellSlot.DellSlot",
"@odata.id": "/redfish/v1/Dell/Systems/System.Embedded.1/DellSlot/iDRAC.Embedded.1_0x23_DIMMSLOTA1_0x23_Slot",
"@odata.type": "#DellSlot.v1_0_0.DellSlot",
"ConnectorLayout": "DIMM",
"DeviceFQDD": "DIMM.Socket.A1",
"EmptySlot": false,
"Id": "iDRAC.Embedded.1_0x23_DIMMSLOTA1_0x23_Slot",
"Name": "Memory Slot",
"Number": 0,
"NumberDescription": "A1",
"SlotEnabledState": "Enabled",
"Tag": "iDRAC.Embedded.1#DIMMSLOTA1#Slot"
},
{
"@odata.context": "/redfish/v1/$metadata#DellSlot.DellSlot",
"@odata.id": "/redfish/v1/Dell/Systems/System.Embedded.1/DellSlot/iDRAC.Embedded.1_0x23_DIMMSLOTA2_0x23_Slot",
"@odata.type": "#DellSlot.v1_0_0.DellSlot",
"ConnectorLayout": "DIMM",
"DeviceFQDD": null,
"EmptySlot": true,
"Id": "iDRAC.Embedded.1_0x23_DIMMSLOTA2_0x23_Slot",
"Name": "Memory Slot",
"Number": 0,
"NumberDescription": "A2",
"SlotEnabledState": "Enabled",
"Tag": "iDRAC.Embedded.1#DIMMSLOTA2#Slot"
},
As you can see, there is now an extra "field" EmptySlot that is set to false for used slots and true for empty slots.
Maybe you can evaluate this in the --info check.
-- Dan
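A small sketch of how the EmptySlot field could be read from DellSlot entries like the two above. The helper name `empty_slot_labels` is hypothetical, and the sketch only lists empty slots. It does not attempt to match DellSlot entries to DellSensor entries.

```python
def empty_slot_labels(slots):
    """Return the NumberDescription of every slot that reports EmptySlot: true."""
    return [
        slot.get("NumberDescription")
        for slot in slots
        if slot.get("EmptySlot") is True
    ]


# Trimmed copies of the two DellSlot entries quoted above.
slots = [
    {"DeviceFQDD": "DIMM.Socket.A1", "EmptySlot": False, "NumberDescription": "A1"},
    {"DeviceFQDD": None, "EmptySlot": True, "NumberDescription": "A2"},
]

print(empty_slot_labels(slots))
```

Entries without an EmptySlot field are treated as occupied here, which is an assumed default.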
Haha, very good. I will add this additional check to the script.
Oops, I just saw that this information was present in the "old" BIOS as well.
Can you test something for me please?
Can you change line https://github.com/bb-Ricardo/check_redfish/blob/next-release/cr_module/system_chassi.py#L109 to:
collection_response = plugin_object.rf.get(dell_sensor_collection.get("@odata.id") + plugin_object.rf.vendor_data.expand_string)
Is the CMOS Battery now present in the output?
Of course.
But the output is still "chopped":
./check_redfish.py -H xxxxxxxxxxxx -u yyyyyyyy -p zzzzzzz --info --detail
[OK]: Type: None None (CPU: 2, MEM: 32GB) - BIOS: 2.8.2 - Serial: CNIVC009AA0063 - Power: On - Name: xxxxxxxx
[OK]: Sensor "CPU1 FIVR PG": OK (Enabled/Good)
[OK]: Sensor "CPU1 MEM012 VDDQ PG": OK (Enabled/Good)
[OK]: Sensor "CPU1 MEM012 VPP PG": OK (Enabled/Good)
[OK]: Sensor "CPU1 MEM012 VTT PG": OK (Enabled/Good)
[OK]: Sensor "CPU1 MEM345 VDDQ PG": OK (Enabled/Good)
[OK]: Sensor "CPU1 MEM345 VPP PG": OK (Enabled/Good)
[OK]: Sensor "CPU1 MEM345 VTT PG": OK (Enabled/Good)
[OK]: Sensor "CPU1 Status": OK (Enabled/Good)
[OK]: Sensor "CPU1 VCCIO PG": OK (Enabled/Good)
[OK]: Sensor "CPU1 VCORE PG": OK (Enabled/Good)
[OK]: Sensor "CPU1 VSA PG": OK (Enabled/Good)
[OK]: Sensor "CPU2 FIVR PG": OK (Enabled/Good)
[OK]: Sensor "CPU2 MEM012 VDDQ PG": OK (Enabled/Good)
[OK]: Sensor "CPU2 MEM012 VPP PG": OK (Enabled/Good)
[OK]: Sensor "CPU2 MEM012 VTT PG": OK (Enabled/Good)
[OK]: Sensor "CPU2 MEM345 VDDQ PG": OK (Enabled/Good)
[OK]: Sensor "CPU2 MEM345 VPP PG": OK (Enabled/Good)
[OK]: Sensor "CPU2 MEM345 VTT PG": OK (Enabled/Good)
[OK]: Sensor "CPU2 Status": OK (Enabled/Good)
[OK]: Sensor "CPU2 VCCIO PG": OK (Enabled/Good)
[OK]: Sensor "CPU2 VCORE PG": OK (Enabled/Good)
[OK]: Sensor "CPU2 VSA PG": OK (Enabled/Good)
[OK]: Sensor "PERC1 ROMB Battery": OK (Enabled/Good)
[OK]: Sensor "System Board 1.8V SW PG": OK (Enabled/Good)
[OK]: Sensor "System Board 2.5V SW PG": OK (Enabled/Good)
[OK]: Sensor "System Board 3.3V B PG": OK (Enabled/Good)
and the last three lines of -v:
'Members@odata.count': 63,
'Members@odata.nextLink': '/redfish/v1/Dell/Systems/System.Embedded.1/DellSensorCollection?$skip=50',
'Name': 'DellSensorCollection'}
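The count of 63 together with a nextLink ending in $skip=50 suggests that the collection is returned in pages and that the remaining 13 members sit on a second page. The sketch below shows one way to follow Members@odata.nextLink until it is exhausted. It is not the fix that went into the plugin (the thread does not show it): `collect_members` and the `fetch` callable are hypothetical, with `fetch` standing in for an HTTP GET that returns parsed JSON, and the fake pages only imitate the 50 + 13 split.

```python
def collect_members(fetch, url):
    """Gather Members across all pages by following Members@odata.nextLink.

    `fetch` is a hypothetical callable taking a URL path and returning the
    decoded JSON body as a dict.
    """
    members = []
    visited = set()
    while url and url not in visited:  # guard against a link loop
        visited.add(url)
        page = fetch(url)
        members.extend(page.get("Members", []))
        url = page.get("Members@odata.nextLink")
    return members


# Fake two-page collection imitating the response shape seen above.
base = "/redfish/v1/Dell/Systems/System.Embedded.1/DellSensorCollection"
fake_pages = {
    base: {
        "Members": [{"Id": f"sensor{i}"} for i in range(50)],
        "Members@odata.count": 63,
        "Members@odata.nextLink": base + "?$skip=50",
    },
    base + "?$skip=50": {
        "Members": [{"Id": f"sensor{i}"} for i in range(50, 63)],
        "Members@odata.count": 63,
    },
}

all_members = collect_members(fake_pages.__getitem__, base)
print(len(all_members))  # 63
```

Comparing the number of collected members against Members@odata.count is a cheap sanity check that no page was missed.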
Thank you so much, it was worth a try. I am currently busy with other work. Unfortunately it will take a week or two to implement the actual fix. See #28
No problem. (I know how it is - I also have some other work to do)
So, I finally got some time to take care of the issue.
Can you check out next-release and test if it's working now?
Thank you
Sadly, the output still gets "chopped" like before (at the exact same spot).
And the last three lines of -v still look the same.
This should now be fixed with current next-release. I just realized what I did wrong 😞. Fixed now.
On the topic of DIMM slots: my suggestion would be to still simply filter out "DIMM" in general.
Your output from earlier points to /redfish/v1/Dell/Systems/System.Embedded.1/DellSlot/. As /redfish/v1/Dell/Systems/System.Embedded.1/DellSensor and DellSlot have no common ID, it is difficult to filter on whether a slot is empty or not.
And issues with DIMMs should be properly reported with --mem.
What do you think?
That is OK with me, as --mem reports the correct information.
Thanks for looking into this!
Great, then I will close this issue.
And thanks for all the help with testing.
Btw, it seems there is a similar issue when a server (a R640 in this case) has empty CPU slots
{'@odata.context': '/redfish/v1/$metadata#DellSensor.DellSensor',
'@odata.id': '/redfish/v1/Dell/Systems/System.Embedded.1/DellSensor/iDRAC.Embedded.1_0x23_CPU2Status',
'@odata.type': '#DellSensor.v1_0_0.DellSensor',
'CurrentState': 'Unknown',
'ElementName': 'CPU2 Status',
'EnabledState': 'Enabled',
'HealthState': 'Unknown',
'Id': 'iDRAC.Embedded.1_0x23_CPU2Status',
'Links': {'ComputerSystem': {'@odata.id': '/redfish/v1/Systems/System.Embedded.1'}},
'SensorType': 'Other'},
The fix is probably the same as for memory.
Mmmhhh, thanks for letting me know.
I'll try to fix this, somehow.
Try to add this for Dell server
Added check for empty slots.
First of all, a big thanks for this great check. I am in the process of switching all (where possible) of our ipmi/ilo/idrac checks over to check_redfish.py.
Now I have found one "iDRAC" check that I sadly cannot replace with check_redfish.py: a check that the CMOS battery is present and working correctly. The reason for this check is that some time ago (before I was working here) one host crashed and they had some data loss. The RCA showed that the issue had something to do with a bad CMOS battery, and after that the bosses wanted a check for the CMOS "health".
Today we "only" grep for CMOS in the ipmi output and check that the status is "Nominal" and that the battery is present. (Output from ipmi: "84,CMOS Battery,Battery,Nominal,N/A,N/A,'battery presence detected'").
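For comparison, the existing ipmi-based check described above could be sketched roughly as follows. This is an illustration only: the function name `cmos_battery_ok` is hypothetical, and the column positions (name in the second field, state in the fourth, event text in the seventh) are assumed from the single sample line quoted above.

```python
import csv


def cmos_battery_ok(ipmi_lines):
    """Return True/False for the CMOS battery row, or None if no row is found.

    Assumes comma-separated rows with single-quoted event text, as in the
    sample line from the thread.
    """
    for row in csv.reader(ipmi_lines, quotechar="'"):
        if len(row) >= 7 and "CMOS" in row[1]:
            nominal = row[3] == "Nominal"
            present = "presence detected" in row[6].lower()
            return nominal and present
    return None  # sensor not listed at all


sample = ["84,CMOS Battery,Battery,Nominal,N/A,N/A,'battery presence detected'"]
print(cmos_battery_ok(sample))
```

Returning None for a missing row keeps "not present" distinguishable from "present but not OK".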
I have made a MockUp of one of the hosts and I can see that there are some references to "System Board CMOS Battery" in there, but I do not know whether the information in Redfish says if it is OK or not.
So, now for my "enhancement request": Is it possible to add a CMOS battery check to check_redfish.py (either as a separate query, e.g. --cmos, or "integrated" into one of the existing ones, e.g. --power)?
As I said, I have a MockUp if you need it.
Once again a big Thanks for creating and maintaining check_redfish.py
Regards, Dan