bb-Ricardo / check_redfish

A monitoring/inventory plugin to check components and health status of systems that support Redfish. It will also create an inventory of all components of a system.
MIT License

HP DL360g10: NVMe monitoring when Smart Array controller is present #113

Closed aj-gh closed 8 months ago

aj-gh commented 1 year ago

The current version reads /SmartStorage on HPE servers, which usually works fine but is problematic when NVMe drives are present in a server, as these are not "behind" the Smart Array controller and thus not present in /SmartStorage. When I patch it to read /Storage instead of /SmartStorage (similar to e4e47ea, where this is triggered when no Smart Array is found or no results are returned), all devices are properly detected. However, it looks like some additional health checks are done for /SmartStorage, such as battery health (which is read from the chassis status and not from /SmartStorage) and enclosures, and these are then no longer present. I can make it read both paths, but then I get some duplicates. It's also interesting that both NVMe devices seem to be counted as a single controller; maybe because they have the same name?

Maybe it would make sense to read both paths and only process those devices present in /Storage that were not previously handled in /SmartStorage? Or, as a very quick workaround, add a switch that forces it to read /Storage even when /SmartStorage returned results.
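
To illustrate the "read both paths, skip what was already handled" idea, here is a minimal Python sketch. It is not check_redfish code: the function name `merge_drives`, the plain-dict drive records, and the choice of the serial number as the deduplication key are all assumptions made for this example. The serials used below are placeholders.

```python
from typing import Dict, Iterable, List


def merge_drives(smart_storage: Iterable[Dict], storage: Iterable[Dict]) -> List[Dict]:
    """Keep every /SmartStorage drive and append only those /Storage drives
    whose serial was not already seen (assumption: serial is a usable key)."""
    merged = list(smart_storage)
    seen = {d["serial"] for d in merged if d.get("serial")}
    for drive in storage:
        serial = drive.get("serial")
        if serial and serial in seen:
            continue  # already handled via /SmartStorage
        merged.append(drive)
        if serial:
            seen.add(serial)
    return merged


# Hypothetical data: two SAS drives behind the Smart Array, two NVMe drives not.
smart = [{"serial": "SAS-1", "source": "SmartStorage"},
         {"serial": "SAS-2", "source": "SmartStorage"}]
generic = [{"serial": "NVME-1", "source": "Storage"},
           {"serial": "NVME-2", "source": "Storage"},
           {"serial": "SAS-1", "source": "Storage"},
           {"serial": "SAS-2", "source": "Storage"}]

result = merge_drives(smart, generic)
print([(d["serial"], d["source"]) for d in result])
```

With this approach the SAS drives keep their /SmartStorage representation (and any extra checks tied to it), while the NVMe drives are picked up from /Storage. Drives without a serial are never treated as duplicates, so a different key would be needed if an endpoint omits it.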

Thanks for this very handy tool!

Current version:

[OK]: All HP SmartArray controller (1), logical drives (1), physical drives (2), enclosures (2) and batteries (1) are in good condition.
[OK]: HPE Smart Array P408i-a SR Gen10 (FW: 5.00) status is: OK
[OK]: Physical Drive (1I:1:1) 1920GB status: OK
[OK]: Physical Drive (1I:1:2) 1920GB status: OK
[OK]: Logical Drive (0:1) 1920.3GB (RAID 1) status: OK
[OK]: StorageEnclosure (1I:1) status: OK
[OK]: StorageEnclosure (2I:0) status: OK
[OK]: SmartStorageBattery 1 (charge level: 97%, capacity: 12W) status: OK

Patched to read /Storage:

[OK]: All storage controllers (2), volumes (1) and disk drives (4) are in good condition
[OK]: NVMe Storage Controller VO003840KXAVQ Bay 1 (FW: HPK3) status is: OK
[OK]: Physical Drive Secondary Storage Device 1:9 (VO003840KXAVQ / SSD / NVMe) 3840.76GiB status: OK
[OK]: NVMe Storage Controller VO003840KXAVQ Bay 1 (FW: HPK3) status is: OK
[OK]: Physical Drive Secondary Storage Device 1:10 (VO003840KXAVQ / SSD / NVMe) 3840.76GiB status: OK
[OK]: Controller HPE Smart Array P408i-a SR Gen10 status is: OK
[OK]: Physical Drive HPE 1.92TB 22.5G SAS SSD Slot=0:Port=1I:Box=1:Bay=1 (VO001920PXDBR / SSD / SAS) 1920.38GiB status: OK
[OK]: Physical Drive HPE 1.92TB 22.5G SAS SSD Slot=0:Port=1I:Box=1:Bay=2 (VO001920PXDBR / SSD / SAS) 1920.38GiB status: OK
[OK]: Logical Drive SR Volume 1 (SR Volume 1) 1920GiB (RAID1) status: OK

Full inventory with /Storage:

{
    "inventory": {
        "chassi": [],
        "fan": [],
        "firmware": [],
        "logical_drive": [
            {
                "encrypted": false,
                "health_status": "OK",
                "id": "DE07B000:1",
                "name": "SR Volume 1",
                "operation_status": "Enabled",
                "physical_drive_ids": [
                    "DE07B000:0",
                    "DE07B000:1"
                ],
                "raid_type": "RAID1",
                "size_in_byte": 1920349855744,
                "storage_controller_ids": [
                    "DE07B000"
                ],
                "system_ids": [
                    1
                ],
                "type": null
            }
        ],
        "manager": [],
        "memory": [],
        "network_adapter": [],
        "network_port": [],
        "physical_drive": [
            {
                "bay": 9,
                "encrypted": null,
                "failure_predicted": false,
                "firmware": "HPK3",
                "health_status": "OK",
                "id": "DA000000:0:DA000000",
                "interface_speed": null,
                "interface_type": "NVMe",
                "location": "1:9",
                "logical_drive_ids": [],
                "manufacturer": null,
                "model": "VO003840KXAVQ",
                "name": "Secondary Storage Device",
                "operation_status": null,
                "part_number": null,
                "power_on_hours": 7403,
                "predicted_media_life_left_percent": 89,
                "serial": "SNredacted",
                "size_in_byte": 3840755982336,
                "speed_in_rpm": null,
                "storage_controller_ids": [
                    "DA000000:0"
                ],
                "storage_enclosure_ids": [],
                "storage_port": null,
                "system_ids": [
                    1
                ],
                "temperature": null,
                "type": "SSD"
            },
            {
                "bay": 10,
                "encrypted": null,
                "failure_predicted": false,
                "firmware": "HPK3",
                "health_status": "OK",
                "id": "DA000001:0:DA000001",
                "interface_speed": null,
                "interface_type": "NVMe",
                "location": "1:10",
                "logical_drive_ids": [],
                "manufacturer": null,
                "model": "VO003840KXAVQ",
                "name": "Secondary Storage Device",
                "operation_status": null,
                "part_number": null,
                "power_on_hours": 7403,
                "predicted_media_life_left_percent": 89,
                "serial": "SNredacted",
                "size_in_byte": 3840755982336,
                "speed_in_rpm": null,
                "storage_controller_ids": [
                    "DA000001:0"
                ],
                "storage_enclosure_ids": [],
                "storage_port": null,
                "system_ids": [
                    1
                ],
                "temperature": null,
                "type": "SSD"
            },
            {
                "bay": 1,
                "encrypted": null,
                "failure_predicted": false,
                "firmware": "HPD2",
                "health_status": "OK",
                "id": "DE07B000:0",
                "interface_speed": 12000,
                "interface_type": "SAS",
                "location": "Slot=0:Port=1I:Box=1:Bay=1",
                "logical_drive_ids": [
                    "DE07B000:1"
                ],
                "manufacturer": "HPE",
                "model": "VO001920PXDBR",
                "name": "HPE 1.92TB 22.5G SAS SSD",
                "operation_status": "Enabled",
                "part_number": null,
                "power_on_hours": null,
                "predicted_media_life_left_percent": 100.0,
                "serial": "91redacted",
                "size_in_byte": 1920383410176,
                "speed_in_rpm": null,
                "storage_controller_ids": [
                    "DE07B000"
                ],
                "storage_enclosure_ids": [],
                "storage_port": null,
                "system_ids": [
                    1
                ],
                "temperature": null,
                "type": "SSD"
            },
            {
                "bay": 2,
                "encrypted": null,
                "failure_predicted": false,
                "firmware": "HPD2",
                "health_status": "OK",
                "id": "DE07B000:1",
                "interface_speed": 12000,
                "interface_type": "SAS",
                "location": "Slot=0:Port=1I:Box=1:Bay=2",
                "logical_drive_ids": [
                    "DE07B000:1"
                ],
                "manufacturer": "HPE",
                "model": "VO001920PXDBR",
                "name": "HPE 1.92TB 22.5G SAS SSD",
                "operation_status": "Enabled",
                "part_number": null,
                "power_on_hours": null,
                "predicted_media_life_left_percent": 100.0,
                "serial": "91redacted",
                "size_in_byte": 1920383410176,
                "speed_in_rpm": null,
                "storage_controller_ids": [
                    "DE07B000"
                ],
                "storage_enclosure_ids": [],
                "storage_port": null,
                "system_ids": [
                    1
                ],
                "temperature": null,
                "type": "SSD"
            }
        ],
        "power_supply": [],
        "processor": [],
        "storage_controller": [
            {
                "backup_power_health": null,
                "backup_power_present": false,
                "cache_size_in_mb": null,
                "firmware": "HPK3",
                "health_status": "OK",
                "id": "DA000000:0",
                "location": "Bay 1",
                "logical_drive_ids": [],
                "manufacturer": null,
                "model": "VO003840KXAVQ",
                "name": "NVMe Storage Controller",
                "operation_status": "Enabled",
                "physical_drive_ids": [
                    "DA000000:0:DA000000"
                ],
                "serial": "SNredacted",
                "storage_enclosure_ids": [],
                "system_ids": [
                    1
                ]
            },
            {
                "backup_power_health": null,
                "backup_power_present": false,
                "cache_size_in_mb": null,
                "firmware": "HPK3",
                "health_status": "OK",
                "id": "DA000001:0",
                "location": "Bay 1",
                "logical_drive_ids": [],
                "manufacturer": null,
                "model": "VO003840KXAVQ",
                "name": "NVMe Storage Controller",
                "operation_status": "Enabled",
                "physical_drive_ids": [
                    "DA000001:0:DA000001"
                ],
                "serial": "SNredacted",
                "storage_enclosure_ids": [],
                "system_ids": [
                    1
                ]
            },
            {
                "backup_power_health": null,
                "backup_power_present": null,
                "cache_size_in_mb": null,
                "firmware": null,
                "health_status": "OK",
                "id": "DE07B000",
                "location": null,
                "logical_drive_ids": [
                    "DE07B000:1"
                ],
                "manufacturer": null,
                "model": null,
                "name": "HPE Smart Array P408i-a SR Gen10",
                "operation_status": "Enabled",
                "physical_drive_ids": [
                    "DE07B000:0",
                    "DE07B000:1"
                ],
                "serial": null,
                "storage_enclosure_ids": [],
                "system_ids": []
            }
        ],
        "storage_enclosure": [],
        "system": [],
        "temperature": []
    },
    "meta": {
        "data_retrieval_issues": {},
        "duration_of_data_collection_in_seconds": 0.534195,
        "host_that_collected_inventory": "redacted",
        "inventory_id": null,
        "inventory_layout_version": "1.4.0",
        "inventory_name": null,
        "script_version": "1.5.0",
        "start_of_data_collection": "2023-04-21T16:12:51+02:00"
    }
}
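
The summary line above reports "storage controllers (2)", while the inventory lists three controller entries. The following Python sketch only illustrates the "same name" hypothesis using the `id` and `name` values from the inventory; how check_redfish actually counts controllers is not shown here and is an assumption.

```python
# Controller ids and names copied from the inventory above.
controllers = [
    {"id": "DA000000:0", "name": "NVMe Storage Controller"},
    {"id": "DA000001:0", "name": "NVMe Storage Controller"},
    {"id": "DE07B000", "name": "HPE Smart Array P408i-a SR Gen10"},
]

# Counting distinct names collapses the two NVMe controllers into one.
count_by_name = len({c["name"] for c in controllers})

# Counting distinct ids keeps them apart.
count_by_id = len({c["id"] for c in controllers})

print(count_by_name, count_by_id)
```

If the count is keyed on the name, two controllers would be reported; keyed on the id, three.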

bb-Ricardo commented 1 year ago

Hi,

That's interesting. Would you mind creating a mock-up and sending it over? Then I could implement this much more quickly.

bb-Ricardo commented 1 year ago

Hi @aj-gh,

I would be really interested in adding support for this.

But it's nearly impossible without a mock-up. I would also need to somehow filter out the duplicate entries.

bb-Ricardo commented 9 months ago

Hi @aj-gh,

I just pushed a change to the next-release branch. Would you be able to check it out and test it?

bb-Ricardo commented 8 months ago

Closing this issue due to no response.