Yogibaer75 / Check_MK-Things

From check plugins to website extensions

Redfish plugin crash with NVIDIA Server GPU #59

Closed marbaa closed 4 months ago

marbaa commented 4 months ago

Hi,

The Redfish plugin works pretty well; I just need to fix the timeouts somehow, as I'm getting 'stale' statuses.

We have Lenovo servers with and without NVIDIA GPUs. On the servers with GPUs, the plugin crashes.

OS: Ubuntu 22
CMK: Checkmk Enterprise Edition 2.3.0p2

Where can I upload the crash dump?


Yogibaer75 commented 4 months ago

The crash dump will not help much. I need at least the agent output from the system section. If you execute the agent manually, the section <<<redfish_processors:sep(0)>>> is the relevant one for troubleshooting. This section on its own should not be a security problem if you attach it here.
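
For readers who want to isolate just that section from a saved agent output, here is a minimal sketch. It assumes the output is available as plain text and that section headers have the `<<<name:option>>>` form shown above. The helper name `extract_section` and the sample data (including the `redfish_memory` section name) are hypothetical and not part of Checkmk or the plugin.

```python
def extract_section(agent_output: str, section_name: str) -> list[str]:
    """Return the lines that belong to one section of an agent output."""
    lines: list[str] = []
    inside = False
    for line in agent_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("<<<") and stripped.endswith(">>>"):
            # A header looks like <<<name>>> or <<<name:option(...)>>>.
            header = stripped[3:-3]
            inside = header.split(":", 1)[0] == section_name
            continue
        if inside:
            lines.append(line)
    return lines


# Illustrative sample only; real agent output is much longer.
sample = "\n".join([
    "<<<check_mk>>>",
    "Version: example",
    "<<<redfish_processors:sep(0)>>>",
    '{"Id": "1"}',
    '{"Id": "GPU1"}',
    "<<<redfish_memory:sep(0)>>>",
    '{"Id": "DIMM1"}',
])
print(extract_section(sample, "redfish_processors"))
```

Only the lines between the requested header and the next header are returned, so other sections never need to be shared.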

marbaa commented 4 months ago

There is no agent output. I'm pulling information from the Lenovo XClarity Controller using Redfish.

Edit: Traceback

  File "/omd/sites/mucpoc/lib/python3/cmk/base/checkers.py", line 716, in get_aggregated_result
    check_result = check_function(**item_kw, **params_kw, **section_kws)
  File "/omd/sites/mucpoc/lib/python3/cmk/base/checkers.py", line 496, in __check_function
    return _aggregate_results(consume_check_results(check_function(*args, **kw)))
  File "/omd/sites/mucpoc/lib/python3/cmk/base/checkers.py", line 554, in consume_check_results
    for subr in subresults:
  File "/omd/sites/mucpoc/lib/python3/cmk/base/api/agent_based/register/check_plugins.py", line 91, in filtered_generator
    for element in generator(*args, **kwargs):
  File "/omd/sites/mucpoc/local/lib/python3/cmk/plugins/redfish/agent_based/redfish_processors.py", line 61, in check_redfish_processors
    dev_state, dev_msg = redfish_health_state(data["Status"])
{'cpu_model': None,
 'cpu_msg': 'Type: GPU, Model: None',
 'data': {'@odata.context': '/redfish/v1/$metadata#Processor.Processor',
          '@odata.etag': '"4bc2d4b5d99525a8846"',
          '@odata.id': '/redfish/v1/Systems/1/Processors/GPU1',
          '@odata.type': '#Processor.v1_14_0.Processor',
          'Description': 'This resource is used to represent a processor for a '
                         'Redfish implementation.',
          'FirmwareVersion': '95.02.39.00.01',
          'Id': 'GPU1',
          'Links': {'Chassis': {'@odata.id': '/redfish/v1/Chassis/1'},
                    'PCIeDevice': {'@odata.id': '/redfish/v1/Chassis/1/PCIeDevices/slot_3'},
                    'PCIeFunctions': [{'@odata.id': 'Max recursion depth '
                                                    'reached'}],
                    'PCIeFunctions@odata.count': 1},
          'Manufacturer': 'NVIDIA Corporation',
          'Name': 'GPU 1',
          'PartNumber': '26B5-895-A1',
          'ProcessorId': {'VendorId': '0x10de'},
          'ProcessorType': 'GPU',
          'SerialNumber': '1321823009358'},
 'item': 'GPU1',
 'section': {'1': {'@odata.context': '/redfish/v1/$metadata#Processor.Processor',
                   '@odata.etag': '"d63d3f19254834a8c1eb3"',
                   '@odata.id': '/redfish/v1/Systems/1/Processors/1',
                   '@odata.type': '#Processor.v1_14_0.Processor',
                   'Description': 'This resource is used to represent a '
                                  'processor for a Redfish implementation.',
                   'EnvironmentMetrics': {'@odata.id': '/redfish/v1/Systems/1/Processors/1/EnvironmentMetrics'},
                   'Id': '1',
                   'InstructionSet': 'x86-64',
                   'Links': {'Chassis': {'@odata.id': 'Max recursion depth '
                                                      'reached'}},
                   'Location': {'PartLocation': {'LocationOrdinalValue': 'Max '
                                                                         'recursion '
                                                                         'depth '
                                                                         'reached',
                                                 'LocationType': 'Max '
                                                                 'recursion '
                                                                 'depth '
                                                                 'reached',
                                                 'ServiceLabel': 'Max '
                                                                 'recursion '
                                                                 'depth '
                                                                 'reached'}},
                   'Manufacturer': 'Intel(R) Corporation',
                   'MaxSpeedMHz': 3600,
                   'Metrics': {'@odata.id': '/redfish/v1/Systems/1/Processors/1/ProcessorMetrics'},
                   'Model': 'Intel(R) Xeon(R) Platinum 8362 CPU @ 2.80GHz',
                   'Name': 'Processor 1',
                   'Oem': {'Lenovo': {'@odata.type': 'Max recursion depth '
                                                     'reached',
                                      'CacheInfo': 'Max recursion depth '
                                                   'reached',
                                      'CurrentClockSpeedMHz': 'Max recursion '
                                                              'depth reached',
                                      'ExternalBusClockSpeedMHz': 'Max '
                                                                  'recursion '
                                                                  'depth '
                                                                  'reached',
                                      'NumberOfEnabledCores': 'Max recursion '
                                                              'depth reached',
                                      'ProcessorFamily': 'Max recursion depth '
                                                         'reached'}},
                   'PartNumber': '',
                   'ProcessorArchitecture': 'x86',
                   'ProcessorId': {'EffectiveFamily': '0x06',
                                   'EffectiveModel': '0x6a',
                                   'IdentificationRegisters': '0x000606a6bfebfbff',
                                   'MicrocodeInfo': None,
                                   'Step': '0x06',
                                   'VendorId': 'GenuineIntel'},
                   'ProcessorMemory': [{'CapacityMiB': 'Max recursion depth '
                                                       'reached',
                                        'IntegratedMemory': 'Max recursion '
                                                            'depth reached',
                                        'MemoryType': 'Max recursion depth '
                                                      'reached',
                                        'SpeedMHz': 'Max recursion depth '
                                                    'reached'},
                                       {'CapacityMiB': 'Max recursion depth '
                                                       'reached',
                                        'IntegratedMemory': 'Max recursion '
                                                            'depth reached',
                                        'MemoryType': 'Max recursion depth '
                                                      'reached',
                                        'SpeedMHz': 'Max recursion depth '
                                                    'reached'},
                                       {'CapacityMiB': 'Max recursion depth '
                                                       'reached',
                                        'IntegratedMemory': 'Max recursion '
                                                            'depth reached',
                                        'MemoryType': 'Max recursion depth '
                                                      'reached',
                                        'SpeedMHz': 'Max recursion depth '
                                                    'reached'}],
                   'ProcessorType': 'CPU',
                   'SerialNumber': '',
                   'Socket': 'CPU 1',
                   'Status': {'Health': 'OK', 'State': 'Enabled'},
                   'SystemInterface': {'InterfaceType': 'UPI'},
                   'TDPWatts': 265,
                   'TotalCores': 32,
                   'TotalEnabledCores': 32,
                   'TotalThreads': 64,
                   'Version': 'Intel(R) Xeon(R) Platinum 8362 CPU @ 2.80GHz'},
             '2': {'@odata.context': '/redfish/v1/$metadata#Processor.Processor',
                   '@odata.etag': '"d63d40b75c9c34a8c1ebb"',
                   '@odata.id': '/redfish/v1/Systems/1/Processors/2',
                   '@odata.type': '#Processor.v1_14_0.Processor',
                   'Description': 'This resource is used to represent a '
                                  'processor for a Redfish implementation.',
                   'EnvironmentMetrics': {'@odata.id': '/redfish/v1/Systems/1/Processors/2/EnvironmentMetrics'},
                   'Id': '2',
                   'InstructionSet': 'x86-64',
                   'Links': {'Chassis': {'@odata.id': 'Max recursion depth '
                                                      'reached'}},
                   'Location': {'PartLocation': {'LocationOrdinalValue': 'Max '
                                                                         'recursion '
                                                                         'depth '
                                                                         'reached',
                                                 'LocationType': 'Max '
                                                                 'recursion '
                                                                 'depth '
                                                                 'reached',
                                                 'ServiceLabel': 'Max '
                                                                 'recursion '
                                                                 'depth '
                                                                 'reached'}},
                   'Manufacturer': 'Intel(R) Corporation',
                   'MaxSpeedMHz': 3600,
                   'Metrics': {'@odata.id': '/redfish/v1/Systems/1/Processors/2/ProcessorMetrics'},
                   'Model': 'Intel(R) Xeon(R) Platinum 8362 CPU @ 2.80GHz',
                   'Name': 'Processor 2',
                   'Oem': {'Lenovo': {'@odata.type': 'Max recursion depth '
                                                     'reached',
                                      'CacheInfo': 'Max recursion depth '
                                                   'reached',
                                      'CurrentClockSpeedMHz': 'Max recursion '
                                                              'depth reached',
                                      'ExternalBusClockSpeedMHz': 'Max '
                                                                  'recursion '
                                                                  'depth '
                                                                  'reached',
                                      'NumberOfEnabledCores': 'Max recursion '
                                                              'depth reached',
                                      'ProcessorFamily': 'Max recursion depth '
                                                         'reached'}},
                   'PartNumber': '',
                   'ProcessorArchitecture': 'x86',
                   'ProcessorId': {'EffectiveFamily': '0x06',
                                   'EffectiveModel': '0x6a',
                                   'IdentificationRegisters': '0x000606a6bfebfbff',
                                   'MicrocodeInfo': None,
                                   'Step': '0x06',
                                   'VendorId': 'GenuineIntel'},
                   'ProcessorMemory': [{'CapacityMiB': 'Max recursion depth '
                                                       'reached',
                                        'IntegratedMemory': 'Max recursion '
                                                            'depth reached',
                                        'MemoryType': 'Max recursion depth '
                                                      'reached',
                                        'SpeedMHz': 'Max recursion depth '
                                                    'reached'},
                                       {'CapacityMiB': 'Max recursion depth '
                                                       'reached',
                                        'IntegratedMemory': 'Max recursion '
                                                            'depth reached',
                                        'MemoryType': 'Max recursion depth '
                                                      'reached',
                                        'SpeedMHz': 'Max recursion depth '
                                                    'reached'},
                                       {'CapacityMiB': 'Max recursion depth '
                                                       'reached',
                                        'IntegratedMemory': 'Max recursion '
                                                            'depth reached',
                                        'MemoryType': 'Max recursion depth '
                                                      'reached',
                                        'SpeedMHz': 'Max recursion depth '
                                                    'reached'}],
                   'ProcessorType': 'CPU',
                   'SerialNumber': '',
                   'Socket': 'CPU 2',
                   'Status': {'Health': 'OK', 'State': 'Enabled'},
                   'SystemInterface': {'InterfaceType': 'UPI'},
                   'TDPWatts': 265,
                   'TotalCores': 32,
                   'TotalEnabledCores': 32,
                   'TotalThreads': 64,
                   'Version': 'Intel(R) Xeon(R) Platinum 8362 CPU @ 2.80GHz'},
             'GPU1': {'@odata.context': '/redfish/v1/$metadata#Processor.Processor',
                      '@odata.etag': '"4bc2d4b5d99525a8846"',
                      '@odata.id': '/redfish/v1/Systems/1/Processors/GPU1',
                      '@odata.type': '#Processor.v1_14_0.Processor',
                      'Description': 'This resource is used to represent a '
                                     'processor for a Redfish implementation.',
                      'FirmwareVersion': '95.02.39.00.01',
                      'Id': 'GPU1',
                      'Links': {'Chassis': {'@odata.id': 'Max recursion depth '
                                                         'reached'},
                                'PCIeDevice': {'@odata.id': 'Max recursion '
                                                            'depth reached'},
                                'PCIeFunctions': ['Max recursion depth '
                                                  'reached'],
                                'PCIeFunctions@odata.count': 1},
                      'Manufacturer': 'NVIDIA Corporation',
                      'Name': 'GPU 1',
                      'PartNumber': '26B5-895-A1',
                      'ProcessorId': {'VendorId': '0x10de'},
                      'ProcessorType': 'GPU',
                      'SerialNumber': '1321823009358'},
             'GPU2': {'@odata.context': '/redfish/v1/$metadata#Processor.Processor',
                      '@odata.etag': '"4bbe7c1dd24425a8849"',
                      '@odata.id': '/redfish/v1/Systems/1/Processors/GPU2',
                      '@odata.type': '#Processor.v1_14_0.Processor',
                      'Description': 'This resource is used to represent a '
                                     'processor for a Redfish implementation.',
                      'FirmwareVersion': '95.02.39.00.01',
                      'Id': 'GPU2',
                      'Links': {'Chassis': {'@odata.id': 'Max recursion depth '
                                                         'reached'},
                                'PCIeDevice': {'@odata.id': 'Max recursion '
                                                            'depth reached'},
                                'PCIeFunctions': ['Max recursion depth '
                                                  'reached'],
                                'PCIeFunctions@odata.count': 1},
                      'Manufacturer': 'NVIDIA Corporation',
                      'Name': 'GPU 2',
                      'PartNumber': '26B5-895-A1',
                      'ProcessorId': {'VendorId': '0x10de'},
                      'ProcessorType': 'GPU',
                      'SerialNumber': '1321823008716'},
             'GPU3': {'@odata.context': '/redfish/v1/$metadata#Processor.Processor',
                      '@odata.etag': '"4bc4039bd68f25a884d"',
                      '@odata.id': '/redfish/v1/Systems/1/Processors/GPU3',
                      '@odata.type': '#Processor.v1_14_0.Processor',
                      'Description': 'This resource is used to represent a '
                                     'processor for a Redfish implementation.',
                      'FirmwareVersion': '95.02.39.00.01',
                      'Id': 'GPU3',
                      'Links': {'Chassis': {'@odata.id': 'Max recursion depth '
                                                         'reached'},
                                'PCIeDevice': {'@odata.id': 'Max recursion '
                                                            'depth reached'},
                                'PCIeFunctions': ['Max recursion depth '
                                                  'reached'],
                                'PCIeFunctions@odata.count': 1},
                      'Manufacturer': 'NVIDIA Corporation',
                      'Name': 'GPU 3',
                      'PartNumber': '26B5-895-A1',
                      'ProcessorId': {'VendorId': '0x10de'},
                      'ProcessorType': 'GPU',
                      'SerialNumber': '1321823008237'},
             'GPU4': {'@odata.context': '/redfish/v1/$metadata#Processor.Processor',
                      '@odata.etag': '"4bcf9f9bbb1d25a8854"',
                      '@odata.id': '/redfish/v1/Systems/1/Processors/GPU4',
                      '@odata.type': '#Processor.v1_14_0.Processor',
                      'Description': 'This resource is used to represent a '
                                     'processor for a Redfish implementation.',
                      'FirmwareVersion': '95.02.39.00.01',
                      'Id': 'GPU4',
                      'Links': {'Chassis': {'@odata.id': 'Max recursion depth '
                                                         'reached'},
                                'PCIeDevice': {'@odata.id': 'Max recursion '
                                                            'depth reached'},
                                'PCIeFunctions': ['Max recursion depth '
                                                  'reached'],
                                'PCIeFunctions@odata.count': 1},
                      'Manufacturer': 'NVIDIA Corporation',
                      'Name': 'GPU 4',
                      'PartNumber': '26B5-895-A1',
                      'ProcessorId': {'VendorId': '0x10de'},
                      'ProcessorType': 'GPU',
                      'SerialNumber': '1322823030278'},
             'GPU5': {'@odata.context': '/redfish/v1/$metadata#Processor.Processor',
                      '@odata.etag': '"4bc88081dcf125a8859"',
                      '@odata.id': '/redfish/v1/Systems/1/Processors/GPU5',
                      '@odata.type': '#Processor.v1_14_0.Processor',
                      'Description': 'This resource is used to represent a '
                                     'processor for a Redfish implementation.',
                      'FirmwareVersion': '95.02.39.00.01',
                      'Id': 'GPU5',
                      'Links': {'Chassis': {'@odata.id': 'Max recursion depth '
                                                         'reached'},
                                'PCIeDevice': {'@odata.id': 'Max recursion '
                                                            'depth reached'},
                                'PCIeFunctions': ['Max recursion depth '
                                                  'reached'],
                                'PCIeFunctions@odata.count': 1},
                      'Manufacturer': 'NVIDIA Corporation',
                      'Name': 'GPU 5',
                      'PartNumber': '26B5-895-A1',
                      'ProcessorId': {'VendorId': '0x10de'},
                      'ProcessorType': 'GPU',
                      'SerialNumber': '1321823008138'},
             'GPU6': {'@odata.context': '/redfish/v1/$metadata#Processor.Processor',
                      '@odata.etag': '"4bcd83cfbae325a885c"',
                      '@odata.id': '/redfish/v1/Systems/1/Processors/GPU6',
                      '@odata.type': '#Processor.v1_14_0.Processor',
                      'Description': 'This resource is used to represent a '
                                     'processor for a Redfish implementation.',
                      'FirmwareVersion': '95.02.39.00.01',
                      'Id': 'GPU6',
                      'Links': {'Chassis': {'@odata.id': 'Max recursion depth '
                                                         'reached'},
                                'PCIeDevice': {'@odata.id': 'Max recursion '
                                                            'depth reached'},
                                'PCIeFunctions': ['Max recursion depth '
                                                  'reached'],
                                'PCIeFunctions@odata.count': 1},
                      'Manufacturer': 'NVIDIA Corporation',
                      'Name': 'GPU 6',
                      'PartNumber': '26B5-895-A1',
                      'ProcessorId': {'VendorId': '0x10de'},
                      'ProcessorType': 'GPU',
                      'SerialNumber': '1322823030436'},
             'GPU7': {'@odata.context': '/redfish/v1/$metadata#Processor.Processor',
                      '@odata.etag': '"4bc9b603cf7f25a8860"',
                      '@odata.id': '/redfish/v1/Systems/1/Processors/GPU7',
                      '@odata.type': '#Processor.v1_14_0.Processor',
                      'Description': 'This resource is used to represent a '
                                     'processor for a Redfish implementation.',
                      'FirmwareVersion': '95.02.39.00.01',
                      'Id': 'GPU7',
                      'Links': {'Chassis': {'@odata.id': 'Max recursion depth '
                                                         'reached'},
                                'PCIeDevice': {'@odata.id': 'Max recursion '
                                                            'depth reached'},
                                'PCIeFunctions': ['Max recursion depth '
                                                  'reached'],
                                'PCIeFunctions@odata.count': 1},
                      'Manufacturer': 'NVIDIA Corporation',
                      'Name': 'GPU 7',
                      'PartNumber': '26B5-895-A1',
                      'ProcessorId': {'VendorId': '0x10de'},
                      'ProcessorType': 'GPU',
                      'SerialNumber': '1321823008610'},
             'GPU8': {'@odata.context': '/redfish/v1/$metadata#Processor.Processor',
                      '@odata.etag': '"4b8a6ad4cabe27e88de"',
                      '@odata.id': '/redfish/v1/Systems/1/Processors/GPU8',
                      '@odata.type': '#Processor.v1_14_0.Processor',
                      'Description': 'This resource is used to represent a '
                                     'processor for a Redfish implementation.',
                      'FirmwareVersion': '95.02.39.00.01',
                      'Id': 'GPU8',
                      'Links': {'Chassis': {'@odata.id': 'Max recursion depth '
                                                         'reached'},
                                'PCIeDevice': {'@odata.id': 'Max recursion '
                                                            'depth reached'},
                                'PCIeFunctions': ['Max recursion depth '
                                                  'reached'],
                                'PCIeFunctions@odata.count': 1},
                      'Manufacturer': 'NVIDIA Corporation',
                      'Name': 'GPU 8',
                      'PartNumber': '26B5-895-A1',
                      'ProcessorId': {'VendorId': '0x10de'},
                      'ProcessorType': 'GPU',
                      'SerialNumber': '1321823008037'}}}
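
The local variables above show that the `data` dict for item `GPU1` has no `Status` key, while the CPU entries `1` and `2` do, so the lookup `data["Status"]` on line 61 presumably raises a `KeyError`. A minimal sketch of a defensive lookup follows. It is not the plugin's actual fix: `health_state_stub` is a hypothetical stand-in for `redfish_health_state`, whose real logic is not shown in this thread, and reporting a missing status as UNKNOWN (3) is an assumption made for illustration.

```python
from typing import Any, Mapping, Tuple


def health_state_stub(status: Mapping[str, Any]) -> Tuple[int, str]:
    """Hypothetical stand-in for redfish_health_state (real logic not shown here)."""
    health = status.get("Health")
    state = status.get("State")
    # 0 = OK, 1 = WARN, following the usual monitoring state numbering.
    return (0 if health == "OK" else 1), f"Health: {health}, State: {state}"


def safe_health(data: Mapping[str, Any]) -> Tuple[int, str]:
    """Look up the Status block without raising when the device omits it."""
    status = data.get("Status")
    if not status:
        # Assumption: treat a missing Status as UNKNOWN (3) instead of crashing.
        return 3, "No Status reported by the device"
    return health_state_stub(status)


# Abbreviated entries modelled on the dump above.
cpu = {"Id": "1", "Status": {"Health": "OK", "State": "Enabled"}}
gpu = {"Id": "GPU1", "ProcessorType": "GPU"}
print(safe_health(cpu))
print(safe_health(gpu))
```

Whether a GPU without a `Status` block should be shown as UNKNOWN, OK, or not be discovered at all is a design decision for the plugin author.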

Yogibaer75 commented 4 months ago

There is no agent output. I'm pulling information from the Lenovo XClarity Controller using Redfish.

The special agent you are using is also an agent. You can download its output directly in the GUI.


From this output I only need the section mentioned above.

marbaa commented 4 months ago

Alright, understood. Here is the output:

<<<redfish_processors:sep(0)>>>
{"@odata.context": "/redfish/v1/$metadata#Processor.Processor", "@odata.etag": "\"d63d3f19254834a8c1eb3\"", "@odata.id": "/redfish/v1/Systems/1/Processors/1", "@odata.type": "#Processor.v1_14_0.Processor", "Description": "This resource is used to represent a processor for a Redfish implementation.", "EnvironmentMetrics": {"@odata.id": "/redfish/v1/Systems/1/Processors/1/EnvironmentMetrics"}, "Id": "1", "InstructionSet": "x86-64", "Links": {"Chassis": {"@odata.id": "/redfish/v1/Chassis/1"}}, "Location": {"PartLocation": {"LocationOrdinalValue": 0, "LocationType": "Socket", "ServiceLabel": "CPU 1"}}, "Manufacturer": "Intel(R) Corporation", "MaxSpeedMHz": 3600, "Metrics": {"@odata.id": "/redfish/v1/Systems/1/Processors/1/ProcessorMetrics"}, "Model": "Intel(R) Xeon(R) Platinum 8362 CPU @ 2.80GHz", "Name": "Processor 1", "Oem": {"Lenovo": {"@odata.type": "#LenovoProcessor.v1_0_0.LenovoProcessor", "CacheInfo": [{"CacheLevel": "L1", "InstalledSizeKByte": 2560, "MaxCacheSizeKByte": 2560}, {"CacheLevel": "L2", "InstalledSizeKByte": 40960, "MaxCacheSizeKByte": 40960}, {"CacheLevel": "L3", "InstalledSizeKByte": 49152, "MaxCacheSizeKByte": 49152}], "CurrentClockSpeedMHz": 2800, "ExternalBusClockSpeedMHz": 100, "NumberOfEnabledCores": 32, "ProcessorFamily": 179}}, "PartNumber": "", "ProcessorArchitecture": "x86", "ProcessorId": {"EffectiveFamily": "0x06", "EffectiveModel": "0x6a", "IdentificationRegisters": "0x000606a6bfebfbff", "MicrocodeInfo": null, "Step": "0x06", "VendorId": "GenuineIntel"}, "ProcessorMemory": [{"CapacityMiB": 2, "IntegratedMemory": true, "MemoryType": "L1Cache", "SpeedMHz": null}, {"CapacityMiB": 40, "IntegratedMemory": true, "MemoryType": "L2Cache", "SpeedMHz": null}, {"CapacityMiB": 48, "IntegratedMemory": true, "MemoryType": "L3Cache", "SpeedMHz": null}], "ProcessorType": "CPU", "SerialNumber": "", "Socket": "CPU 1", "Status": {"Health": "OK", "State": "Enabled"}, "SystemInterface": {"InterfaceType": "UPI"}, "TDPWatts": 265, "TotalCores": 32, "TotalEnabledCores": 32, "TotalThreads": 64, "Version": "Intel(R) Xeon(R) Platinum 8362 CPU @ 2.80GHz"}
{"@odata.context": "/redfish/v1/$metadata#Processor.Processor", "@odata.etag": "\"d63d40b75c9c34a8c1ebb\"", "@odata.id": "/redfish/v1/Systems/1/Processors/2", "@odata.type": "#Processor.v1_14_0.Processor", "Description": "This resource is used to represent a processor for a Redfish implementation.", "EnvironmentMetrics": {"@odata.id": "/redfish/v1/Systems/1/Processors/2/EnvironmentMetrics"}, "Id": "2", "InstructionSet": "x86-64", "Links": {"Chassis": {"@odata.id": "/redfish/v1/Chassis/1"}}, "Location": {"PartLocation": {"LocationOrdinalValue": 1, "LocationType": "Socket", "ServiceLabel": "CPU 2"}}, "Manufacturer": "Intel(R) Corporation", "MaxSpeedMHz": 3600, "Metrics": {"@odata.id": "/redfish/v1/Systems/1/Processors/2/ProcessorMetrics"}, "Model": "Intel(R) Xeon(R) Platinum 8362 CPU @ 2.80GHz", "Name": "Processor 2", "Oem": {"Lenovo": {"@odata.type": "#LenovoProcessor.v1_0_0.LenovoProcessor", "CacheInfo": [{"CacheLevel": "L1", "InstalledSizeKByte": 2560, "MaxCacheSizeKByte": 2560}, {"CacheLevel": "L2", "InstalledSizeKByte": 40960, "MaxCacheSizeKByte": 40960}, {"CacheLevel": "L3", "InstalledSizeKByte": 49152, "MaxCacheSizeKByte": 49152}], "CurrentClockSpeedMHz": 2800, "ExternalBusClockSpeedMHz": 100, "NumberOfEnabledCores": 32, "ProcessorFamily": 179}}, "PartNumber": "", "ProcessorArchitecture": "x86", "ProcessorId": {"EffectiveFamily": "0x06", "EffectiveModel": "0x6a", "IdentificationRegisters": "0x000606a6bfebfbff", "MicrocodeInfo": null, "Step": "0x06", "VendorId": "GenuineIntel"}, "ProcessorMemory": [{"CapacityMiB": 2, "IntegratedMemory": true, "MemoryType": "L1Cache", "SpeedMHz": null}, {"CapacityMiB": 40, "IntegratedMemory": true, "MemoryType": "L2Cache", "SpeedMHz": null}, {"CapacityMiB": 48, "IntegratedMemory": true, "MemoryType": "L3Cache", "SpeedMHz": null}], "ProcessorType": "CPU", "SerialNumber": "", "Socket": "CPU 2", "Status": {"Health": "OK", "State": "Enabled"}, "SystemInterface": {"InterfaceType": "UPI"}, "TDPWatts": 265, "TotalCores": 32, "TotalEnabledCores": 32, "TotalThreads": 64, "Version": "Intel(R) Xeon(R) Platinum 8362 CPU @ 2.80GHz"}
{"@odata.context": "/redfish/v1/$metadata#Processor.Processor", "@odata.etag": "\"4bc2d4b5d99525a8846\"", "@odata.id": "/redfish/v1/Systems/1/Processors/GPU1", "@odata.type": "#Processor.v1_14_0.Processor", "Description": "This resource is used to represent a processor for a Redfish implementation.", "FirmwareVersion": "95.02.39.00.01", "Id": "GPU1", "Links": {"Chassis": {"@odata.id": "/redfish/v1/Chassis/1"}, "PCIeDevice": {"@odata.id": "/redfish/v1/Chassis/1/PCIeDevices/slot_3"}, "PCIeFunctions": [{"@odata.id": "/redfish/v1/Chassis/1/PCIeDevices/slot_3/PCIeFunctions/slot_3.00"}], "PCIeFunctions@odata.count": 1}, "Manufacturer": "NVIDIA Corporation", "Name": "GPU 1", "PartNumber": "26B5-895-A1", "ProcessorId": {"VendorId": "0x10de"}, "ProcessorType": "GPU", "SerialNumber": "1321823009358"}
{"@odata.context": "/redfish/v1/$metadata#Processor.Processor", "@odata.etag": "\"4bbe7c1dd24425a8849\"", "@odata.id": "/redfish/v1/Systems/1/Processors/GPU2", "@odata.type": "#Processor.v1_14_0.Processor", "Description": "This resource is used to represent a processor for a Redfish implementation.", "FirmwareVersion": "95.02.39.00.01", "Id": "GPU2", "Links": {"Chassis": {"@odata.id": "/redfish/v1/Chassis/1"}, "PCIeDevice": {"@odata.id": "/redfish/v1/Chassis/1/PCIeDevices/slot_4"}, "PCIeFunctions": [{"@odata.id": "/redfish/v1/Chassis/1/PCIeDevices/slot_4/PCIeFunctions/slot_4.00"}], "PCIeFunctions@odata.count": 1}, "Manufacturer": "NVIDIA Corporation", "Name": "GPU 2", "PartNumber": "26B5-895-A1", "ProcessorId": {"VendorId": "0x10de"}, "ProcessorType": "GPU", "SerialNumber": "1321823008716"}
{"@odata.context": "/redfish/v1/$metadata#Processor.Processor", "@odata.etag": "\"4bc4039bd68f25a884d\"", "@odata.id": "/redfish/v1/Systems/1/Processors/GPU3", "@odata.type": "#Processor.v1_14_0.Processor", "Description": "This resource is used to represent a processor for a Redfish implementation.", "FirmwareVersion": "95.02.39.00.01", "Id": "GPU3", "Links": {"Chassis": {"@odata.id": "/redfish/v1/Chassis/1"}, "PCIeDevice": {"@odata.id": "/redfish/v1/Chassis/1/PCIeDevices/slot_5"}, "PCIeFunctions": [{"@odata.id": "/redfish/v1/Chassis/1/PCIeDevices/slot_5/PCIeFunctions/slot_5.00"}], "PCIeFunctions@odata.count": 1}, "Manufacturer": "NVIDIA Corporation", "Name": "GPU 3", "PartNumber": "26B5-895-A1", "ProcessorId": {"VendorId": "0x10de"}, "ProcessorType": "GPU", "SerialNumber": "1321823008237"}
{"@odata.context": "/redfish/v1/$metadata#Processor.Processor", "@odata.etag": "\"4bcf9f9bbb1d25a8854\"", "@odata.id": "/redfish/v1/Systems/1/Processors/GPU4", "@odata.type": "#Processor.v1_14_0.Processor", "Description": "This resource is used to represent a processor for a Redfish implementation.", "FirmwareVersion": "95.02.39.00.01", "Id": "GPU4", "Links": {"Chassis": {"@odata.id": "/redfish/v1/Chassis/1"}, "PCIeDevice": {"@odata.id": "/redfish/v1/Chassis/1/PCIeDevices/slot_6"}, "PCIeFunctions": [{"@odata.id": "/redfish/v1/Chassis/1/PCIeDevices/slot_6/PCIeFunctions/slot_6.00"}], "PCIeFunctions@odata.count": 1}, "Manufacturer": "NVIDIA Corporation", "Name": "GPU 4", "PartNumber": "26B5-895-A1", "ProcessorId": {"VendorId": "0x10de"}, "ProcessorType": "GPU", "SerialNumber": "1322823030278"}
{"@odata.context": "/redfish/v1/$metadata#Processor.Processor", "@odata.etag": "\"4bc88081dcf125a8859\"", "@odata.id": "/redfish/v1/Systems/1/Processors/GPU5", "@odata.type": "#Processor.v1_14_0.Processor", "Description": "This resource is used to represent a processor for a Redfish implementation.", "FirmwareVersion": "95.02.39.00.01", "Id": "GPU5", "Links": {"Chassis": {"@odata.id": "/redfish/v1/Chassis/1"}, "PCIeDevice": {"@odata.id": "/redfish/v1/Chassis/1/PCIeDevices/slot_7"}, "PCIeFunctions": [{"@odata.id": "/redfish/v1/Chassis/1/PCIeDevices/slot_7/PCIeFunctions/slot_7.00"}], "PCIeFunctions@odata.count": 1}, "Manufacturer": "NVIDIA Corporation", "Name": "GPU 5", "PartNumber": "26B5-895-A1", "ProcessorId": {"VendorId": "0x10de"}, "ProcessorType": "GPU", "SerialNumber": "1321823008138"}
{"@odata.context": "/redfish/v1/$metadata#Processor.Processor", "@odata.etag": "\"4bcd83cfbae325a885c\"", "@odata.id": "/redfish/v1/Systems/1/Processors/GPU6", "@odata.type": "#Processor.v1_14_0.Processor", "Description": "This resource is used to represent a processor for a Redfish implementation.", "FirmwareVersion": "95.02.39.00.01", "Id": "GPU6", "Links": {"Chassis": {"@odata.id": "/redfish/v1/Chassis/1"}, "PCIeDevice": {"@odata.id": "/redfish/v1/Chassis/1/PCIeDevices/slot_8"}, "PCIeFunctions": [{"@odata.id": "/redfish/v1/Chassis/1/PCIeDevices/slot_8/PCIeFunctions/slot_8.00"}], "PCIeFunctions@odata.count": 1}, "Manufacturer": "NVIDIA Corporation", "Name": "GPU 6", "PartNumber": "26B5-895-A1", "ProcessorId": {"VendorId": "0x10de"}, "ProcessorType": "GPU", "SerialNumber": "1322823030436"}
{"@odata.context": "/redfish/v1/$metadata#Processor.Processor", "@odata.etag": "\"4bc9b603cf7f25a8860\"", "@odata.id": "/redfish/v1/Systems/1/Processors/GPU7", "@odata.type": "#Processor.v1_14_0.Processor", "Description": "This resource is used to represent a processor for a Redfish implementation.", "FirmwareVersion": "95.02.39.00.01", "Id": "GPU7", "Links": {"Chassis": {"@odata.id": "/redfish/v1/Chassis/1"}, "PCIeDevice": {"@odata.id": "/redfish/v1/Chassis/1/PCIeDevices/slot_9"}, "PCIeFunctions": [{"@odata.id": "/redfish/v1/Chassis/1/PCIeDevices/slot_9/PCIeFunctions/slot_9.00"}], "PCIeFunctions@odata.count": 1}, "Manufacturer": "NVIDIA Corporation", "Name": "GPU 7", "PartNumber": "26B5-895-A1", "ProcessorId": {"VendorId": "0x10de"}, "ProcessorType": "GPU", "SerialNumber": "1321823008610"}
{"@odata.context": "/redfish/v1/$metadata#Processor.Processor", "@odata.etag": "\"4b8a6ad4cabe27e88de\"", "@odata.id": "/redfish/v1/Systems/1/Processors/GPU8", "@odata.type": "#Processor.v1_14_0.Processor", "Description": "This resource is used to represent a processor for a Redfish implementation.", "FirmwareVersion": "95.02.39.00.01", "Id": "GPU8", "Links": {"Chassis": {"@odata.id": "/redfish/v1/Chassis/1"}, "PCIeDevice": {"@odata.id": "/redfish/v1/Chassis/1/PCIeDevices/slot_10"}, "PCIeFunctions": [{"@odata.id": "/redfish/v1/Chassis/1/PCIeDevices/slot_10/PCIeFunctions/slot_10.00"}], "PCIeFunctions@odata.count": 1}, "Manufacturer": "NVIDIA Corporation", "Name": "GPU 8", "PartNumber": "26B5-895-A1", "ProcessorId": {"VendorId": "0x10de"}, "ProcessorType": "GPU", "SerialNumber": "1321823008037"}

Yogibaer75 commented 4 months ago

Fixed with commit 8efbd77. Please test with the new MKP.

PS: the GPUs don't show any information about the status or device type.
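For readers following along: the traceback above fails at `redfish_health_state(data["Status"])`, and the GPU entries in the dump carry no `Status` key. A minimal sketch of a defensive lookup, assuming a hypothetical helper `health_summary` (a stand-in, not the plugin's real `redfish_health_state`, and not necessarily what the commit changed):

```python
from typing import Any, Mapping, Optional, Tuple

# Hypothetical stand-in for the plugin's health helper; the real
# redfish_health_state in the plugin may behave differently.
def health_summary(status: Optional[Mapping[str, Any]]) -> Tuple[int, str]:
    """Map a Redfish Status object to (state, message).

    States follow the usual monitoring convention:
    0 = OK, 1 = WARN, 2 = CRIT, 3 = UNKNOWN.
    """
    if not status:
        # Assumption: an entry without Status is reported as OK with a note.
        return 0, "No status information reported"
    health = status.get("Health")
    state = status.get("State")
    mapping = {"OK": 0, "Warning": 1, "Critical": 2}
    return mapping.get(health, 3), f"Health: {health}, State: {state}"


# Trimmed-down entries modelled on the agent output in this thread.
cpu = {"Name": "Processor 1", "ProcessorType": "CPU",
       "Status": {"Health": "OK", "State": "Enabled"}}
gpu = {"Name": "GPU 1", "ProcessorType": "GPU"}  # no "Status" key

for entry in (cpu, gpu):
    # .get() returns None instead of raising KeyError when Status is absent.
    code, msg = health_summary(entry.get("Status"))
    print(entry["Name"], code, msg)
```

Whether a missing `Status` should map to OK or UNKNOWN is a judgment call; the sketch picks OK only for illustration.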

marbaa commented 4 months ago

Thanks for the fast fix. However, it is still not working correctly.

In the standard service view, the check still crashes: image

In the service discovery view, the output is fine: image

But since the GPUs do not report any relevant information anyway, I doubt it is worth spending more time on this. What do you think?

marbaa commented 4 months ago

<<<redfish_processors:sep(0)>>> {"@odata.context": "/redfish/v1/$metadata#Processor.Processor", "@odata.etag": "\"d63d3f19254834a8c1eb3\"", "@odata.id": "/redfish/v1/Systems/1/Processors/1", "@odata.type": "#Processor.v1_14_0.Processor", "Description": "This resource is used to represent a processor for a Redfish implementation.", "EnvironmentMetrics": {"@odata.id": "/redfish/v1/Systems/1/Processors/1/EnvironmentMetrics"}, "Id": "1", "InstructionSet": "x86-64", "Links": {"Chassis": {"@odata.id": "/redfish/v1/Chassis/1"}}, "Location": {"PartLocation": {"LocationOrdinalValue": 0, "LocationType": "Socket", "ServiceLabel": "CPU 1"}}, "Manufacturer": "Intel(R) Corporation", "MaxSpeedMHz": 3600, "Metrics": {"@odata.id": "/redfish/v1/Systems/1/Processors/1/ProcessorMetrics"}, "Model": "Intel(R) Xeon(R) Platinum 8362 CPU @ 2.80GHz", "Name": "Processor 1", "Oem": {"Lenovo": {"@odata.type": "#LenovoProcessor.v1_0_0.LenovoProcessor", "CacheInfo": [{"CacheLevel": "L1", "InstalledSizeKByte": 2560, "MaxCacheSizeKByte": 2560}, {"CacheLevel": "L2", "InstalledSizeKByte": 40960, "MaxCacheSizeKByte": 40960}, {"CacheLevel": "L3", "InstalledSizeKByte": 49152, "MaxCacheSizeKByte": 49152}], "CurrentClockSpeedMHz": 2800, "ExternalBusClockSpeedMHz": 100, "NumberOfEnabledCores": 32, "ProcessorFamily": 179}}, "PartNumber": "", "ProcessorArchitecture": "x86", "ProcessorId": {"EffectiveFamily": "0x06", "EffectiveModel": "0x6a", "IdentificationRegisters": "0x000606a6bfebfbff", "MicrocodeInfo": null, "Step": "0x06", "VendorId": "GenuineIntel"}, "ProcessorMemory": [{"CapacityMiB": 2, "IntegratedMemory": true, "MemoryType": "L1Cache", "SpeedMHz": null}, {"CapacityMiB": 40, "IntegratedMemory": true, "MemoryType": "L2Cache", "SpeedMHz": null}, {"CapacityMiB": 48, "IntegratedMemory": true, "MemoryType": "L3Cache", "SpeedMHz": null}], "ProcessorType": "CPU", "SerialNumber": "", "Socket": "CPU 1", "Status": {"Health": "OK", "State": "Enabled"}, "SystemInterface": {"InterfaceType": "UPI"}, "TDPWatts": 
265, "TotalCores": 32, "TotalEnabledCores": 32, "TotalThreads": 64, "Version": "Intel(R) Xeon(R) Platinum 8362 CPU @ 2.80GHz"} {"@odata.context": "/redfish/v1/$metadata#Processor.Processor", "@odata.etag": "\"d63d40b75c9c34a8c1ebb\"", "@odata.id": "/redfish/v1/Systems/1/Processors/2", "@odata.type": "#Processor.v1_14_0.Processor", "Description": "This resource is used to represent a processor for a Redfish implementation.", "EnvironmentMetrics": {"@odata.id": "/redfish/v1/Systems/1/Processors/2/EnvironmentMetrics"}, "Id": "2", "InstructionSet": "x86-64", "Links": {"Chassis": {"@odata.id": "/redfish/v1/Chassis/1"}}, "Location": {"PartLocation": {"LocationOrdinalValue": 1, "LocationType": "Socket", "ServiceLabel": "CPU 2"}}, "Manufacturer": "Intel(R) Corporation", "MaxSpeedMHz": 3600, "Metrics": {"@odata.id": "/redfish/v1/Systems/1/Processors/2/ProcessorMetrics"}, "Model": "Intel(R) Xeon(R) Platinum 8362 CPU @ 2.80GHz", "Name": "Processor 2", "Oem": {"Lenovo": {"@odata.type": "#LenovoProcessor.v1_0_0.LenovoProcessor", "CacheInfo": [{"CacheLevel": "L1", "InstalledSizeKByte": 2560, "MaxCacheSizeKByte": 2560}, {"CacheLevel": "L2", "InstalledSizeKByte": 40960, "MaxCacheSizeKByte": 40960}, {"CacheLevel": "L3", "InstalledSizeKByte": 49152, "MaxCacheSizeKByte": 49152}], "CurrentClockSpeedMHz": 2800, "ExternalBusClockSpeedMHz": 100, "NumberOfEnabledCores": 32, "ProcessorFamily": 179}}, "PartNumber": "", "ProcessorArchitecture": "x86", "ProcessorId": {"EffectiveFamily": "0x06", "EffectiveModel": "0x6a", "IdentificationRegisters": "0x000606a6bfebfbff", "MicrocodeInfo": null, "Step": "0x06", "VendorId": "GenuineIntel"}, "ProcessorMemory": [{"CapacityMiB": 2, "IntegratedMemory": true, "MemoryType": "L1Cache", "SpeedMHz": null}, {"CapacityMiB": 40, "IntegratedMemory": true, "MemoryType": "L2Cache", "SpeedMHz": null}, {"CapacityMiB": 48, "IntegratedMemory": true, "MemoryType": "L3Cache", "SpeedMHz": null}], "ProcessorType": "CPU", "SerialNumber": "", "Socket": "CPU 2", "Status": 
{"Health": "OK", "State": "Enabled"}, "SystemInterface": {"InterfaceType": "UPI"}, "TDPWatts": 265, "TotalCores": 32, "TotalEnabledCores": 32, "TotalThreads": 64, "Version": "Intel(R) Xeon(R) Platinum 8362 CPU @ 2.80GHz"} {"@odata.context": "/redfish/v1/$metadata#Processor.Processor", "@odata.etag": "\"4bc2d4b5d99525a8846\"", "@odata.id": "/redfish/v1/Systems/1/Processors/GPU1", "@odata.type": "#Processor.v1_14_0.Processor", "Description": "This resource is used to represent a processor for a Redfish implementation.", "FirmwareVersion": "95.02.39.00.01", "Id": "GPU1", "Links": {"Chassis": {"@odata.id": "/redfish/v1/Chassis/1"}, "PCIeDevice": {"@odata.id": "/redfish/v1/Chassis/1/PCIeDevices/slot_3"}, "PCIeFunctions": [{"@odata.id": "/redfish/v1/Chassis/1/PCIeDevices/slot_3/PCIeFunctions/slot_3.00"}], "PCIeFunctions@odata.count": 1}, "Manufacturer": "NVIDIA Corporation", "Name": "GPU 1", "PartNumber": "26B5-895-A1", "ProcessorId": {"VendorId": "0x10de"}, "ProcessorType": "GPU", "SerialNumber": "1321823009358"} {"@odata.context": "/redfish/v1/$metadata#Processor.Processor", "@odata.etag": "\"4bbe7c1dd24425a8849\"", "@odata.id": "/redfish/v1/Systems/1/Processors/GPU2", "@odata.type": "#Processor.v1_14_0.Processor", "Description": "This resource is used to represent a processor for a Redfish implementation.", "FirmwareVersion": "95.02.39.00.01", "Id": "GPU2", "Links": {"Chassis": {"@odata.id": "/redfish/v1/Chassis/1"}, "PCIeDevice": {"@odata.id": "/redfish/v1/Chassis/1/PCIeDevices/slot_4"}, "PCIeFunctions": [{"@odata.id": "/redfish/v1/Chassis/1/PCIeDevices/slot_4/PCIeFunctions/slot_4.00"}], "PCIeFunctions@odata.count": 1}, "Manufacturer": "NVIDIA Corporation", "Name": "GPU 2", "PartNumber": "26B5-895-A1", "ProcessorId": {"VendorId": "0x10de"}, "ProcessorType": "GPU", "SerialNumber": "1321823008716"} {"@odata.context": "/redfish/v1/$metadata#Processor.Processor", "@odata.etag": "\"4bc4039bd68f25a884d\"", "@odata.id": "/redfish/v1/Systems/1/Processors/GPU3", 
"@odata.type": "#Processor.v1_14_0.Processor", "Description": "This resource is used to represent a processor for a Redfish implementation.", "FirmwareVersion": "95.02.39.00.01", "Id": "GPU3", "Links": {"Chassis": {"@odata.id": "/redfish/v1/Chassis/1"}, "PCIeDevice": {"@odata.id": "/redfish/v1/Chassis/1/PCIeDevices/slot_5"}, "PCIeFunctions": [{"@odata.id": "/redfish/v1/Chassis/1/PCIeDevices/slot_5/PCIeFunctions/slot_5.00"}], "PCIeFunctions@odata.count": 1}, "Manufacturer": "NVIDIA Corporation", "Name": "GPU 3", "PartNumber": "26B5-895-A1", "ProcessorId": {"VendorId": "0x10de"}, "ProcessorType": "GPU", "SerialNumber": "1321823008237"} {"@odata.context": "/redfish/v1/$metadata#Processor.Processor", "@odata.etag": "\"4bcf9f9bbb1d25a8854\"", "@odata.id": "/redfish/v1/Systems/1/Processors/GPU4", "@odata.type": "#Processor.v1_14_0.Processor", "Description": "This resource is used to represent a processor for a Redfish implementation.", "FirmwareVersion": "95.02.39.00.01", "Id": "GPU4", "Links": {"Chassis": {"@odata.id": "/redfish/v1/Chassis/1"}, "PCIeDevice": {"@odata.id": "/redfish/v1/Chassis/1/PCIeDevices/slot_6"}, "PCIeFunctions": [{"@odata.id": "/redfish/v1/Chassis/1/PCIeDevices/slot_6/PCIeFunctions/slot_6.00"}], "PCIeFunctions@odata.count": 1}, "Manufacturer": "NVIDIA Corporation", "Name": "GPU 4", "PartNumber": "26B5-895-A1", "ProcessorId": {"VendorId": "0x10de"}, "ProcessorType": "GPU", "SerialNumber": "1322823030278"} {"@odata.context": "/redfish/v1/$metadata#Processor.Processor", "@odata.etag": "\"4bc88081dcf125a8859\"", "@odata.id": "/redfish/v1/Systems/1/Processors/GPU5", "@odata.type": "#Processor.v1_14_0.Processor", "Description": "This resource is used to represent a processor for a Redfish implementation.", "FirmwareVersion": "95.02.39.00.01", "Id": "GPU5", "Links": {"Chassis": {"@odata.id": "/redfish/v1/Chassis/1"}, "PCIeDevice": {"@odata.id": "/redfish/v1/Chassis/1/PCIeDevices/slot_7"}, "PCIeFunctions": [{"@odata.id": 
"/redfish/v1/Chassis/1/PCIeDevices/slot_7/PCIeFunctions/slot_7.00"}], "PCIeFunctions@odata.count": 1}, "Manufacturer": "NVIDIA Corporation", "Name": "GPU 5", "PartNumber": "26B5-895-A1", "ProcessorId": {"VendorId": "0x10de"}, "ProcessorType": "GPU", "SerialNumber": "1321823008138"} {"@odata.context": "/redfish/v1/$metadata#Processor.Processor", "@odata.etag": "\"4bcd83cfbae325a885c\"", "@odata.id": "/redfish/v1/Systems/1/Processors/GPU6", "@odata.type": "#Processor.v1_14_0.Processor", "Description": "This resource is used to represent a processor for a Redfish implementation.", "FirmwareVersion": "95.02.39.00.01", "Id": "GPU6", "Links": {"Chassis": {"@odata.id": "/redfish/v1/Chassis/1"}, "PCIeDevice": {"@odata.id": "/redfish/v1/Chassis/1/PCIeDevices/slot_8"}, "PCIeFunctions": [{"@odata.id": "/redfish/v1/Chassis/1/PCIeDevices/slot_8/PCIeFunctions/slot_8.00"}], "PCIeFunctions@odata.count": 1}, "Manufacturer": "NVIDIA Corporation", "Name": "GPU 6", "PartNumber": "26B5-895-A1", "ProcessorId": {"VendorId": "0x10de"}, "ProcessorType": "GPU", "SerialNumber": "1322823030436"} {"@odata.context": "/redfish/v1/$metadata#Processor.Processor", "@odata.etag": "\"4bc9b603cf7f25a8860\"", "@odata.id": "/redfish/v1/Systems/1/Processors/GPU7", "@odata.type": "#Processor.v1_14_0.Processor", "Description": "This resource is used to represent a processor for a Redfish implementation.", "FirmwareVersion": "95.02.39.00.01", "Id": "GPU7", "Links": {"Chassis": {"@odata.id": "/redfish/v1/Chassis/1"}, "PCIeDevice": {"@odata.id": "/redfish/v1/Chassis/1/PCIeDevices/slot_9"}, "PCIeFunctions": [{"@odata.id": "/redfish/v1/Chassis/1/PCIeDevices/slot_9/PCIeFunctions/slot_9.00"}], "PCIeFunctions@odata.count": 1}, "Manufacturer": "NVIDIA Corporation", "Name": "GPU 7", "PartNumber": "26B5-895-A1", "ProcessorId": {"VendorId": "0x10de"}, "ProcessorType": "GPU", "SerialNumber": "1321823008610"} {"@odata.context": "/redfish/v1/$metadata#Processor.Processor", "@odata.etag": "\"4b8a6ad4cabe27e88de\"", 
"@odata.id": "/redfish/v1/Systems/1/Processors/GPU8", "@odata.type": "#Processor.v1_14_0.Processor", "Description": "This resource is used to represent a processor for a Redfish implementation.", "FirmwareVersion": "95.02.39.00.01", "Id": "GPU8", "Links": {"Chassis": {"@odata.id": "/redfish/v1/Chassis/1"}, "PCIeDevice": {"@odata.id": "/redfish/v1/Chassis/1/PCIeDevices/slot_10"}, "PCIeFunctions": [{"@odata.id": "/redfish/v1/Chassis/1/PCIeDevices/slot_10/PCIeFunctions/slot_10.00"}], "PCIeFunctions@odata.count": 1}, "Manufacturer": "NVIDIA Corporation", "Name": "GPU 8", "PartNumber": "26B5-895-A1", "ProcessorId": {"VendorId": "0x10de"}, "ProcessorType": "GPU", "SerialNumber": "1321823008037"}
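
To see which entries in a dump like this lack a `Status` object, here is a small sketch. It assumes the JSON objects are separated only by whitespace, as rendered here, and that the `<<<...>>>` section header has already been removed; `iter_json_objects` is a hypothetical helper, not part of the plugin.

```python
import json
from typing import Any, Dict, Iterator


def iter_json_objects(text: str) -> Iterator[Dict[str, Any]]:
    """Yield each JSON object from a string of whitespace-separated objects."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        # raw_decode returns the parsed object and the index where it ended.
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


# Shortened sample in the same shape as the section data above.
sample = (
    '{"Id": "1", "ProcessorType": "CPU", '
    '"Status": {"Health": "OK", "State": "Enabled"}} '
    '{"Id": "GPU1", "ProcessorType": "GPU"}'
)

missing = [obj["Id"] for obj in iter_json_objects(sample) if "Status" not in obj]
print(missing)
```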

Yogibaer75 commented 4 months ago

In my setup, the check itself is working as well.

image

marbaa commented 4 months ago

I let it sit overnight, and now it is green on my side as well. Interesting, because yesterday I had already tried deleting the host and all its data from Checkmk and creating a new host.

Thanks again.