blind-oracle / zabbix-sensors

Zabbix template & scripts to discover & monitor Linux sensors

error: 'Cannot obtain sensor information.' #8

Closed by FedorPRO 1 year ago

FedorPRO commented 1 year ago

Hello! First of all, many thanks for the template and scripts. I haven't found a better fit for my case, but I've run into a problem that I can't solve on my own, despite spending a lot of time on it. The scripts and the Zabbix template only work for the CPU temperature. If the system has NVMe disks or motherboard sensors, the agent logs errors like these:

...
2023/08/05 18:42:33.001090 check 'sensor["nvme-pci-0400", "temp1"]' is not supported: Cannot obtain sensor information.
2023/08/05 18:42:34.001322 check 'sensor["nvme-pci-0200", "temp1"]' is not supported: Cannot obtain sensor information.
2023/08/05 18:42:35.001616 check 'sensor["acpitz-acpi-0", "temp1"]' is not supported: Cannot obtain sensor information.
2023/08/05 18:42:36.001255 check 'sensor["acpitz-acpi-0", "temp2"]' is not supported: Cannot obtain sensor information.
2023/08/05 18:42:42.001598 check 'sensor["nvme-pci-0300", "temp1"]' is not supported: Cannot obtain sensor information.
2023/08/05 18:42:43.000917 check 'sensor["nvme-pci-0100", "temp1"]' is not supported: Cannot obtain sensor information.
...

This is the output of sensors -j on a server where nothing is collected from the NVMe and ACPI sensors:

root@pve02:~# sensors -j
{
   "nvme-pci-0400":{
      "Adapter": "PCI adapter",
      "Composite":{
         "temp1_input": 39.850,
         "temp1_max": 82.850,
         "temp1_min": -273.150,
         "temp1_crit": 84.850,
         "temp1_alarm": 0.000
      }
   },
   "nvme-pci-0200":{
      "Adapter": "PCI adapter",
      "Composite":{
         "temp1_input": 39.850,
         "temp1_max": 82.850,
         "temp1_min": -273.150,
         "temp1_crit": 84.850,
         "temp1_alarm": 0.000
      }
   },
   "acpitz-acpi-0":{
      "Adapter": "ACPI interface",
      "temp1":{
         "temp1_input": 27.800,
         "temp1_crit": 100.000
      },
      "temp2":{
         "temp2_input": 29.800,
         "temp2_crit": 100.000
      }
   },
   "coretemp-isa-0000":{
      "Adapter": "ISA adapter",
      "Package id 0":{
         "temp1_input": 49.000,
         "temp1_max": 68.000,
         "temp1_crit": 73.000,
         "temp1_crit_alarm": 0.000
      },
      "Core 0":{
         "temp2_input": 46.000,
         "temp2_max": 68.000,
         "temp2_crit": 73.000,
         "temp2_crit_alarm": 0.000
      },
      "Core 1":{
         "temp3_input": 49.000,
         "temp3_max": 68.000,
         "temp3_crit": 73.000,
         "temp3_crit_alarm": 0.000
      },
      "Core 2":{
         "temp4_input": 49.000,
         "temp4_max": 68.000,
         "temp4_crit": 73.000,
         "temp4_crit_alarm": 0.000
      },
      "Core 3":{
         "temp5_input": 48.000,
         "temp5_max": 68.000,
         "temp5_crit": 73.000,
         "temp5_crit_alarm": 0.000
      }
   },
   "nvme-pci-0300":{
      "Adapter": "PCI adapter",
      "Composite":{
         "temp1_input": 39.850,
         "temp1_max": 82.850,
         "temp1_min": -273.150,
         "temp1_crit": 84.850,
         "temp1_alarm": 0.000
      }
   },
   "nvme-pci-0100":{
      "Adapter": "PCI adapter",
      "Composite":{
         "temp1_input": 39.850,
         "temp1_max": 82.850,
         "temp1_min": -273.150,
         "temp1_crit": 84.850,
         "temp1_alarm": 0.000
      }
   }
}

Can you tell me where to look for the cause of the problem, so that data is collected from all sensors? I am using Zabbix Server 6.4 and Zabbix Agent 2 on the hosts.
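
As a debugging aid, the chip and feature names reported by sensors -j can be listed in the same form as the item keys the agent rejects, so the two can be compared side by side. This is a minimal Python sketch and not part of the repository: `list_features` and `SAMPLE` are hypothetical names, and `SAMPLE` is a trimmed copy of the output above.

```python
import json


def list_features(sensors_json):
    """Return (chip, label, feature_id) tuples found in `sensors -j` output."""
    data = json.loads(sensors_json)
    found = []
    for chip, features in data.items():
        for label, readings in features.items():
            # Entries such as "Adapter": "PCI adapter" are plain strings, not readings.
            if not isinstance(readings, dict):
                continue
            for key in readings:
                if key.endswith("_input"):
                    # "temp1_input" -> "temp1"
                    found.append((chip, label, key[: -len("_input")]))
    return found


# Trimmed sample taken from the output in this thread (assumption: same layout elsewhere).
SAMPLE = """
{
  "nvme-pci-0400": {
    "Adapter": "PCI adapter",
    "Composite": {"temp1_input": 39.85, "temp1_max": 82.85}
  },
  "acpitz-acpi-0": {
    "Adapter": "ACPI interface",
    "temp1": {"temp1_input": 27.8, "temp1_crit": 100.0},
    "temp2": {"temp2_input": 29.8, "temp2_crit": 100.0}
  }
}
"""

for chip, label, feature_id in list_features(SAMPLE):
    print(f'sensor["{chip}", "{feature_id}"]  (label: {label})')
```

On a real host the input would come from running sensors -j rather than from the embedded sample.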

blind-oracle commented 1 year ago

Strange, it works for me with NVMe disks. What does the Python script from this repo output?

It should look something like this:

# /etc/zabbix/scripts/sensors.py
{
  "hwmon0-nvme": {
    "temp1": {
      "alarm": 0,
      "crit": 94850,
      "input": 54850,
      "label": "Composite",
      "max": 89850,
      "min": -60150,
      "sensor_type": "temp"
    }
  },
  "hwmon1-nvme": {
    "temp1": {
      "alarm": 0,
      "crit": 94850,
      "input": 55850,
      "label": "Composite",
      "max": 89850,
      "min": -60150,
      "sensor_type": "temp"
    }
  },
  "hwmon2-pch_lewisburg": {
    "temp1": {
      "input": 67000,
      "sensor_type": "temp"
    }
  },
  "hwmon3-coretemp": {
    "temp1": {
      "crit": 99000,
      "crit_alarm": 0,
      "input": 63000,
      "label": "Package id 0",
      "max": 89000,
      "sensor_type": "temp"
    },
    "temp2": {
      "crit": 99000,
      "crit_alarm": 0,
      "input": 61000,
      "label": "Core 0",
      "max": 89000,
      "sensor_type": "temp"
    },
    "temp3": {
      "crit": 99000,
      "crit_alarm": 0,
      "input": 59000,
      "label": "Core 1",
      "max": 89000,
      "sensor_type": "temp"
    },
    "temp4": {
      "crit": 99000,
      "crit_alarm": 0,
      "input": 63000,
      "label": "Core 3",
      "max": 89000,
      "sensor_type": "temp"
    },
    "temp5": {
      "crit": 99000,
      "crit_alarm": 0,
      "input": 60000,
      "label": "Core 4",
      "max": 89000,
      "sensor_type": "temp"
    }
  }
}

FedorPRO commented 1 year ago

I checked sensors.py and it is identical to the one in the repository.

Script output:

root@pve02:~# /etc/zabbix/scripts/sensors.py 
[
    {
        "{#ADAPTER}": "nvme-pci-0400",
        "{#TYPE}": "TEMP",
        "{#NAME}": "Composite",
        "{#MIN}": -273.15,
        "{#HIGH}": 82.85,
        "{#CRIT}": 84.85,
        "{#TEMP_ID}": "temp1"
    },
    {
        "{#ADAPTER}": "nvme-pci-0200",
        "{#TYPE}": "TEMP",
        "{#NAME}": "Composite",
        "{#MIN}": -273.15,
        "{#HIGH}": 82.85,
        "{#CRIT}": 84.85,
        "{#TEMP_ID}": "temp1"
    },
    {
        "{#ADAPTER}": "acpitz-acpi-0",
        "{#TYPE}": "TEMP",
        "{#NAME}": "temp1",
        "{#MIN}": 0.0,
        "{#HIGH}": 90.0,
        "{#CRIT}": 100.0,
        "{#TEMP_ID}": "temp1"
    },
    {
        "{#ADAPTER}": "acpitz-acpi-0",
        "{#TYPE}": "TEMP",
        "{#NAME}": "temp2",
        "{#MIN}": 0.0,
        "{#HIGH}": 90.0,
        "{#CRIT}": 100.0,
        "{#TEMP_ID}": "temp2"
    },
    {
        "{#ADAPTER}": "coretemp-isa-0000",
        "{#TYPE}": "TEMP",
        "{#NAME}": "Package id 0",
        "{#MIN}": 0.0,
        "{#HIGH}": 68.0,
        "{#CRIT}": 73.0,
        "{#TEMP_ID}": "temp1"
    },
    {
        "{#ADAPTER}": "coretemp-isa-0000",
        "{#TYPE}": "TEMP",
        "{#NAME}": "Core 0",
        "{#MIN}": 0.0,
        "{#HIGH}": 68.0,
        "{#CRIT}": 73.0,
        "{#TEMP_ID}": "temp2"
    },
    {
        "{#ADAPTER}": "coretemp-isa-0000",
        "{#TYPE}": "TEMP",
        "{#NAME}": "Core 1",
        "{#MIN}": 0.0,
        "{#HIGH}": 68.0,
        "{#CRIT}": 73.0,
        "{#TEMP_ID}": "temp3"
    },
    {
        "{#ADAPTER}": "coretemp-isa-0000",
        "{#TYPE}": "TEMP",
        "{#NAME}": "Core 2",
        "{#MIN}": 0.0,
        "{#HIGH}": 68.0,
        "{#CRIT}": 73.0,
        "{#TEMP_ID}": "temp4"
    },
    {
        "{#ADAPTER}": "coretemp-isa-0000",
        "{#TYPE}": "TEMP",
        "{#NAME}": "Core 3",
        "{#MIN}": 0.0,
        "{#HIGH}": 68.0,
        "{#CRIT}": 73.0,
        "{#TEMP_ID}": "temp5"
    },
    {
        "{#ADAPTER}": "nvme-pci-0300",
        "{#TYPE}": "TEMP",
        "{#NAME}": "Composite",
        "{#MIN}": -273.15,
        "{#HIGH}": 82.85,
        "{#CRIT}": 84.85,
        "{#TEMP_ID}": "temp1"
    },
    {
        "{#ADAPTER}": "nvme-pci-0100",
        "{#TYPE}": "TEMP",
        "{#NAME}": "Composite",
        "{#MIN}": -273.15,
        "{#HIGH}": 82.85,
        "{#CRIT}": 84.85,
        "{#TEMP_ID}": "temp1"
    }
]

blind-oracle commented 1 year ago

Hmm, now I wonder if I made some changes to the script and didn't commit them... let me check.

blind-oracle commented 1 year ago

I've committed the latest version that I had. It reads sysfs directly instead of calling sensors. The templates have been re-uploaded to match, and sensors.conf was updated as well, so please fetch that too and give it a try.
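
For readers unfamiliar with the sysfs approach, the idea can be sketched roughly as follows. This is not the repository's script: `read_hwmon`, `ATTR_RE` and the fake directory tree are hypothetical, and the output keys only imitate the `hwmon0-nvme` / `temp1` layout shown earlier in the thread. On Linux, each directory under /sys/class/hwmon has a `name` file and attribute files such as `temp1_input`, with temperatures in millidegrees Celsius.

```python
import os
import re
import tempfile

# Matches attribute files such as temp1_input, temp1_crit_alarm, fan2_input.
ATTR_RE = re.compile(r"^(temp|fan|in)(\d+)_(\w+)$")


def read_hwmon(root="/sys/class/hwmon"):
    """Collect sensor attributes from a hwmon-style directory tree."""
    result = {}
    if not os.path.isdir(root):
        return result
    for dev in sorted(os.listdir(root)):
        dev_dir = os.path.join(root, dev)
        try:
            with open(os.path.join(dev_dir, "name")) as f:
                chip = f.read().strip()
            files = sorted(os.listdir(dev_dir))
        except OSError:
            continue  # not a hwmon device directory, or unreadable
        sensors = {}
        for fname in files:
            match = ATTR_RE.match(fname)
            if not match:
                continue
            kind, index, attr = match.groups()
            try:
                with open(os.path.join(dev_dir, fname)) as f:
                    raw = f.read().strip()
            except OSError:
                continue  # some attributes can fail to read on real hardware
            entry = sensors.setdefault(f"{kind}{index}", {"sensor_type": kind})
            if attr == "label":
                entry[attr] = raw
            else:
                try:
                    entry[attr] = int(raw)  # raw driver units, e.g. millidegrees
                except ValueError:
                    entry[attr] = raw
        if sensors:
            result[f"{dev}-{chip}"] = sensors
    return result


# Demo against a fake tree so the sketch runs on any machine.
with tempfile.TemporaryDirectory() as root:
    dev_dir = os.path.join(root, "hwmon0")
    os.makedirs(dev_dir)
    fake = {"name": "nvme", "temp1_input": "54850", "temp1_label": "Composite"}
    for fname, value in fake.items():
        with open(os.path.join(dev_dir, fname), "w") as f:
            f.write(value + "\n")
    demo = read_hwmon(root)

print(demo)
```

Calling `read_hwmon()` with no argument reads the real /sys/class/hwmon on a Linux host. Reading the files directly avoids depending on the output format of the sensors command.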

FedorPRO commented 1 year ago

Yes, I deployed all the updated files to 3 servers in my home lab, and data is now collected correctly on all of them (CPU, NVMe disks and motherboard sensors). Thank you very much for your hard work. How can I thank you?

blind-oracle commented 1 year ago

Super. No worries! :)