centreon / centreon-plugins

Collection of standard plugins to discover and gather cloud-to-edge metrics and status across your whole IT infrastructure.
https://www.centreon.com
Apache License 2.0

snmp mode collection: one table instance based on another table instance? #4348

Closed by Luth1ng 1 year ago

Luth1ng commented 1 year ago

Description

I will try to be clear...

I have one OID table with board entries:

#      .1.3.6.1.4.1.637.61.1.23.3.1.x.y
#        x = OID node
#        y = board ID

For example, if I request the OID 1.3.6.1.4.1.637.61.1.23.3.1.4.1001, I get the value "1", which translates to: eqptSlotPowerStatus for board ID 1001 is powerUp(1).
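
A minimal Python sketch of the OID layout above, splitting a board-table OID into its column (x) and board ID (y). The helper `decode_board_oid` and the `COLUMNS` mapping are hypothetical; only column 4 (eqptSlotPowerStatus) is named in this issue.

```python
# Layout described above: .1.3.6.1.4.1.637.61.1.23.3.1.x.y (x = column, y = board ID).
BOARD_ENTRY_PREFIX = ".1.3.6.1.4.1.637.61.1.23.3.1."

# Hypothetical mapping: only column 4 is named in the issue, others stay unmapped.
COLUMNS = {4: "eqptSlotPowerStatus"}


def decode_board_oid(oid):
    """Return (column, board_id) for an OID under the board entry table."""
    if not oid.startswith("."):
        oid = "." + oid  # accept OIDs written with or without the leading dot
    if not oid.startswith(BOARD_ENTRY_PREFIX):
        raise ValueError("not a board entry OID: " + oid)
    column, board_id = oid[len(BOARD_ENTRY_PREFIX):].split(".")
    return int(column), int(board_id)


column, board_id = decode_board_oid("1.3.6.1.4.1.637.61.1.23.3.1.4.1001")
print(COLUMNS.get(column, "unknown"), "for board", board_id)  # eqptSlotPowerStatus for board 1001
```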

This part works great with my JSON collection file.

Now I would also like to check the sensor temperatures for each board. There is another OID table with thermal sensor entries:

#      .1.3.6.1.4.1.637.61.1.23.10.1.x.y.z
#        x = OID node
#        y = board ID
#        z = sensor ID

A typical snmpwalk returns:

.1.3.6.1.4.1.637.61.1.23.10.1.2.1001.1 = INTEGER: 38
.1.3.6.1.4.1.637.61.1.23.10.1.2.1001.2 = INTEGER: 36
.1.3.6.1.4.1.637.61.1.23.10.1.2.1001.3 = INTEGER: 34
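
To make the three-level index concrete, here is a hypothetical Python sketch that parses the snmpwalk lines above into `(column, board ID, sensor ID) -> value`. It assumes net-snmp's default `OID = INTEGER: value` output format and only handles INTEGER rows; `parse_thermal_walk` is an illustrative name, not part of the plugin.

```python
import re

# Matches net-snmp lines such as ".1.3.6.1.4.1.637.61.1.23.10.1.2.1001.1 = INTEGER: 38"
# and captures x (column), y (board ID), z (sensor ID) and the value.
WALK_LINE = re.compile(
    r"^\.?1\.3\.6\.1\.4\.1\.637\.61\.1\.23\.10\.1\.(\d+)\.(\d+)\.(\d+) = INTEGER: (-?\d+)$"
)


def parse_thermal_walk(lines):
    """Return {(column, board_id, sensor_id): value} for the INTEGER rows of the table."""
    rows = {}
    for line in lines:
        match = WALK_LINE.match(line.strip())
        if match:
            column, board_id, sensor_id, value = map(int, match.groups())
            rows[(column, board_id, sensor_id)] = value
    return rows


walk = """\
.1.3.6.1.4.1.637.61.1.23.10.1.2.1001.1 = INTEGER: 38
.1.3.6.1.4.1.637.61.1.23.10.1.2.1001.2 = INTEGER: 36
.1.3.6.1.4.1.637.61.1.23.10.1.2.1001.3 = INTEGER: 34
"""

for (column, board_id, sensor_id), value in sorted(parse_thermal_walk(walk.splitlines()).items()):
    print(f"column {column}, board {board_id}, sensor {sensor_id}: {value}")
```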

What I tried

I created a second SNMP table with my ThermalSensorEntry:

{
    "name": "eqptBoardThermalSensorEntry",
    "oid": ".1.3.6.1.4.1.637.61.1.23.10.1",
    "used_instance": "\\.(\\d+\\.\\d+)$",
    "entries": [
        { "name": "eqptBoardThermalSensorActualTemperature", "oid": ".1.3.6.1.4.1.637.61.1.23.10.1.2" }
    ]
}

And another selection loop for this table:

{
    "name": "eqptBoardThermalSensorTable",
    "source": "%(snmp.tables.eqptBoardThermalSensorEntry)",
    "expand_table": {
        "eqptBoardThermalSensorEntry": "%(snmp.tables.eqptBoardThermalSensorEntry.[%(eqptBoardThermalSensorEntry.instance)])"
    },
    "formatting": {
        "printf_msg": "Card %s (slot%s): temperature is %s°C",
        "printf_var": [
            "%(eqptBoardEntry.eqptSlotActualType)",
            "%(eqptBoardEntry.eqptBoardContainerOffset)",
            "%(eqptBoardThermalSensorEntry.eqptBoardThermalSensorActualTemperature)"
        ],
        "display_ok": true
    }
}

The output is:

Card  (slot): temperature is 38°C
Card  (slot): temperature is 36°C
Card  (slot): temperature is 34°C

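It seems we cannot look up entries of one table (eqptBoardEntry) from the instance of another (eqptBoardThermalSensorEntry): the board type and slot fields come out empty.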
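
A small Python sketch of what the `used_instance` pattern in the table definition yields for one sensor OID. It assumes the plugin matches the pattern against each walked OID much as `re.search` does here; the resulting instance carries the board ID as its first component, but nothing in the selection loop above uses it to reach `eqptBoardEntry`.

```python
import json
import re

# The used_instance value exactly as written in the collection file (JSON-escaped).
used_instance = json.loads(r'"\\.(\\d+\\.\\d+)$"')  # decodes to \.(\d+\.\d+)$

oid = ".1.3.6.1.4.1.637.61.1.23.10.1.2.1001.1"
instance = re.search(used_instance, oid).group(1)
print(instance)  # 1001.1

# The board ID is embedded in the instance, but the selection loop above never
# uses it to look up the matching eqptBoardEntry row.
board_id, sensor_id = instance.split(".")
print(board_id, sensor_id)  # 1001 1
```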

Question

I would like to create an SNMP service that checks both the board status AND the board temperature sensors. First, simply enumerate the board ID, sensor ID and sensor value for each thermal sensor entry. Then, sum the values of all the sensors of each board, add a constant to this total, and expose it as perfdata.

Is it possible to get these values with the same collection file?

Many thanks for your help
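
The aggregation requested above (sum per board, plus a constant, as perfdata) can be sketched in plain Python, independently of the collection mode. The offset value, the perfdata label `board<ID>_temp_sum` and the helper names are hypothetical; the output follows the Nagios-style `label=value;warn;crit;min` perfdata layout with thresholds left empty.

```python
from collections import defaultdict

# Sensor readings keyed by (board ID, sensor ID), taken from the snmpwalk above.
readings = {(1001, 1): 38, (1001, 2): 36, (1001, 3): 34}

# Hypothetical constant added to each board total (the issue gives no value).
OFFSET = 10


def board_totals(readings, offset):
    """Sum all sensor values of each board, then add a constant offset."""
    totals = defaultdict(int)
    for (board_id, _sensor_id), value in readings.items():
        totals[board_id] += value
    return {board_id: total + offset for board_id, total in totals.items()}


def perfdata(totals):
    """Format totals as perfdata: label=value;warn;crit;min (thresholds left empty)."""
    return " ".join(
        f"board{board_id}_temp_sum={total};;;0" for board_id, total in sorted(totals.items())
    )


# Step 1: enumerate board ID, sensor ID and value for each sensor entry.
for (board_id, sensor_id), value in sorted(readings.items()):
    print(f"board {board_id} sensor {sensor_id}: {value}")

# Step 2: per-board sum plus constant, exposed as perfdata.
totals = board_totals(readings, OFFSET)
print("OK |", perfdata(totals))  # OK | board1001_temp_sum=118;;;0
```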

garnier-quentin commented 1 year ago

I'm not sure we can do that with the collection mode; it becomes quite complex. You should develop it in Perl.

Luth1ng commented 1 year ago

It is supported by the --oid-table and --oid-instance parameters of the string-value mode. It only amounts to checking an OID table based on an instance from another table.

I thought the collection mode might be the future of the generic SNMP plugin's other modes. Is that wrong?

garnier-quentin commented 1 year ago

Could you provide the snmpwalk and the collection file? I will try to do it.

Luth1ng commented 1 year ago

I just sent you an email with the information you asked for. Many thanks!

Luth1ng commented 1 year ago

@garnier-quentin hey, could you confirm you received my email and have everything you need? Thanks.

Luth1ng commented 1 year ago

Hello @garnier-quentin, any chance you've worked on that?

Luth1ng commented 1 year ago

Any news on that, please?

garnier-quentin commented 1 year ago

You need the following patch: https://github.com/centreon/centreon-plugins/pull/4568

And now you can use:

{
    "name": "eqptBoardThermalSensorEntry",
    "oid": ".1.3.6.1.4.1.637.61.1.23.10.1",
    "used_instance": "\\.(\\d+\\.\\d+)$",
    "entries": [
        { "name": "eqptBoardThermalSensorActualTemperature", "oid": ".1.3.6.1.4.1.637.61.1.23.10.1.2" }
    ],
    "instance_entries": {
        "re": "\\.(\\d+)\\.(\\d+)$",
        "entries": [
            { "name": "slotId", "capture": "1" },
            { "name": "thermalSensorId", "capture": "2" }
        ]
    }
}

And:

{
    "name": "eqptBoardThermalSensorTable",
    "source": "%(snmp.tables.eqptBoardThermalSensorEntry)",
    "expand_table": {
        "eqptBoardThermalSensorEntry": "%(snmp.tables.eqptBoardThermalSensorEntry.[%(eqptBoardThermalSensorEntry.instance)])",
        "eqptBoardEntry": "%(snmp.tables.eqptBoardEntry.[%(eqptBoardThermalSensorEntry.slotId)])"
    },
    "formatting": {
        "printf_msg": "Card %s (slot%s): temperature is %s°C",
        "printf_var": [
            "%(eqptBoardEntry.eqptSlotActualType)",
            "%(eqptBoardEntry.eqptBoardContainerOffset)",
            "%(eqptBoardThermalSensorEntry.eqptBoardThermalSensorActualTemperature)"
        ],
        "display_ok": true
    }
}
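
A rough Python model of what this answer does, assuming `instance_entries` applies its regex to each walked OID and `expand_table` behaves like a dictionary lookup keyed on `slotId`. The board row values (`EXAMPLE-CARD`, `12`) are invented for illustration; the temperatures come from the snmpwalk in the question.

```python
import re

# Same pattern as "re" in instance_entries; capture 1 -> slotId, capture 2 -> thermalSensorId.
INSTANCE_RE = re.compile(r"\.(\d+)\.(\d+)$")
CAPTURES = {"slotId": 1, "thermalSensorId": 2}

# Hypothetical snmp.tables.eqptBoardEntry contents (values invented for illustration).
board_table = {
    "1001": {"eqptSlotActualType": "EXAMPLE-CARD", "eqptBoardContainerOffset": "12"},
}

# Thermal sensor OIDs and temperatures from the snmpwalk in the question.
thermal_walk = {
    ".1.3.6.1.4.1.637.61.1.23.10.1.2.1001.1": 38,
    ".1.3.6.1.4.1.637.61.1.23.10.1.2.1001.2": 36,
    ".1.3.6.1.4.1.637.61.1.23.10.1.2.1001.3": 34,
}


def instance_fields(oid):
    """Extract slotId and thermalSensorId from the trailing OID components."""
    match = INSTANCE_RE.search(oid)
    if match is None:
        return {}
    return {name: match.group(group) for name, group in CAPTURES.items()}


def render(thermal_walk, board_table):
    """Join each sensor row to its board row via slotId and format the message."""
    messages = []
    for oid, temperature in thermal_walk.items():
        fields = instance_fields(oid)
        # A missing board row yields empty fields in the formatted message.
        board = board_table.get(fields.get("slotId"), {})
        messages.append("Card %s (slot%s): temperature is %s°C" % (
            board.get("eqptSlotActualType", ""),
            board.get("eqptBoardContainerOffset", ""),
            temperature,
        ))
    return messages


for message in render(thermal_walk, board_table):
    print(message)
```

With an empty board table, `render` produces `Card  (slot): temperature is 38°C`, which resembles the output reported before the patch.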

Luth1ng commented 1 year ago

Hello @garnier-quentin, awesome! It's working great; the collection mode possibilities are now endless! Many thanks!

However, I am hitting a new bug. I'm not sure it is directly linked to this enhancement, but I was not facing it before: the warning/critical options do not work when I compare against a dynamic value (not a constant).

For example:

This works:

//debug :
//    snmp.tables.eqptBoardThermalSensorEntry.[4363.2].eqptBoardThermalSensorShutdownThresholdLow = 115
//    snmp.tables.eqptBoardThermalSensorEntry.[4363.2].eqptBoardThermalSensorActualTemperature = 30

//plugin reply :
OK: All ISAM Boards are OK 

    "constants": {
        "warningTemp": 115
    },
    "selection_loop": [
        {
            "name": "eqptBoardThermalSensorTable",
            "source": "%(snmp.tables.eqptBoardThermalSensorEntry)",
            "expand_table": {
                "eqptBoardThermalSensorEntry": "%(snmp.tables.eqptBoardThermalSensorEntry.[%(eqptBoardThermalSensorEntry.instance)])",
                "eqptBoardEntry": "%(snmp.tables.eqptBoardEntry.[%(eqptBoardThermalSensorEntry.slotId)])"
            },
            "perfdatas": [
                {
                    "nlabel": "%(eqptBoardEntry.eqptBoardContainerOffset).%(eqptBoardThermalSensorEntry.thermalSensorId).temp",
                    "instances":
                        [
                            "%(eqptBoardEntry.eqptBoardContainerOffset)",
                            "%(eqptBoardThermalSensorEntry.thermalSensorId)"
                        ],
                    "value": "%(eqptBoardThermalSensorEntry.eqptBoardThermalSensorActualTemperature)",
                    "warning": "%(eqptBoardThermalSensorEntry.eqptBoardThermalSensorTcaThresholdLow)",
                    "critical": "%(eqptBoardThermalSensorEntry.eqptBoardThermalSensorShutdownThresholdLow)",
                    "min": 0,
                    "unit": "°C"
                }
            ],
            "critical": "%(eqptBoardThermalSensorEntry.eqptBoardThermalSensorActualTemperature) ge %(constants.warningTemp)",
            "formatting": {
                "printf_msg": "Card %s (%s): temperature sensor #%s is %s°C",
                "printf_var": [
                    "%(eqptBoardEntry.eqptSlotActualType)",
                    "%(eqptBoardEntry.eqptBoardContainerOffset)",
                    "%(eqptBoardThermalSensorEntry.thermalSensorId)",
                    "%(eqptBoardThermalSensorEntry.eqptBoardThermalSensorActualTemperature)"
                ],
                "display_ok": false
            }
        }
    ]

This does not:


//debug :
//    snmp.tables.eqptBoardThermalSensorEntry.[4363.2].eqptBoardThermalSensorShutdownThresholdLow = 115
//    snmp.tables.eqptBoardThermalSensorEntry.[4363.2].eqptBoardThermalSensorActualTemperature = 30

//plugin reply :
CRITICAL: Card NALT-J (lt:1/1/12): temperature sensor #1 is 30°C | [...]

    "selection_loop": [
        {
            "name": "eqptBoardThermalSensorTable",
            "source": "%(snmp.tables.eqptBoardThermalSensorEntry)",
            "expand_table": {
                "eqptBoardThermalSensorEntry": "%(snmp.tables.eqptBoardThermalSensorEntry.[%(eqptBoardThermalSensorEntry.instance)])",
                "eqptBoardEntry": "%(snmp.tables.eqptBoardEntry.[%(eqptBoardThermalSensorEntry.slotId)])"
            },
            "perfdatas": [
                {
                    "nlabel": "%(eqptBoardEntry.eqptBoardContainerOffset).%(eqptBoardThermalSensorEntry.thermalSensorId).temp",
                    "instances":
                        [
                            "%(eqptBoardEntry.eqptBoardContainerOffset)",
                            "%(eqptBoardThermalSensorEntry.thermalSensorId)"
                        ],
                    "value": "%(eqptBoardThermalSensorEntry.eqptBoardThermalSensorActualTemperature)",
                    "warning": "%(eqptBoardThermalSensorEntry.eqptBoardThermalSensorTcaThresholdLow)",
                    "critical": "%(eqptBoardThermalSensorEntry.eqptBoardThermalSensorShutdownThresholdLow)",
                    "min": 0,
                    "unit": "°C"
                }
            ],
            "critical": "%(eqptBoardThermalSensorEntry.eqptBoardThermalSensorActualTemperature) ge %(eqptBoardThermalSensorEntry.eqptBoardThermalSensorShutdownThresholdLow)",
            "formatting": {
                "printf_msg": "Card %s (%s): temperature sensor #%s is %s°C",
                "printf_var": [
                    "%(eqptBoardEntry.eqptSlotActualType)",
                    "%(eqptBoardEntry.eqptBoardContainerOffset)",
                    "%(eqptBoardThermalSensorEntry.thermalSensorId)",
                    "%(eqptBoardThermalSensorEntry.eqptBoardThermalSensorActualTemperature)"
                ],
                "display_ok": false
            }
        }
    ]

garnier-quentin commented 1 year ago

For numeric comparisons, use the following syntax:

"critical": "%(eqptBoardThermalSensorEntry.eqptBoardThermalSensorActualTemperature) > %(eqptBoardThermalSensorEntry.eqptBoardThermalSensorShutdownThresholdLow)",
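
A short Python illustration of why the operator matters, assuming the collection mode evaluates `ge` like Perl's string comparison operator (in Perl, `ge` compares strings while `>` and `>=` compare numbers). Compared as text, "30" sorts after "115", which would be consistent with the unexpected CRITICAL in the failing example; this is an illustration, not the plugin's actual evaluator.

```python
# Values from the debug output above, treated as text (assumption about how they reach the expression).
actual = "30"      # eqptBoardThermalSensorActualTemperature
threshold = "115"  # eqptBoardThermalSensorShutdownThresholdLow

# String comparison, like Perl's `ge`: "3" sorts after "1", so this is True.
print(actual >= threshold)  # True
# Numeric comparison, like the `>` syntax suggested above: 30 > 115 is False.
print(int(actual) > int(threshold))  # False
```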