dvarounis / perfservmon

Nagios Plugin for IBM WebSphere Application Server using perfservlet
MIT License
8 stars 5 forks source link
heap jdbc-connection-pools jms-sib-destinations monitoring nagios nagios-plugin nagios-plugins orb-thread-pool websphere

Perfservmon

Perfservmon is a Nagios Plugin for IBM Websphere Application Server(WAS) using the perfservlet web application that comes with each WAS installation. It also has minimal library dependencies so it can be easily used on most environments.

The plugin can monitor the following WAS metrics of a WebSphere Cell:

Prerequisites

  1. Perfservlet App Install the PerfServletApp.ear in one WAS server of your WebSphere Cell. This is located in <WAS_ROOT>/installableApps, i.e. this would be in /opt/IBM/WebSphere/AppServer/installableApps in a Unix System.
  2. Python version 2.7 or version 3.X installed at the Nagios host

The plugin is tested to work with WAS Traditional version 8.5 and 9.0.

Setup

  1. Copy the perfservmon.py file in $USER1$ path, which is the plugins path. You will propably find the value of this variable in Nagios resource.cfg file (usually this is a libexec directory).

  2. Add the following lines in Nagios command.cfg file:

#Check_perfservlet commands
#The -H -u -p parameters are optional
#depending on whether you use https and/or Basic Auth credentials to access the perfservlet

define command{
        command_name    check_perfserv_retriever
        command_line    $USER1$/perfservmon.py -C $ARG1$ retrieve -N $ARG2$ -P $ARG3$ -H $ARG4$ -u $ARG5$ -p $ARG6$
        }

define command{
        command_name    check_perfserv_show
        command_line    $USER1$/perfservmon.py -C $ARG1$ show -n $ARG2$ -s $ARG3$ -M $ARG4$ -c $ARG5$ -w $ARG6$
        }

define command{
        command_name    check_perfserv_show_dcp
        command_line    $USER1$/perfservmon.py -C $ARG1$ show -n $ARG2$ -s $ARG3$ -M DBConnectionPoolPercentUsed -j $ARG4$ -c $ARG5$ -w $ARG6$
        }

define command{
        command_name    check_perfserv_show_sib
        command_line    $USER1$/perfservmon.py -C $ARG1$ show -n $ARG2$ -s $ARG3$ -M SIBDestinations -d $ARG4$ -c $ARG5$ -w $ARG6$
        }

Usage

Define Collector Service

Before defining a service using check_perfserv_show it is required to add the following service definition at the WAS Server or the DMgr Server(for ND Architecture) Nagios Config file:

define service{
        use                             local-service        
        host_name                       <WAS_Host>
        service_description             Collect PerfServlet data from Cell
        check_command                   check_perfserv_retriever!<WAS_Cell_Name>!<PerfServ_hostname>!<PerfServ_Port>![http|https]!userid!passwd![--ignorecert]
        }

Where:

define service{
        name                            collector-service           ; The name of this service template
        use                             local-service         ; Inherit default values from the local-service definition
        max_check_attempts              2                       ; Re-check the service up to 2 times in order to determine its final (hard) state
        normal_check_interval           3                       ; Check the service every 3 minutes under normal conditions
        retry_check_interval            1                       ; Re-check the service every minute until a hard state can be determined
        register                        0                       ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
        }

Then the collector service definition should be like the following:

define service{
        use                             collector-service        
        host_name                       <WAS_Host>
        service_description             Collect PerfServlet data from Cell
        check_command                   check_perfserv_retriever!<WAS_Cell_Name>!<PerfServ_hostname>!<PerfServ_Port>![http|https]!userid!passwd
        }

Sample Service Definitions for WAS Metrics

define service{
        use                             local-service
        host_name                       <WAS_Host>
        service_description             WAS Heap usage
        check_command                   check_perfserv_show!<WAS_Cell_Name>!<WAS_Node_Name>!<WAS_server_name>!Heap!<Critical Percentage>!<Warning Percentage>
        }
define service{
        use                             local-service
        host_name                       <WAS_Host>
        service_description             WAS WebContainer ThreadPool Usage
        check_command                   check_perfserv_show!<WAS_Cell_Name>!<WAS_Node_Name>!<WAS_server_name>!WebContainer!<Critical Percentage>!<Warning Percentage>
        }

Shows all the available connection pools of the WAS Server and show an alert when any of them exceeds the percentage limits.

define service{
        use                             local-service
        host_name                       <WAS_Host>
        service_description             WAS ConnectionPool Usage
        check_command                   check_perfserv_show!<WAS_Cell_Name>!<WAS_Node_Name>!<WAS_server_name>!DBConnectionPoolPercentUsed!<Critical Percentage>!<Warning Percentage>
        }

Shows a specific connection pool of the WAS Server (specified with ) and show an alert when its usage exceeds the percentage limits.

define service{
        use                             local-service
        host_name                       <WAS_Host>
        service_description             WAS ConnectionPool JNDI_name Usage
        check_command                   check_perfserv_show_dcp!<WAS_Cell_Name>!<WAS_Node_Name>!<WAS_server_name>!<JNDI_name>!<Critical Percentage>!<Warning Percentage>
        }

Shows the Total Live HTTP Sessions together with the individual(per Module HTTP Sessions). Show an alert when the Total Sessions exceed the limits.

define service{
        use                             local-service
        host_name                       <WAS_Host>
        service_description             WAS Http Live Sessions
        check_command                   check_perfserv_show!<WAS_Cell_Name>!<WAS_Node_Name>!<WAS_server_name>!LiveSessions!<Critical No of Sessions>!<Warning No of Sessions>
        }
define service{
        use                             local-service
        host_name                       <WAS_Host>
        service_description             WAS ORB ThreadPool Usage
        check_command                   check_perfserv_show!<WAS_Cell_Name>!<WAS_Node_Name>!<WAS_server_name>!ORB!<Critical Percentage>!<Warning Percentage>
        }
define service{
        use                             local-service
        host_name                       <WAS_Host>
        service_description             My Topic Space
        check_command                   check_perfserv_show_sib!<WAS_Cell_Name>!<WAS_Node_Name>!<WAS_server_name>!<MyTopicSpaceName>!<No_Messages_Critical>!<No_Messages_Warning>
        }

define service{
        use                             local-service
        host_name                       <WAS_Host>
        service_description             My Exception Destination
        check_command                   check_perfserv_show_sib!<WAS_Cell_Name>!<WAS_Node_Name>!<WAS_server_name>!_SYSTEM.Exception.Destination.<WAS_Node_Name>.<WAS_server_name>-<SIBus_Name>!<No_Messages_Critical>!<No_Messages_Warning>
        }