From a support ticket:
I have found an issue with the script https://raw.github.com/ryanhoskin/pagerduty-opsview-pl/master/pagerduty_nagios.pl as referenced by the Nagios/Opsview integration guide.
It appears that PagerDuty rejects more than about 10-20 posts per minute, returning "arriving too quickly -- retry", but the script never retries, and this results in dropped alerts.
The following fix to this script will address the situation. In place of the unconditional unlink($filename), the queued event file is deleted only when the server's response does not ask for a retry:
elsif (is_client_error($resp->code)) {
    syslog(LOG_WARNING, "Nagios event in file %s REJECTED by the PagerDuty server. Server says: %s", $filename, $resp->content);
    unlink($filename) if ($resp->content !~ /retry later/);
}
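To make the intended behaviour concrete, here is a minimal standalone sketch of the keep-or-delete decision. The helper name classify_response and the sample response bodies are made up for illustration and are not part of the original script. Treating non-4xx failures as temporary is also an assumption, not something confirmed by the ticket.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of pagerduty_nagios.pl: decide what to do
# with a queued event file given the HTTP status code and response body.
# Returns 'sent', 'retry', or 'discard'.
sub classify_response {
    my ($code, $body) = @_;

    # 2xx: the event was accepted, so the queued file can go.
    return 'sent' if $code >= 200 && $code < 300;

    if ($code >= 400 && $code < 500) {
        # Same pattern as the fix above: a rejection that asks for a
        # later retry must leave the queued file in place.
        return $body =~ /retry later/ ? 'retry' : 'discard';
    }

    # Assumption: anything else (5xx, network trouble) is temporary.
    return 'retry';
}

# Sample bodies below are invented for illustration.
print classify_response(403, "arriving too quickly -- retry later"), "\n";
print classify_response(400, "invalid service key"), "\n";
print classify_response(200, "ok"), "\n";
```

Only the 'sent' and 'discard' outcomes would lead to unlink($filename). A 'retry' outcome leaves the file queued so that a later flush can pick it up again.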
Also, I have modified the script to send a cleaner request/request key, if you're interested (this is replacement code for the beginning of the enqueue_event sub):
sub enqueue_event {
    my %event;

    # Scoop all the Nagios related stuff out of the environment.
    while ((my $k, my $v) = each %ENV) {
        next unless $k =~ /^NAGIOS_(.*)$/;
        $event{$1} = $v;
    }

    # Apply any other variables that were passed in.
    %event = (%event, %opt_fields);

    # Set pd_nagios_object if not set
    unless ($event{"pd_nagios_object"}) {
        $event{'HOSTNAME'} = $ENV{'NAGIOS_HOSTNAME'};
        $event{'NOTIFICATIONTYPE'} = $ENV{'NAGIOS_NOTIFICATIONTYPE'};
        $event{'HOSTGROUP'} = $ENV{'NAGIOS_HOSTGROUPNAME'} if (defined($ENV{'NAGIOS_HOSTGROUPNAME'}));
        $event{'_CONTACTPAGERDUTY_SERVICE_KEY'} = $ENV{'NAGIOS__CONTACTPAGERDUTY_SERVICE_KEY'};
        $event{'OUTPUT'} = '';
        if ($ENV{"NAGIOS_SERVICEDESC"}) {
            # This is a service alert
            $event{"pd_nagios_object"} = "service";
            $event{'pd_incident_key'} = $ENV{'NAGIOS_HOSTNAME'}.'/'.$ENV{'NAGIOS_SERVICEDESC'};
            $event{'SERVICESTATE'} = $ENV{'NAGIOS_SERVICESTATE'};
            $event{'DURATION'} = $ENV{'NAGIOS_SERVICEDURATION'} if (defined($ENV{'NAGIOS_SERVICEDURATION'}));
            $event{'OUTPUT'} .= $ENV{'NAGIOS_SERVICEOUTPUT'} if (defined($ENV{'NAGIOS_SERVICEOUTPUT'}));
            $event{'OUTPUT'} .= "\n".$ENV{'NAGIOS_LONGSERVICEOUTPUT'} if ($ENV{'NAGIOS_LONGSERVICEOUTPUT'});
            $event{'SERVICEDESC'} = $ENV{'NAGIOS_SERVICEDESC'}. '; '. $event{'OUTPUT'};
        } else {
            # This is a host alert
            $event{"pd_nagios_object"} = "host";
            $event{'pd_incident_key'} = $ENV{'NAGIOS_HOSTNAME'};
            $event{'HOSTSTATE'} = $ENV{'NAGIOS_HOSTSTATE'};
            $event{'DURATION'} = $ENV{'NAGIOS_HOSTDURATION'} if (defined($ENV{'NAGIOS_HOSTDURATION'}));
            $event{'OUTPUT'} .= $ENV{'NAGIOS_HOSTOUTPUT'} if (defined($ENV{'NAGIOS_HOSTOUTPUT'}));
            $event{'OUTPUT'} .= "\n".$ENV{'NAGIOS_LONGHOSTOUTPUT'} if ($ENV{'NAGIOS_LONGHOSTOUTPUT'});
        }
    }
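As a quick illustration of how the incident key above comes out for the two alert types, here is a small standalone sketch. The helper name incident_key and the sample host and service values are hypothetical. It reads from a plain hash rather than %ENV so that it can be run outside Nagios.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration only: build the incident key the
# same way the replacement code does, from a hash of Nagios values.
sub incident_key {
    my ($env) = @_;

    # A non-empty NAGIOS_SERVICEDESC marks a service alert; otherwise
    # the event is treated as a host alert.
    return $env->{NAGIOS_SERVICEDESC}
        ? $env->{NAGIOS_HOSTNAME} . '/' . $env->{NAGIOS_SERVICEDESC}
        : $env->{NAGIOS_HOSTNAME};
}

# Made-up sample values.
print incident_key({ NAGIOS_HOSTNAME => 'web01', NAGIOS_SERVICEDESC => 'HTTP' }), "\n";
print incident_key({ NAGIOS_HOSTNAME => 'web01' }), "\n";
```

With these sample values the service alert yields web01/HTTP and the host alert yields web01, so every notification for the same host or host/service pair carries the same key.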