Griesbacher / nagflux

A connector which copies performance data from Nagios / Icinga(2) / Naemon to InfluxDB
GNU General Public License v2.0

Weird performanceLabel #19

Closed: jframeau closed this issue 7 years ago

jframeau commented 7 years ago

Context: OMD (Open Monitoring Distribution) version 2.11.20161101-labs-edition on CentOS 7

After a full restart of OMD, there were some weird entries in InfluxDB:

1478000205000000000 check-https centurion 0 ,241334s;;;0,000000 size google B 11142
1478000205000000000 check-https centurion time google 0
1478000265000000000 check-https centurion 0 ,268444s;;;0,000000 size google B 11111
1478000265000000000 check-https centurion time google 0

The underlying check is this one (the standard check_http shipped with OMD): /omd/sites/recette/lib/nagios/plugins/check_http --hostname www.google.fr --ssl

These four lines are the only strange ones; everything else is OK. nagflux.log looked fine at that time.

jfr

Griesbacher commented 7 years ago

I can't reproduce this issue. I installed the same OMD version and added the given check:

/omd/sites/***/lib/nagios/plugins/check_http --hostname www.google.fr --ssl

InfluxDB data:

> select * from metrics where service='check_http_ssl' order by time desc limit 6
name: metrics
-------------
time                    command         crit    crit-fill       host                    min     performanceLabel        service         unit    value           warn    warn-fill
2016-11-02T15:25:30Z    check_http_ssl                          nagiostest_host_1       0       size                    check_http_ssl  B       11197
2016-11-02T15:25:30Z    check_http_ssl                          nagiostest_host_1       0       time                    check_http_ssl  s       0.161211
2016-11-02T15:25:24Z    check_http_ssl                          nagiostest_host_1       0       size                    check_http_ssl  B       11171
2016-11-02T15:25:24Z    check_http_ssl                          nagiostest_host_1       0       time                    check_http_ssl  s       0.164437
2016-11-02T15:24:30Z    check_http_ssl                          nagiostest_host_1       0       size                    check_http_ssl  B       11259
2016-11-02T15:24:30Z    check_http_ssl                          nagiostest_host_1       0       time                    check_http_ssl  s       0.156132

I checked it with Icinga2 and Nagios3.

Greets, Philip

Griesbacher commented 7 years ago

I'll close this one due to inactivity.

nicolasguillier commented 7 years ago

Hello,

I think I have the same problem. I don't know if I should create a new ticket.

I use the check_http plugin, and my user has a French ('FR') locale environment:

LANG=fr_FR.UTF-8
LANGUAGE=
LC_CTYPE="fr_FR.UTF-8"
...

It's probably not the best choice, but now the data contain decimal numbers with a comma instead of a full stop, like this:

/usr/lib/nagios/plugins/check_http -H '###########' -p '80'
HTTP OK: HTTP/1.1 200 OK - 453 bytes in 0,004 second response time |time=0,003747s;;;0,000000 size=453B;;;0

I think Nagflux has a problem parsing the perfdata.

I have this in the database:

select * from metrics where host = '#############' and service = 'Web' order by time desc limit 6;
name: metrics
time command crit crit-fill host max min performanceLabel service unit value warn warn-fill
1484608896000000000 check_http ############# time Web 0
1484608896000000000 check_http ############# 0 ,001787s;;;0,000000 size Web B 453
1484608596000000000 check_http ############# time Web 0
1484608596000000000 check_http ############# 0 ,001407s;;;0,000000 size Web B 453
1484608296000000000 check_http ############# 0 ,001659s;;;0,000000 size Web B 453
1484608296000000000 check_http ############# time Web 0

I'm using InfluxDB v1.1.1 and Nagflux v0.3.0.

Best regards, Nicolas

Griesbacher commented 7 years ago

Hi Nicolas,

that could be the root of the problem. By definition, a comma is not valid in the value field; the guidelines only allow characters from the class [-0-9.] there: https://www.monitoring-plugins.org/doc/guidelines.html#AEN201

But I'll look into it when I have time.

Greetings, Philip
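To make the guideline point concrete, here is a minimal sketch of a strict perfdata parser that follows the documented entry format label=value[UOM];[warn];[crit];[min];[max] and accepts only characters from [-0-9.] in the value. It is not Nagflux's parser; the names ENTRY and parse_perfdata are hypothetical, and it assumes labels contain no whitespace.

```python
import re

# Strict pattern for ONE perfdata entry, following the monitoring-plugins
# guideline: label=value[UOM];[warn];[crit];[min];[max]
# Assumption: labels contain no whitespace (quoted labels with spaces are not handled).
ENTRY = re.compile(
    r"""
    (?P<label>[^\s=]+)=          # label: anything except whitespace and '='
    (?P<value>[-0-9.]+|U)        # value: only characters from [-0-9.] (or literal U)
    (?P<uom>[a-zA-Z%]*)          # optional unit of measurement, e.g. s, B, %
    (?:;(?P<warn>[^;\s]*))?      # optional warning threshold/range
    (?:;(?P<crit>[^;\s]*))?      # optional critical threshold/range
    (?:;(?P<min>[-0-9.]*))?      # optional minimum
    (?:;(?P<max>[-0-9.]*))?      # optional maximum
    """,
    re.VERBOSE,
)


def parse_perfdata(perfdata: str) -> dict:
    """Map each label to (value, uom, min, max); reject entries that do not match."""
    result = {}
    for token in perfdata.split():
        match = ENTRY.fullmatch(token)
        if match is None:
            raise ValueError(f"invalid perfdata entry: {token!r}")
        label = match.group("label").strip("'")
        result[label] = (match.group("value"), match.group("uom"),
                         match.group("min"), match.group("max"))
    return result
```

With this strict reading, size=453B;;;0 parses normally, while time=0,003747s;;;0,000000 is rejected outright instead of being split into a garbled label.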

nicolasguillier commented 7 years ago

Hi Philip,

Thank you for the fix, but now only the performanceLabel 'size' is sent to InfluxDB.

In the Nagflux log file, I have this: metrics,host=HOST_SERVER,service=Web,command=check_http,performanceLabel=size,unit=B value=128728.0,min=0.0 148950070400

And the performanceLabel 'time' has disappeared.

Best regards, Nicolas
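The log line Nicolas quotes is an InfluxDB line-protocol point: measurement, comma-separated tags, a space, comma-separated fields, a space, and a timestamp. A minimal sketch that produces the same shape from already-parsed values might look like this. The helpers escape_key, escape_measurement and to_line_protocol are hypothetical, not Nagflux code, and the escaping only covers commas, equals signs and spaces.

```python
# Hypothetical helpers, NOT Nagflux's code: they only show the shape of an
# InfluxDB line-protocol point: measurement,tags fields timestamp
def escape_key(text: str) -> str:
    """Escape commas, equals signs and spaces in tag keys/values and field keys."""
    return text.replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")


def escape_measurement(text: str) -> str:
    """Measurement names only need commas and spaces escaped."""
    return text.replace(",", r"\,").replace(" ", r"\ ")


def to_line_protocol(measurement: str, tags: dict, fields: dict, timestamp: int) -> str:
    # Empty tag values are skipped because line protocol does not allow them.
    tag_part = "".join(
        f",{escape_key(k)}={escape_key(v)}" for k, v in tags.items() if v
    )
    # In this sketch every field is written as a float, so 128728 becomes 128728.0.
    field_part = ",".join(
        f"{escape_key(k)}={float(v)!r}" for k, v in fields.items() if v is not None
    )
    return f"{escape_measurement(measurement)}{tag_part} {field_part} {timestamp}"
```

Feeding it host HOST_SERVER, service Web, command check_http, label size, unit B, value 128728 and min 0 reproduces the logged line; a time entry would simply be a second point with performanceLabel=time.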

Griesbacher commented 7 years ago

Hi Nicolas,

Strange... which core are you using: Nagios, Icinga(2) or Naemon?

Here is what I've tested:

./check_http -H "heise.de"
HTTP OK: HTTP/1.1 301 Moved Permanently - 555 bytes in 6,049 second response time |time=6,049175s;;;0,000000;10,000000 size=555B;;;0

Not so great because of the comma, but that is not Nagflux's business; the core has to handle it.

Icinga2 Perfdata:

DATATYPE::SERVICEPERFDATA   TIMET::1489562456   HOSTNAME::devel       SERVICEDESC::http       SERVICEPERFDATA::time=0.001034s;;;0.000000;10.000000 size=10975B;;;0 SERVICECHECKCOMMAND::http   HOSTSTATE::UP   HOSTSTATETYPE::HARD     SERVICESTATE::OK SERVICESTATETYPE::HARD

Nagios3 Perfdata:

DATATYPE::SERVICEPERFDATA   TIMET::1489562755   HOSTNAME::nagiostest SERVICEDESC::heise_http SERVICEPERFDATA::time=0.000569s;;;0.000000;10.000000 size=10975B;;;0 SERVICECHECKCOMMAND::check_http!heise.de    HOSTSTATE::UNREACHABLE  HOSTSTATETYPE::HARD  SERVICESTATE::OK    SERVICESTATETYPE::HARD

There are no commas, just dots as the separator. That's also what I failed to check in the first place: Nagflux does not care about the plugin output style, since the core has to handle that. What matters is the format of the perfdata files. Could you stop Nagflux, reschedule the check, grab the perfdata (it should look like the data above) and post it here?
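Since what matters is the perfdata file rather than the plugin output, a quick way to inspect such a line is to split it into its KEY::VALUE fields and look for decimal commas in SERVICEPERFDATA. This is a minimal sketch; parse_perfdata_line and uses_decimal_comma are hypothetical helpers, and it assumes the fields are tab-separated, as in the usual PNP4Nagios-style perfdata templates.

```python
import re


def parse_perfdata_line(line: str, sep: str = "\t") -> dict:
    """Split one perfdata-file line into {KEY: VALUE} using the KEY::VALUE convention.

    Assumption: fields are separated by tabs; adjust ``sep`` if your template differs.
    """
    record = {}
    for field in line.rstrip("\n").split(sep):
        if not field:
            continue  # tolerate doubled separators
        key, found, value = field.partition("::")
        if not found:
            raise ValueError(f"field without '::': {field!r}")
        record[key] = value
    return record


def uses_decimal_comma(record: dict) -> bool:
    """True if SERVICEPERFDATA contains a digit-comma-digit sequence such as 0,004118."""
    return re.search(r"\d,\d", record.get("SERVICEPERFDATA", "")) is not None
```

Run against the Icinga2 and Nagios3 lines posted here, uses_decimal_comma returns False; against a line whose SERVICEPERFDATA reads time=0,004118s it returns True, which points at the core writing locale-formatted numbers.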

nicolasguillier commented 7 years ago

Hi,

I'm using Nagios 3 (v3.5.1).

I did what you asked me and I have this:

DATATYPE::SERVICEPERFDATA TIMET::1489572014 HOSTNAME::HOST_SERVER SERVICEDESC::web SERVICEPERFDATA::time=0,004118s;;;0,000000 size=128766B;;;0 SERVICECHECKCOMMAND::check_http!HOST_SERVER!80!/!20 HOSTSTATE::UP HOSTSTATETYPE::HARD SERVICESTATE::OK SERVICESTATETYPE::HARD SERVICEOUTPUT::HTTP OK: HTTP/1.1 200 OK - 128766 bytes in 0,004 second response time

And in nagflux.log, I still only get this perfdata: "metrics,host=HOST_SERVER,service=web,command=check_http,performanceLabel=size,unit=B min=0.0,value=128766.0 1489572014000", but not the other one.

Griesbacher commented 7 years ago

Hey Nicolas,

I could reproduce the error with your data and also turned it into a test. It should be working now ;) You could try out the latest release, but beware: there are some changes in the config, so take a look at the example.

Best regards, Philip
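The thread does not say how the fix works internally. One possible approach, sketched here purely as an assumption and not as Nagflux's actual change, is to rewrite a comma that sits between two digits into a dot, but only in the data part of each entry, before parsing. The name normalize_decimal_commas is hypothetical.

```python
import re

# A comma sitting directly between two digits, e.g. the one in "0,004118".
_DECIMAL_COMMA = re.compile(r"(?<=\d),(?=\d)")


def normalize_decimal_commas(perfdata: str) -> str:
    """Rewrite locale-style decimal commas to dots in the data part of each entry.

    Assumption: entries are whitespace-separated and labels contain no spaces.
    The label (left of the first '=') is left untouched, since labels may
    legitimately contain commas.
    """
    normalized = []
    for token in perfdata.split():
        label, eq, data = token.partition("=")
        normalized.append(label + eq + _DECIMAL_COMMA.sub(".", data))
    return " ".join(normalized)
```

Applied to Nicolas' SERVICEPERFDATA value, this yields time=0.004118s;;;0.000000 size=128766B;;;0, which a strict parser then accepts. Restricting the rewrite to the part after '=' keeps labels such as cpu1,2 intact.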

nicolasguillier commented 7 years ago

Great! It works. Thank you.

Griesbacher commented 7 years ago

You're welcome :)