anapsix / zabbix-haproxy

HAProxy Zabbix Discovery and Template

Issue "Not Supported" #46

Closed ficofer closed 6 years ago

ficofer commented 6 years ago

I followed the README and everything seems to be working fine. Scripts run from the HAProxy node return info that makes me think the socket is being opened and read....

I installed socat and nc, because that was my first blocker... but now in the Zabbix Web UI I see NO DATA, and on the server I see this:

haproxy:x:188:188:haproxy:/var/lib/haproxy:/sbin/nologin
haproxy.list.discovery [t|{
haproxy.stats [t|ERROR: is unsupported]
haproxy.stat.qcur [t|ERROR: is unsupported]
haproxy.stat.qmax [t|ERROR: is unsupported]
haproxy.stat.scur [t|ERROR: is unsupported]
haproxy.stat.smax [t|ERROR: is unsupported]
haproxy.stat.slim [t|ERROR: is unsupported]
haproxy.stat.bin [t|ERROR: is unsupported]
haproxy.stat.bout [t|ERROR: is unsupported]
haproxy.stat.dreq [t|ERROR: is unsupported]
haproxy.stat.dresp [t|ERROR: is unsupported]
haproxy.stat.ereq [t|ERROR: is unsupported]
haproxy.stat.econ [t|ERROR: is unsupported]
haproxy.stat.eresp [t|ERROR: is unsupported]
haproxy.stat.wretr [t|ERROR: is unsupported]
haproxy.stat.wredis [t|ERROR: is unsupported]
haproxy.stat.weight [t|ERROR: is unsupported]
haproxy.stat.act [t|ERROR: is unsupported]
haproxy.stat.bck [t|ERROR: is unsupported]
haproxy.stat.chkfail [t|ERROR: is unsupported]
haproxy.stat.chkdown [t|ERROR: is unsupported]
haproxy.stat.lastchg [t|ERROR: is unsupported]
haproxy.stat.downtime [t|ERROR: is unsupported]
haproxy.stat.qlimit [t|ERROR: is unsupported]
haproxy.stat.throttle [t|ERROR: is unsupported]
haproxy.stat.lbtot [t|ERROR: is unsupported]
haproxy.stat.tracked [t|ERROR: is unsupported]
haproxy.stat.type [t|ERROR: is unsupported]
haproxy.stat.rate [t|ERROR: is unsupported]
haproxy.stat.rate_lim [t|ERROR: is unsupported]
haproxy.stat.rate_max [t|ERROR: is unsupported]
haproxy.stat.check_status [t|ERROR: is unsupported]
haproxy.stat.check_code [t|ERROR: is unsupported]
haproxy.stat.check_duration [t|ERROR: is unsupported]
haproxy.stat.req_rate [t|ERROR: is unsupported]
haproxy.stat.req_rate_max [t|ERROR: is unsupported]
haproxy.stat.req_tot [t|ERROR: is unsupported]
haproxy.stat.cli_abrt [t|ERROR: is unsupported]
haproxy.stat.srv_abrt [t|ERROR: is unsupported]
haproxy.stat.comp_in [t|ERROR: is unsupported]
haproxy.stat.comp_out [t|ERROR: is unsupported]
haproxy.stat.comp_byp [t|ERROR: is unsupported]
haproxy.stat.comp_rsp [t|ERROR: is unsupported]
haproxy.stat.lastsess [t|ERROR: is unsupported]
haproxy.stat.qtime [t|ERROR: is unsupported]
haproxy.stat.ctime [t|ERROR: is unsupported]
haproxy.stat.rtime [t|ERROR: is unsupported]
haproxy.stat.status [t|ERROR: is unsupported]
haproxy.stat.pid [t|ERROR: is unsupported]
haproxy.stat.iid [t|ERROR: is unsupported]
haproxy.stat.sid [t|ERROR: is unsupported]
haproxy.stat.hrsp_1xx [t|ERROR: is unsupported]
haproxy.stat.hrsp_2xx [t|ERROR: is unsupported]
haproxy.stat.hrsp_3xx [t|ERROR: is unsupported]
haproxy.stat.hrsp_4xx [t|ERROR: is unsupported]
haproxy.stat.hrsp_5xx [t|ERROR: is unsupported]
haproxy.stat.hrsp_other [t|ERROR: is unsupported]
haproxy.stat.hanafail [t|ERROR: is unsupported]
haproxy.stat.last_chk [t|ERROR: is unsupported]
haproxy.stat.last_agt [t|ERROR: is unsupported]

Not much info to debug with; any hints? I have ruled out a permission issue: I double-checked, and the socket is opened with mode 666. The agent is zabbix_agentd (daemon) (Zabbix) 3.0.15.

@anapsix really cool stuff here, thanks for your support!

ficofer commented 6 years ago
/usr/local/bin/haproxy_discovery.sh /var/run/haproxy/info.sock FRONTEND
{
    "data":[

        {
            "{#FRONTEND_NAME}":"ft_http_web"},
        {
            "{#FRONTEND_NAME}":"ft_https_web"},
        {
            "{#FRONTEND_NAME}":"stats"}]}
ficofer commented 6 years ago

From what I can see in the logs it should be showing info....

28175:20180319:124446.628 In zbx_popen() command:'/usr/local/bin/haproxy_stats.sh /var/run/haproxy/info.sock bk_https_web HTTP_id-prod-app12 slim'
 28177:20180319:124446.628 __zbx_zbx_setproctitle() title:'listener #3 [processing request]'
 28177:20180319:124446.628 Requested [haproxy.stats[/var/run/haproxy/info.sock,bk_https_web,HTTP_id-prod-app02,rate_max]]
 28177:20180319:124446.628 In zbx_popen() command:'/usr/local/bin/haproxy_stats.sh /var/run/haproxy/info.sock bk_https_web HTTP_id-prod-app02 rate_max'
 28177:20180319:124446.628 End of zbx_popen():7
 10654:20180319:124446.628 zbx_popen(): executing script
 28175:20180319:124446.628 End of zbx_popen():7
 10655:20180319:124446.629 zbx_popen(): executing script
 28176:20180319:124446.634 __zbx_zbx_setproctitle() title:'listener #2 [processing request]'
 28176:20180319:124446.634 Requested [haproxy.stats[/var/run/haproxy/info.sock,bk_https_web,HTTP_id-prod-app06,qcur]]
 28176:20180319:124446.634 In zbx_popen() command:'/usr/local/bin/haproxy_stats.sh /var/run/haproxy/info.sock bk_https_web HTTP_id-prod-app06 qcur'
 28176:20180319:124446.635 End of zbx_popen():7
 10662:20180319:124446.635 zbx_popen(): executing script
 28177:20180319:124446.647 In zbx_waitpid()
 28177:20180319:124446.647 zbx_waitpid() exited, status:0
 28177:20180319:124446.647 End of zbx_waitpid():10654
 28177:20180319:124446.647 EXECUTE_STR() command:'/usr/local/bin/haproxy_stats.sh /var/run/haproxy/info.sock bk_https_web HTTP_id-prod-app02 rate_max' len:1 cmd_result:'1'
 28177:20180319:124446.647 Sending back [1]
 28177:20180319:124446.648 __zbx_zbx_setproctitle() title:'listener #3 [waiting for connection]'
 28175:20180319:124446.648 In zbx_waitpid()
 28175:20180319:124446.648 zbx_waitpid() exited, status:0
 28175:20180319:124446.648 End of zbx_waitpid():10655
 28175:20180319:124446.648 EXECUTE_STR() command:'/usr/local/bin/haproxy_stats.sh /var/run/haproxy/info.sock bk_https_web HTTP_id-prod-app12 slim' len:1 cmd_result:'0'
 28175:20180319:124446.648 Sending back [0]
 28175:20180319:124446.648 __zbx_zbx_setproctitle() title:'listener #1 [waiting for connection]'
 28176:20180319:124446.655 In zbx_waitpid()
 28176:20180319:124446.655 zbx_waitpid() exited, status:0
 28176:20180319:124446.655 End of zbx_waitpid():10662
 28176:20180319:124446.655 EXECUTE_STR() command:'/usr/local/bin/haproxy_stats.sh /var/run/haproxy/info.sock bk_https_web HTTP_id-prod-app06 qcur' len:1 cmd_result:'0'
 28176:20180319:124446.655 Sending back [0]
 28176:20180319:124446.655 __zbx_zbx_setproctitle() title:'listener #2 [waiting for connection]'
 28178:20180319:124446.813 In send_buffer() host:'127.0.0.1' port:10051 entries:0/100
 28178:20180319:124446.814 End of send_buffer():SUCCEED
 28178:20180319:124446.814 __zbx_zbx_setproctitle() title:'active checks #1 [idle 1 sec]'
 28174:20180319:124446.826 __zbx_zbx_setproctitle() title:'collector [processing data]'
 28174:20180319:124446.826 In update_cpustats()
 28174:20180319:124446.826 End of update_cpustats()
 28174:20180319:124446.826 __zbx_zbx_setproctitle() title:'collector [idle 1 sec]'
 28177:20180319:124447.650 __zbx_zbx_setproctitle() title:'listener #3 [processing request]'
 28177:20180319:124447.650 Requested [haproxy.stats[/var/run/haproxy/info.sock,bk_https_web,HTTP_id-prod-app01,smax]]
 28177:20180319:124447.650 In zbx_popen() command:'/usr/local/bin/haproxy_stats.sh /var/run/haproxy/info.sock bk_https_web HTTP_id-prod-app01 smax'
 28175:20180319:124447.650 __zbx_zbx_setproctitle() title:'listener #1 [processing request]'
 28175:20180319:124447.650 Requested [net.if.discovery]
 28177:20180319:124447.650 End of zbx_popen():7
ficofer commented 6 years ago

Some of the stats seem to be retrieving info, but they just show 0... I don't think this is related to the HAProxy version I'm running, is it?

I have also ruled out a version issue; further investigation shows a problem with this check:

# check if requested stat is supported
if [ -z "${_STAT}" ]
then
  echo "ERROR: $stat is unsupported"
  exit 127
fi

Looks like all, or at least 99%, of my stats come back as not supported.

# /usr/local/bin/haproxy_stats.sh prod-app10 smax
ERROR:  is unsupported
# /usr/local/bin/haproxy_stats.sh prod-app10 status
ERROR:  is unsupported
# /usr/local/bin/haproxy_stats.sh prod-app10 downtime
ERROR:  is unsupported
# 
anapsix commented 6 years ago

@ficofer, try checking your cache file /var/tmp/haproxy_stats.cache. Does it appear to have values?

also, when running the script manually, consider https://github.com/anapsix/zabbix-haproxy/blob/master/userparameter_haproxy.conf#L87-L89

/usr/local/bin/haproxy_stats.sh ft_http_web prod-app10 smax
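
The difference between the failing and the working invocation comes down to positional arguments. The sketch below is not the script's actual code; it assumes, going by the DEBUG output and the commands shown in this thread, that an optional leading socket path is followed by pxname, svname and stat, in that order. The function name `parse_stat_args` is hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch of positional parsing; NOT the real haproxy_stats.sh code.
# Assumed argument order: [socket] pxname svname stat
parse_stat_args() {
  # Assumption: an argument starting with "/" is a socket path and is skipped.
  case "${1:-}" in
    /*) shift ;;
  esac
  pxname="${1:-}"
  svname="${2:-}"
  stat="${3:-}"
  printf 'pxname=%s svname=%s stat=%s\n' "$pxname" "$svname" "$stat"
}

# Two arguments: "smax" lands in svname and stat stays empty.
parse_stat_args prod-app10 smax
# Three arguments: stat is filled in.
parse_stat_args ft_http_web prod-app10 smax
```

Under that assumption, a two-argument call leaves the stat name empty, which would explain the blank in "ERROR:  is unsupported".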
ficofer commented 6 years ago

@anapsix Thanks for replying.

It does appear to show values.

look below:

# tail -f /var/tmp/haproxy_stats.cache
bk_https_web,HTTP_id-prod-app07,0,0,0,0,,0,0,0,,0,,0,0,0,0,UP,1,1,0,0,0,3296,0,,1,4,7,,0,,2,0,,0,L4OK,,0,,,,,,,,,,,0,0,,,,,-1,,,0,0,0,0,,,,Layer4 check passed,,2,3,4,,,,,,tcp,,,,,,,,
bk_https_web,HTTP_id-prod-app08,0,0,0,0,,0,0,0,,0,,0,0,0,0,UP,1,1,0,0,0,3296,0,,1,4,8,,0,,2,0,,0,L4OK,,0,,,,,,,,,,,0,0,,,,,-1,,,0,0,0,0,,,,Layer4 check passed,,2,3,4,,,,,,tcp,,,,,,,,
bk_https_web,HTTP_id-prod-app09,0,0,0,0,,0,0,0,,0,,0,0,0,0,UP,1,1,0,0,0,3296,0,,1,4,9,,0,,2,0,,0,L4OK,,0,,,,,,,,,,,0,0,,,,,-1,,,0,0,0,0,,,,Layer4 check passed,,2,3,4,,,,,,tcp,,,,,,,,
bk_https_web,HTTP_id-prod-app10,0,0,0,0,,0,0,0,,0,,0,0,0,0,UP,1,1,0,0,0,3296,0,,1,4,10,,0,,2,0,,0,L4OK,,0,,,,,,,,,,,0,0,,,,,-1,,,0,0,0,0,,,,Layer4 check passed,,2,3,4,,,,,,tcp,,,,,,,,
bk_https_web,HTTP_id-prod-app11,0,0,0,0,,0,0,0,,0,,0,0,0,0,UP,1,1,0,0,0,3296,0,,1,4,11,,0,,2,0,,0,L4OK,,0,,,,,,,,,,,0,0,,,,,-1,,,0,0,0,0,,,,Layer4 check passed,,2,3,4,,,,,,tcp,,,,,,,,
bk_https_web,HTTP_id-prod-app12,0,0,0,0,,0,0,0,,0,,0,0,0,0,UP,1,1,0,0,0,3296,0,,1,4,12,,0,,2,0,,0,L4OK,,0,,,,,,,,,,,0,0,,,,,-1,,,0,0,0,0,,,,Layer4 check passed,,2,3,4,,,,,,tcp,,,,,,,,
bk_https_web,BACKEND,0,0,0,1,9500,5,6617,64494,0,0,,0,0,0,0,UP,12,12,0,,0,3296,0,,1,4,0,,4,,1,0,,2,,,,,,,,,,,,,,0,0,0,0,0,0,1381,,,1,1,0,166,,,,,,,,,,,,,,tcp,,,,,,,,
stats,FRONTEND,,,0,2,2000,6,12642,851647,0,0,3,,,,,OPEN,,,,,,,,,1,5,0,,,,0,0,0,1,,,,0,28,0,4,0,0,,0,4,32,,,0,0,0,0,,,,,,,,,,,,,,,,,,,,,http,,0,1,6,28,0,0,
stats,BACKEND,0,0,0,0,200,0,12642,851647,0,0,,0,0,0,0,UP,0,0,0,,0,3296,,,1,5,0,,0,,1,0,,0,,,,0,0,0,0,0,0,,,,0,0,0,0,0,0,0,1357,,,0,0,0,223,,,,,,,,,,,,,,http,,,,,,,,

But a lot of 0s... maybe that's the reason for NO DATA... I am using HA-Proxy version 1.8.4-1deb90d 2018/02/08.

# /usr/local/bin/haproxy_stats.sh /var/run/haproxy/info.sock bk_https_web HTTP_id-prod-app10 smax
DEBUG: SOCAT_BIN        => /bin/socat
DEBUG: NC_BIN           => /bin/nc
DEBUG: FLOCK_BIN        => /bin/flock
DEBUG: FLOCK_WAIT       => 15 seconds
DEBUG: CACHE_FILEPATH   => 
DEBUG: CACHE_EXPIRATION =>  minutes
DEBUG: HAPROXY_SOCKET   => /var/run/haproxy/info.sock
DEBUG: pxname   => bk_https_web
DEBUG: svname   => HTTP_id-prod-app10
DEBUG: stat     => smax
DEBUG: _STAT    => 6:smax:0
DEBUG: _INDEX   => 6
DEBUG: _DEFAULT => 0
DEBUG: using default get() method
DEBUG: stat file found, results are at most 5 minutes stale..
0
anapsix commented 6 years ago

Well, that takes care of your "ERROR: is unsupported". As for most values being 0, that is what's coming from HAProxy. Don't forget that data is cached, based on CACHE_EXPIRATION set in the script. You should be seeing data at 0 in Zabbix, if that's what HAProxy reports. However, when you run the script manually, chances are you are running it as root, so the cache files are created with root's ownership and permissions. You need to change the cache files' permissions so that whatever user the Zabbix agent runs as can read and write them. Or just delete the cache files; they will be recreated when the Zabbix agent runs the script.
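
A minimal sketch of the permission check described above. It only reports whether the user running it can read and write a given cache file; the function name `cache_status` is hypothetical, and the cache path is the one mentioned earlier in this thread.

```shell
#!/bin/sh
# Report whether the current user can use the cache file passed as $1.
# Prints "missing" (the script would recreate it), "ok", or "blocked".
cache_status() {
  if [ ! -e "$1" ]; then
    echo "missing"
  elif [ -r "$1" ] && [ -w "$1" ]; then
    echo "ok"
  else
    echo "blocked"
  fi
}

cache_status /var/tmp/haproxy_stats.cache
```

Run it as the agent's user (assumed here to be `zabbix`), for example via `sudo -u zabbix`, so that the result reflects that user's permissions rather than root's.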

ficofer commented 6 years ago

I read that it could be the cause, but I checked it:

-rw-rw-r--. 1 zabbix zabbix 3690 Mar 19 15:38 /var/tmp/haproxy_stats.cache

and it looks like they are correctly owned.

The thing is, shouldn't Zabbix be graphing history even if the values are now 0?

anapsix commented 6 years ago

I'd expect that as well.. check latest data, as it's easier to see what zabbix received from the script

ficofer commented 6 years ago

I see this:

Any idea why there are so many 0s?

[screenshot at 2018-03-19 16 07 03]

It should not be like that, right?

ficofer commented 6 years ago

Also... the output of zabbix_agentd -p did not change, though :-1:

haproxy.stats                                 [t|ERROR:  is unsupported]
haproxy.stat.qcur                             [t|ERROR:  is unsupported]
haproxy.stat.qmax                             [t|ERROR:  is unsupported]
haproxy.stat.scur                             [t|ERROR:  is unsupported]
haproxy.stat.smax                             [t|ERROR:  is unsupported]
haproxy.stat.slim                             [t|ERROR:  is unsupported]
haproxy.stat.bin                              [t|ERROR:  is unsupported]
haproxy.stat.bout                             [t|ERROR:  is unsupported]
haproxy.stat.dreq                             [t|ERROR:  is unsupported]
haproxy.stat.dresp                            [t|ERROR:  is unsupported]
haproxy.stat.ereq                             [t|ERROR:  is unsupported]
haproxy.stat.econ                             [t|ERROR:  is unsupported]
haproxy.stat.eresp                            [t|ERROR:  is unsupported]
haproxy.stat.wretr                            [t|ERROR:  is unsupported]
haproxy.stat.wredis                           [t|ERROR:  is unsupported]
haproxy.stat.weight                           [t|ERROR:  is unsupported]
haproxy.stat.act                              [t|ERROR:  is unsupported]
haproxy.stat.bck                              [t|ERROR:  is unsupported]
haproxy.stat.chkfail                          [t|ERROR:  is unsupported]
haproxy.stat.chkdown                          [t|ERROR:  is unsupported]
haproxy.stat.lastchg                          [t|ERROR:  is unsupported]
haproxy.stat.downtime                         [t|ERROR:  is unsupported]
haproxy.stat.qlimit                           [t|ERROR:  is unsupported]
haproxy.stat.throttle                         [t|ERROR:  is unsupported]
haproxy.stat.lbtot                            [t|ERROR:  is unsupported]
haproxy.stat.tracked                          [t|ERROR:  is unsupported]
haproxy.stat.type                             [t|ERROR:  is unsupported]
haproxy.stat.rate                             [t|ERROR:  is unsupported]
haproxy.stat.rate_lim                         [t|ERROR:  is unsupported]
haproxy.stat.rate_max                         [t|ERROR:  is unsupported]
haproxy.stat.check_status                     [t|ERROR:  is unsupported]
haproxy.stat.check_code                       [t|ERROR:  is unsupported]
haproxy.stat.check_duration                   [t|ERROR:  is unsupported]
haproxy.stat.req_rate                         [t|ERROR:  is unsupported]
haproxy.stat.req_rate_max                     [t|ERROR:  is unsupported]
haproxy.stat.req_tot                          [t|ERROR:  is unsupported]
haproxy.stat.cli_abrt                         [t|ERROR:  is unsupported]
haproxy.stat.srv_abrt                         [t|ERROR:  is unsupported]
haproxy.stat.comp_in                          [t|ERROR:  is unsupported]
haproxy.stat.comp_out                         [t|ERROR:  is unsupported]
haproxy.stat.comp_byp                         [t|ERROR:  is unsupported]
haproxy.stat.comp_rsp                         [t|ERROR:  is unsupported]
haproxy.stat.lastsess                         [t|ERROR:  is unsupported]
haproxy.stat.qtime                            [t|ERROR:  is unsupported]
haproxy.stat.ctime                            [t|ERROR:  is unsupported]
haproxy.stat.rtime                            [t|ERROR:  is unsupported]
haproxy.stat.status                           [t|ERROR:  is unsupported]
haproxy.stat.pid                              [t|ERROR:  is unsupported]
haproxy.stat.iid                              [t|ERROR:  is unsupported]
haproxy.stat.sid                              [t|ERROR:  is unsupported]
haproxy.stat.hrsp_1xx                         [t|ERROR:  is unsupported]
haproxy.stat.hrsp_2xx                         [t|ERROR:  is unsupported]
haproxy.stat.hrsp_3xx                         [t|ERROR:  is unsupported]
haproxy.stat.hrsp_4xx                         [t|ERROR:  is unsupported]
haproxy.stat.hrsp_5xx                         [t|ERROR:  is unsupported]
haproxy.stat.hrsp_other                       [t|ERROR:  is unsupported]
haproxy.stat.hanafail                         [t|ERROR:  is unsupported]
haproxy.stat.last_chk                         [t|ERROR:  is unsupported]
haproxy.stat.last_agt                         [t|ERROR:  is unsupported]
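
A likely reason this listing does not change: `zabbix_agentd -p` evaluates each item without parameters, so a user parameter that passes its arguments on to the script (an assumption about how these keys are defined) runs with nothing to look up. Testing a single key with explicit parameters via `zabbix_agentd -t` is more telling. The sketch below only builds such a key, in the form seen in the agent log earlier in this thread; the helper name `haproxy_stats_key` is hypothetical.

```shell
#!/bin/sh
# Build an item key of the form haproxy.stats[socket,pxname,svname,stat].
haproxy_stats_key() {
  printf 'haproxy.stats[%s,%s,%s,%s]\n' "$1" "$2" "$3" "$4"
}

key=$(haproxy_stats_key /var/run/haproxy/info.sock bk_https_web HTTP_id-prod-app10 smax)
echo "$key"

# On the HAProxy node, the key can then be tested with explicit parameters:
#   zabbix_agentd -t "$key"
```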
anapsix commented 6 years ago

You must be missing a macro var, or have it set to an empty string. It's rather difficult for me to guess.

anapsix commented 6 years ago

Also, you seem to have debug permanently enabled, which is messing with the output.

ficofer commented 6 years ago

@anapsix I have disabled debug; what's weird to me is that so many stats are not supported...

What would you advise me to check to figure out whether it is the macro var?

Thanks for your guidance!

ficofer commented 6 years ago

Only haproxy.list.discovery seems to be supported in the output of zabbix_agentd

anapsix commented 6 years ago

Do you understand why items were not supported while you had DEBUG enabled in the script?

ficofer commented 6 years ago

@anapsix

Not really, because even with DEBUG on, the script exits with "not supported" before doing anything else.

It exits in this if block:

# check if requested stat is supported
if [ -z "${_STAT}" ]
then
  echo "ERROR: $stat is unsupported"
  exit 127
fi

Looks like all, or at least 99%, of my stats come back as not supported.

# /usr/local/bin/haproxy_stats.sh prod-app10 smax
ERROR:  is unsupported
# /usr/local/bin/haproxy_stats.sh prod-app10 status
ERROR:  is unsupported
# /usr/local/bin/haproxy_stats.sh prod-app10 downtime
ERROR:  is unsupported
# 

Any other way to debug that?

ficofer commented 6 years ago

It's not like I get output like this:

DEBUG: SOCAT_BIN        => /bin/socat
DEBUG: NC_BIN           => /bin/nc
DEBUG: FLOCK_BIN        => /bin/flock
DEBUG: FLOCK_WAIT       => 15 seconds
DEBUG: CACHE_FILEPATH   => 
DEBUG: CACHE_EXPIRATION =>  minutes
DEBUG: HAPROXY_SOCKET   => /var/run/haproxy/info.sock
DEBUG: pxname   => bk_https_web
DEBUG: svname   => HTTP_id-prod-app10
DEBUG: stat     => smax
DEBUG: _STAT    => 6:smax:0
DEBUG: _INDEX   => 6
DEBUG: _DEFAULT => 0
DEBUG: using default get() method
DEBUG: stat file found, results are at most 5 minutes stale..
0
anapsix commented 6 years ago

As I've mentioned, /usr/local/bin/haproxy_stats.sh prod-app10 smax is incorrect syntax; you are missing required variables. See example here. As for the 0s, if that's what is coming from HAProxy, that's what it is. You can check stats directly from HAProxy:

echo "show stat" | nc -U /var/run/haproxy/info.sock | cut -d "," -f 1,2,5-11,18,24,27,30,36,50,37,56,57,62 | column -s, -t
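
To look up a single value by name instead of counting column positions, the header line that HAProxy prints first (`# pxname,svname,qcur,...`) can be used as an index. This is a minimal sketch, not part of the repository's scripts; the function name `stat_field` is hypothetical.

```shell
#!/bin/sh
# Print one named field for a given proxy/server from "show stat" CSV on stdin.
# Usage: stat_field PXNAME SVNAME FIELD
stat_field() {
  awk -F, -v px="$1" -v sv="$2" -v name="$3" '
    NR == 1 {
      # Header line: strip the leading "# " and map column names to positions.
      sub(/^# /, "", $1)
      for (i = 1; i <= NF; i++) col[$i] = i
      next
    }
    # Print the requested column of the first matching row, then stop.
    $1 == px && $2 == sv && (name in col) { print $(col[name]); exit }
  '
}
```

For example, `echo "show stat" | nc -U /var/run/haproxy/info.sock | stat_field bk_https_web HTTP_id-prod-app10 smax` prints the smax value for that server as HAProxy reports it, bypassing the cache file.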