anapsix / zabbix-haproxy

HAProxy Zabbix Discovery and Template

Bytes in/out data always 0 #21

Closed zbikmarc closed 8 years ago

zbikmarc commented 8 years ago

Zabbix server 3.0.4 Zabbix agent 2.4.7 HAProxy 1.6.3

When I run this from the server:

root@zabbix:~# zabbix_get -s 192.168.1.19 -k haproxy.stats[/run/haproxy/info.sock,keystone_admin_pool,keystone1,bin]

I get a nice response:

27722876

Same nice response for bout. However, when I check in HAProxy web stats I have 27739682 bytes in. So it differs. I understand it is cached (5 min?), but on the graphs I see a straight line at 0, with one peak in the last 14 days. The real change is much, much bigger.

root@zabbix:~# date; zabbix_get -s 192.168.1.19 -k haproxy.stats[/run/haproxy/info.sock,keystone_admin_pool,keystone1,bin]
Mon Aug 8 15:07:03 CEST 2016
27722876

root@zabbix:~# date; zabbix_get -s 192.168.1.19 -k haproxy.stats[/run/haproxy/info.sock,keystone_admin_pool,keystone1,bin]
Mon Aug 8 15:15:10 CEST 2016
27722876

Exact same response when I use the admin socket. I think it might be a script issue, because web stats are working.

kickitt commented 8 years ago

Hi i have the same issue! Need help!^^

noamik commented 8 years ago

Check the template. It looks like the value is stored as a delta value (I just looked at the xml, please take a close look in the zabbix web interface). If you don't want it to be a delta value, you need to change the zabbix template to store the absolute value.

Since zabbix_get returns the absolute value you are expecting the issue is not in the scripts but in what you expect zabbix to show you.
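To make the delta point concrete, here is a minimal sketch of how a per-second delta item behaves when two consecutive polls return the same counter. It assumes the template stores the item as a per-second delta, which the thread does not confirm. `counter_rate` and the 60-second interval are hypothetical; the counter values are the ones quoted in this thread.

```shell
#!/bin/sh
# Hypothetical helper, not part of the zabbix-haproxy scripts.
# counter_rate PREV CURR INTERVAL_SECONDS
# Prints the integer per-second change between two counter samples.
counter_rate() {
  prev=$1; curr=$2; interval=$3
  echo $(( (curr - prev) / interval ))
}

# Two polls that return the same absolute value: the stored delta is 0,
# which is drawn as a flat line at 0 on the graph.
counter_rate 27722876 27722876 60   # prints 0

# A poll after the counter has moved (second value is the web stats figure
# from this thread; the 60 s interval is an assumption).
counter_rate 27722876 27739682 60   # prints 280
```

So a flat graph at 0 is consistent with zabbix_get returning a plausible absolute number, as long as that number never changes between polls.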

zbikmarc commented 8 years ago

Since zabbix_get returns the absolute value you are expecting the issue is not in the scripts but in what you expect zabbix to show you.

The problem is that Zabbix sees no changes in this absolute value, while it is changing (I can see this in both the socket and the HAProxy WebUI). Zabbix still shows the exact same value as above, while in the WebUI I see 31064061. This is why I think caching or something related is not working properly.

UPDATE: look at this, the cache file has not been updated for the last few hours:

root@haproxy1:~# date
Tue 9 Aug 08:55:03 CEST 2016
root@haproxy1:~# ls -l /var/tmp/haproxy_stats.cache
-rw-r--r-- 1 root root 7336 Aug 8 14:21 /var/tmp/haproxy_stats.cache

noamik commented 8 years ago

Since I'm not using this version of the scripts myself, nor did I author them, I can only guess. I didn't find an explicit polling of a cache update method or something in the template. What I did find though is the following method:

# generate stats cache file
get_stats() {
  find ${CACHE_STATS_FILEPATH} -mmin +${CACHE_STATS_EXPIRATION} -delete >/dev/null 2>&1
  if [ ! -e ${CACHE_STATS_FILEPATH} ]
  then
    debug "no cache file found, querying haproxy"
    query_stats "show stat" > ${CACHE_STATS_FILEPATH}
  else
    debug "cache file found, results are at most ${CACHE_STATS_EXPIRATION} minutes stale.."
  fi
}

You might want to check whether find on your system fails to recognize that your cache file is outdated. It's supposed to delete the cache file after CACHE_STATS_EXPIRATION minutes. If your find for some reason doesn't support the -mmin parameter or the -delete parameter, this could lead to your problems. Another reason could be that the script, when run by zabbix, has insufficient permissions to delete the file.

Edit: To clarify: after find there is a check whether the cache file exists. Only if it doesn't, it is recreated with fresh values. So I'm pretty sure that the file not getting deleted as it should is the root cause of your problems.
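One way to check the first half of that logic without deleting anything is to run the same -mmin test by hand. This is only a sketch: `is_stale` is a hypothetical helper, the 5-minute threshold is the cache time guessed earlier in the thread, and it assumes a find that supports -mmin (GNU and BSD find do).

```shell
#!/bin/sh
# Hypothetical helper, not part of haproxy_stats.sh.
# is_stale FILE MINUTES: succeeds if FILE exists and was last modified
# more than MINUTES minutes ago (same -mmin test as get_stats, without -delete).
is_stale() {
  [ -n "$(find "$1" -mmin +"$2" 2>/dev/null)" ]
}

cache=/var/tmp/haproxy_stats.cache   # path taken from this thread
if is_stale "$cache" 5; then
  echo "$cache is older than 5 minutes, so get_stats should have deleted it"
  ls -l "$cache"
else
  echo "$cache is missing or younger than 5 minutes"
fi
```

If the file is reported as stale again and again, find recognizes the age correctly and the -delete step (or the user it runs as) is the more likely culprit.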

kickitt commented 8 years ago

@zbikmarc, I think that "31064061" is the traffic value which you can get with zabbix_get. The template's graph named "frontend/backend bytes in/out" checks the speed of responses etc. So if I'm right, the name of the graph is wrong. @noamik, is that right?

zbikmarc commented 8 years ago

@kickitt The clue is that I cannot get 31064061 from zabbix_get ;) It looks like the cache is not updated - file was being deleted. I need to find out why (-mmin and -delete worked, at least as the root user).

kickitt commented 8 years ago

However, when I check in HAProxy web stats I have 27739682 bytes in

Since zabbix_get returns the absolute value you are expecting the issue is not in the scripts but in what you expect zabbix to show you.

and what I can see at http://screencloud.net/v/32zy - the kbps values

zbikmarc commented 8 years ago

Finally I was able to get it working. Not sure what I did, but I suppose it was a permissions problem with /var/tmp/haproxy_stats.cache. I deleted it and ran zabbix_get again.

UPDATE: But I have one observation: a 5 min cache with a 1 min refresh for graph/query data is misleading. A change in the graph and in "latest data" will be visible only every 5 minutes, which looks strange on graphs ;) I suppose it would be better to have similar times, or a cache shorter than the refresh rate.
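The effect of a cache that outlives the polling interval can be sketched with a small simulation. It is an idealized model, not the real script: `simulate_polls` is hypothetical, the poller samples once per minute, the counter grows by a fixed amount per minute, and the cache refreshes exactly every CACHE_MIN minutes.

```shell
#!/bin/sh
# Hypothetical simulation, not part of the zabbix-haproxy scripts.
# simulate_polls CACHE_MIN POLLS RATE
# Prints the change seen by each of POLLS one-minute polls when the cached
# counter only advances every CACHE_MIN minutes and grows by RATE per minute.
simulate_polls() {
  cache_min=$1; polls=$2; rate=$3
  prev=0; minute=0; out=
  while [ "$minute" -lt "$polls" ]; do
    # The cached value is the counter as of the last cache refresh.
    cached=$(( minute / cache_min * cache_min * rate ))
    out="$out${out:+ }$(( cached - prev ))"
    prev=$cached
    minute=$(( minute + 1 ))
  done
  echo "$out"
}

simulate_polls 5 10 100   # prints: 0 0 0 0 0 500 0 0 0 0
simulate_polls 1 10 100   # prints: 0 100 100 100 100 100 100 100 100 100
```

With the 5-minute cache, the whole change lands in a single sample and the rest are zeros, which would explain graphs that look like a flat line with occasional spikes.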

kickitt commented 8 years ago

@zbikmarc it's a nice fix :D After a reinstall, all values returned by zabbix_get are the same, but the template's graph returns something else - any ideas?

zbikmarc commented 8 years ago

Check the date of the /var/tmp/haproxy_stats.cache file. Maybe it is outdated as well?

noamik commented 8 years ago

@kickitt Looks suspicious, as if the cache doesn't update for you either. Try to delete the cache file and run the zabbix_get command again. You should get an updated cache file. Make sure the zabbix user can delete the file. You can check by manually running the find command as the zabbix user. Use -mmin +1 and make sure the file is at least one minute old before running the find command.

sudo -u zabbix find /var/tmp/haproxy_stats.cache -mmin +1 -delete

You should encounter no errors and the file should get deleted.
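If the find command does fail, one likely reason is the sticky bit. /var/tmp is usually a sticky, world-writable directory, so a file there can only be removed by its owner, by the directory owner, or by root. The sketch below checks that rule for whichever user runs it; `can_delete` is a hypothetical helper and it relies on the -k and -O test operators, which bash and dash both provide.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the zabbix-haproxy scripts.
# can_delete FILE: succeeds if the current user should be able to unlink FILE.
can_delete() {
  file=$1
  dir=$(dirname "$file")
  [ -e "$file" ] || return 1      # nothing to delete
  [ -w "$dir" ] || return 1       # unlinking needs write access to the directory
  if [ -k "$dir" ]; then
    # Sticky directory: only the file owner, the directory owner or root may delete.
    [ -O "$file" ] || [ -O "$dir" ] || [ "$(id -u)" -eq 0 ]
  fi
}

cache=/var/tmp/haproxy_stats.cache   # path taken from this thread
if can_delete "$cache"; then
  echo "$(id -un) can delete $cache"
else
  echo "$(id -un) cannot delete $cache (missing, or owned by another user)"
fi
```

Run it as the user the Zabbix agent runs as. Running it as root only shows that root can delete the file, which is not the interesting case.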

kickitt commented 8 years ago

@noamik it's a pity, but I don't have the cache file at all on the zabbix server

zbikmarc commented 8 years ago

And its good. It should be only on HAproxy machine.

noamik commented 8 years ago

In this case you need to find out why it isn't created. Either you are using another cache file (check your configuration) or you have a permissions issue which prevents the script from creating the file. There is no way around it: you have to check each step manually to find out which part fails.

Edit: and @zbikmarc is right, the file should only exist on the haproxy server, not on the zabbix server.

kickitt commented 8 years ago

@zbikmarc

And its good. It should be only on HAproxy machine.

Do you mean that Apache must be installed on the haproxy server? Or does haproxy create a www directory?

zbikmarc commented 8 years ago

I'm sure that HAProxy does not need Apache ;) You are monitoring HAProxy with zabbix, right? And Zabbix Server is on different machine? If yes then that cache file is on HAProxy server, not Zabbix Server and it is created/refreshed by zabbix-agent running on same machine as HAProxy server. I hope it will help.

noamik commented 8 years ago

@zbikmarc Regarding your update for cache and graph values: it might be a good idea if you adjusted those values to something sane and created a merge/pull request. @anapsix is always pretty quick integrating useful improvements into this project.

kickitt commented 8 years ago

So, I'm not sure how, but I fixed it. The issue was around this cache file, which was at /var/tmp/ and named haproxy_stats.cache, like in the script. I gave haproxy and root rules for zabbix user and activated the AllowRoot and EnableRemoteCommands options in the zabbix agent configuration.

zbikmarc commented 8 years ago

I did nothing related to code. All I did was remove the cache and run zabbix_get. I suppose previously I ran the script manually and the file was created with wrong permissions. I'm closing this issue and I hope this short conversation will help others in future :)

noamik commented 8 years ago

@kickitt While this fixed your issues, it isn't a recommended approach for production systems. You shouldn't need to give zabbix essentially root permissions. It should be sufficient to give the zabbix user write rights on the /var/tmp folder and especially the cache file.

If your system isn't connected to the internet your current solution might be acceptable. If it is connected, you might want to review it.
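A narrower check than granting root rights is to see who owns the cache file. The sketch below assumes the agent runs as a user named zabbix, which may differ on your system. `owner_uid_of` is a hypothetical helper and the suggested fix is only printed, never executed.

```shell
#!/bin/sh
# Hypothetical helper, not part of the zabbix-haproxy scripts.
# owner_uid_of FILE: print the numeric uid that owns FILE.
owner_uid_of() {
  ls -ldn "$1" | awk '{print $3}'
}

cache=/var/tmp/haproxy_stats.cache   # path taken from this thread
agent_user=zabbix                    # assumption: user the Zabbix agent runs as
agent_uid=$(id -u "$agent_user" 2>/dev/null)

if [ -z "$agent_uid" ]; then
  echo "no user named $agent_user on this machine"
elif [ -e "$cache" ] && [ "$(owner_uid_of "$cache")" != "$agent_uid" ]; then
  echo "$cache is not owned by $agent_user"
  # Printed only; removing the file lets the agent recreate it on the next poll.
  echo "possible fix (as root): rm -f $cache"
fi
```

Deleting the stale file once, as described earlier in the thread, has the same effect and leaves the agent's privileges unchanged.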

kickitt commented 8 years ago

@noamik Yes, I know, it's a test system before production. Thank you for the recommendation! I think it really was created with wrong permissions.

anapsix commented 8 years ago

as always, thanks @noamik for jumping in 💯 @kickitt, sudo is not required, however when and if you test the script as any user other than zabbix (or whichever user is running Zabbix agent), cache file permissions will get messed up..

glad you figured it out, though

kickitt commented 7 years ago

@noamik Hi bro!) I have one more question. I am trying to import the template into zabbix 3.2.1. When I tried to import it, I got an error that the filter tag is missing. (Invalid tag "/zabbix_export/templates/template(1)/discovery_rules/discovery_rule(1)": the tag "filter" is missing.) Then I uncommented this tag in the template and the import succeeded. But after discovery I got a lot of graph creation errors. (Cannot create graph: graph with the same name "[frntend80] denied requests" already exists.) etc. Can you help me with it?

noamik commented 7 years ago

Sorry, but no, I can't. I haven't encountered this myself. To make sure you are really hitting a reproducible bug, you could try to run the discovery for a single haproxy host on a fresh test instance of zabbix. If there is a bug, it should again create multiple graphs with the same name. If it works, you had left-over graph stubs from earlier discovery runs or other tests in your installation.

I myself am still running a much older version of this project without many of the current discovery options, so I can't comment on that.

If you can reproduce the behavior, try creating a simplified version of your haproxy configuration which still exhibits your issues but doesn't disclose any internals of your setup. This would help diagnosing the root cause of your issues.

I'm sorry, but without further information I can't give any more specific hints.

gorazdzagar commented 7 years ago

Solution is to set the expiration time to 1 minute in haproxy_stats.sh:

CACHE_STATS_EXPIRATION="${CACHE_STATS_EXPIRATION:-1}" # in minutes