dav3860 / vmbix

Fork of vmbix, a TCP proxy for querying a VMware infrastructure with Zabbix

High bandwidth usage on Zabbix VmBix server #35

Closed ghost closed 6 years ago

ghost commented 8 years ago

Hi dav,

In my setup I have vCenter 6 with 2 virtualization hosts and about 320 virtual machines managed by vCenter.

I also have Zabbix 3.0.4 with VmBix 2.4 installed on CentOS 7 x64.

With the default VmBix settings (after VM discovery has completed), inbound network traffic averages about 50 Mbps. Almost all of that traffic comes from vCenter. I measured it with nload, nethogs and iftop.

For now, I have disabled the following items in Template VmBix VM Loadable Module: Ballooned memory, Compressed memory, CPU Freq, CPU Overall Usage, CPU Ready, CPU Ready (%), CPU Total, CPU Used, Memory Latency, Private memory, Shared memory, Swapped memory, VM Tools mounted, VM Tools running, VM Tools status.

I also increased the update interval on several items and discovery rules.

Is there a way to determine which items are the top consumers of bandwidth? How can I optimize network bandwidth usage on the Zabbix server with VmBix?

Please help

whosgonna commented 8 years ago

I had a similar issue after an upgrade, where VMware counters became unavailable. This is because the item's counter parameter changed from just the counter name to the counter name followed by the aggregation type as a suffix. For example:

vmbix[vm.counter,{HOST.HOST},cpu.ready]

became

vmbix[vm.counter,{HOST.HOST},cpu.ready.summation]
                                       ^^^^^^^^^

This caused the item to become unsupported, and the bandwidth usage rose considerably. This was on ~1500 items with intervals averaging about 10 minutes. Once I updated all of the items, the bandwidth returned to normal. I had meant to file a bug report for this behavior, but forgot. You might want to re-enable those items to see if any are unavailable, and then confirm the item type is correct.
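
The rename shown above can be applied mechanically to a list of item keys. The following is a minimal sketch in Python, assuming keys of the `vmbix[vm.counter,<host>,<counter>]` form from the example. The `ROLLUP` table and the `add_rollup` helper are hypothetical; only the `cpu.ready` to `summation` pair comes from this comment, and the rollup type for any other counter would have to be checked against the VmBix documentation or vCenter before adding it.

```python
import re

# Rollup suffix per counter. Only cpu.ready -> summation is taken from the
# example above; every other entry must be verified before it is added.
ROLLUP = {"cpu.ready": "summation"}

# vmbix[vm.counter,<host>,<counter>] with optional trailing parameters.
KEY_RE = re.compile(
    r"^vmbix\[vm\.counter,(?P<host>[^,\]]+),(?P<counter>[^,\]]+)"
    r"(?P<rest>(?:,[^\]]*)?)\]$"
)

def add_rollup(key: str) -> str:
    """Append the rollup suffix to the counter name; leave other keys as-is."""
    m = KEY_RE.match(key)
    if not m:
        return key  # not a vm.counter key
    counter = m.group("counter")
    suffix = ROLLUP.get(counter)
    if suffix is None:
        return key  # unknown counter, or the suffix is already present
    return f"vmbix[vm.counter,{m.group('host')},{counter}.{suffix}{m.group('rest')}]"

print(add_rollup("vmbix[vm.counter,{HOST.HOST},cpu.ready]"))
```

Keys that already carry a suffix are not in the table, so running the helper twice leaves them unchanged.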

ghost commented 8 years ago

whosgonna, thank you for your reply and sharing your experience.

In my case, I am getting values for all items, but I am still seeing the high network bandwidth usage described in my previous post.

It would be useful if this could be tracked in some way.

Is it normal for VmBix to consume 50 Mbps in an environment with 320 VMs (about 150 running simultaneously; the other VMs are powered off)?

whosgonna commented 8 years ago

I'm not the developer here, so I can only speak anecdotally about my environment, but 50 Mbps does seem high. Of course, the load depends less on the number of VMs than on the number of new values per second: how many items in total are you querying via VmBix, and over what period of time? For example, querying power state once per hour for 320 machines should produce less load than querying power, ready, CPU frequency, etc. for 50 machines once every 60 seconds.
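
To put rough numbers on that comparison, here is a small sketch in Python. It assumes exactly three items per machine for the second case (power, ready, CPU frequency; the "etc." would add more), and `values_per_second` is a hypothetical helper name, not something VmBix or Zabbix provides.

```python
def values_per_second(machines: int, items_per_machine: int, interval_s: float) -> float:
    """Average number of item checks issued per second."""
    return machines * items_per_machine / interval_s

# Power state once per hour on 320 machines.
hourly = values_per_second(320, 1, 3600)

# Three items (assumed) every 60 seconds on 50 machines.
minutely = values_per_second(50, 3, 60)

print(f"{hourly:.3f} vs {minutely:.3f} checks per second")
```

Under these assumptions the second setup issues roughly 2.5 checks per second against about 0.09 for the first, even though it covers far fewer machines.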

Also, are you checking the values for all of the items listed on the powered-off machines? If a machine is powered off, it shouldn't really have values for CPU usage, memory latency, etc. Functionally I'd agree that those values should be zero, but I'd suggest looking into those types of items to isolate the problem. Do you see the behavior if you re-enable those items ONLY on the powered-on machines?

ghost commented 8 years ago

By disabling different items (mostly in Template VmBix VM Loadable Module), I decreased amount of data on 30 Mbps for inbound traffic. This is much better, but still very high bandwidth utilization.

@whosgonna You were right, VmBix is constantly checking all virtual machines, powered on and powered off.

The file /var/log/vmbix.log contains many entries like the following:

2016-08-19 14:29:45,647 INFO  [Thread-278] [VmBix.java:2412] VM '4213ac9a-d97c-6f0a-22bb-54b28ca994d4' is not powered on. Performance counters unavailable.
2016-08-19 14:29:45,654 INFO  [Thread-278] [VmBix.java:2412] VM '4213238e-f2fb-b539-1b76-c3d4592bc32d' is not powered on. Performance counters unavailable.
2016-08-19 14:29:45,659 INFO  [Thread-278] [VmBix.java:2412] VM '4213d5f4-b07c-b169-5883-6066d0dd7fbc' is not powered on. Performance counters unavailable.
2016-08-19 14:29:45,669 INFO  [Thread-278] [VmBix.java:2412] VM '42131365-b18e-05f1-1a41-2088c71da949' is not powered on. Performance counters unavailable.
2016-08-19 14:29:45,672 INFO  [Thread-278] [VmBix.java:2412] VM '4213a75e-25e1-a7a9-67dd-3ef5774290f6' is not powered on. Performance counters unavailable.
2016-08-19 14:29:45,679 INFO  [Thread-278] [VmBix.java:2412] VM '4213f681-574e-533d-7250-43cc93633306' is not powered on. Performance counters unavailable.
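
Entries like these can be tallied to see which powered-off VMs are queried most often. The following is a minimal sketch in Python that relies only on the message format shown above; `count_powered_off_queries` is a hypothetical helper, and the two sample lines are copied from the log excerpt.

```python
import re
from collections import Counter

# Matches the "not powered on" message and captures the VM UUID.
LINE_RE = re.compile(
    r"VM '([0-9a-f-]+)' is not powered on\. Performance counters unavailable\."
)

def count_powered_off_queries(lines):
    """Count, per VM UUID, how many counter requests hit a powered-off VM."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts

sample = [
    "2016-08-19 14:29:45,647 INFO  [Thread-278] [VmBix.java:2412] VM '4213ac9a-d97c-6f0a-22bb-54b28ca994d4' is not powered on. Performance counters unavailable.",
    "2016-08-19 14:29:45,654 INFO  [Thread-278] [VmBix.java:2412] VM '4213238e-f2fb-b539-1b76-c3d4592bc32d' is not powered on. Performance counters unavailable.",
]
counts = count_powered_off_queries(sample)
for uuid, n in counts.most_common():
    print(uuid, n)
```

Because the function accepts any iterable of lines, an open file object for /var/log/vmbix.log can be passed in place of `sample`.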

The amount of received data would decrease dramatically if Zabbix checked only the powered-on virtual machines.

But how can that be implemented? I'm not very familiar with modifying VmBix.

Any ideas?

dav3860 commented 8 years ago

Hi,

As whosgonna said, the performance counter methods were modified in v2.2 and now require the rollup type in the counter name (see CHANGELOG). If your VmBix template was not updated, it may cause unnecessary retries against the VMware API.

50 Mbps seems like a lot. I am currently monitoring a vCenter with 360 VMs, and the VmBix bandwidth is around 5 Mbps. However, most of those VMs are up. If a significant share of your VMs are down, Zabbix will keep retrying their items according to its unsupported items refresh interval. If that interval is too low, it puts a high load on VmBix and your vCenters.

The messages in your log file indicate that VmBix detected that the VM was down before checking the item, so it did not go any further. This puts the Zabbix item in an unsupported state, but it shouldn't consume more bandwidth (if anything, less).
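
The effect of the unsupported items refresh interval can be estimated roughly. This sketch in Python takes the VM counts from earlier in the thread (320 VMs, about 150 running); the figure of 10 unsupported items per powered-off VM and the candidate intervals are assumptions for illustration, and `retry_rate` is a hypothetical helper.

```python
def retry_rate(powered_off_vms: int, unsupported_items_per_vm: int,
               refresh_interval_s: float) -> float:
    """Average number of retried item checks per second."""
    return powered_off_vms * unsupported_items_per_vm / refresh_interval_s

powered_off = 320 - 150   # from the figures reported in this thread
items_per_vm = 10         # assumed, depends on the template in use

for interval in (60, 600, 3600):  # candidate refresh intervals in seconds
    rate = retry_rate(powered_off, items_per_vm, interval)
    print(f"interval {interval:>4} s: {rate:.2f} retries per second")
```

The retry rate scales inversely with the interval, so raising the interval by a factor of ten cuts the retries against VmBix by the same factor.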

dav3860 commented 8 years ago

You can filter powered-off VMs in Zabbix in two ways: