Closed kkossack closed 4 years ago
Is it a VM? How many CPU cores does it have? And what version are you using (see scollector -version)?
Yes, both are VMs with 2 cores.
On 2008:
C:\scollector>scollector-windows-amd64.exe -version
scollector version 20150220180110 (1b01fb0a96425d225a00f8a741c8b7911f3ad953)
On 2012: same.
hmm... we don't get that on our 4 core VMs. The only spikes we see are from WmiPrvSE during our DSC configuration runs every 15 minutes. Can you use bosun to graph os.cpu and win.proc.cpu_total (with a * under name) for one of your VMs and see if it shows what is consuming the CPU?
Example from Stack Exchange: https://cloud.githubusercontent.com/assets/304401/6567105/ecb6a2d0-c687-11e4-9fa6-74663100f34f.png
Maybe one of the WMI queries that scollector is doing is expensive due to your workload. For example, we had to refactor the linux network collector because it was reading the proc network table in a slow way. That was on our load balancer, which had thousands of open connections. Does your machine have a similar high load of something?
Possible feature: have scollector report runtimes of its collectors.
OK. I’ll have a look.
Have you looked at the Windows servers with perfmon set to gather 1-second metrics, rather than a 15-second average?
From: Greg Bray [mailto:notifications@github.com] Sent: Monday, March 09, 2015 8:14 PM To: bosun-monitor/bosun Cc: Kossack, Ken Subject: Re: [bosun] scollector on Windows 2008 and 2012 appears to be consuming High CPU processor time (#773)
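To illustrate why the sampling interval matters when comparing perfmon with a 15-second average, here is a minimal sketch. The sample values are made up for illustration (a one-second 40% burst every ten seconds, 1% otherwise) and are not measurements from this issue; `windowed_averages` is a hypothetical helper, not part of scollector or bosun.

```python
def windowed_averages(samples, window):
    """Average consecutive, non-overlapping windows of per-second samples."""
    return [
        sum(samples[i:i + window]) / window
        for i in range(0, len(samples) - window + 1, window)
    ]


# Hypothetical 30 seconds of 1-second CPU samples (percent):
# a one-second burst of 40% every ten seconds, 1% the rest of the time.
per_second = [40.0 if t % 10 == 0 else 1.0 for t in range(30)]

peak_1s = max(per_second)                      # what a 1-second view shows
avg_15s = windowed_averages(per_second, 15)    # what a 15-second average shows

print("1-second peak:", peak_1s)
print("15-second averages:", avg_15s)
```

The same workload looks like a 40% spike at 1-second resolution and like single-digit usage once it is averaged over 15 seconds, so numbers from the two tools are only comparable when they use the same interval.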
Looks like we have spikes up to 16% every 15 seconds and larger spikes every 15 minutes (again due to DSC). Overall average is 3.0-4.5%.
I think Matt is right, you may just have a different work load that causes the WMI queries to be more expensive. I'll look at adding per collector timing metrics, which we can then use to help find and fix expensive collectors.
Is that 16% with 4 cores? I understand the approach of looking for the expensive WMI calls and removing them, if necessary. The concern I have is that the other tool I mentioned also uses WMI calls, and its CPU usage is in the single digits. So people ask why we should move to an open source solution if it takes more CPU to gather the metrics. We would like to pursue this beyond avoidance.
Two things I can think of. You can lower the default frequency so the queries run less often. On our side, I wonder if our checks basically run in lockstep. Maybe we need a scheduler that can smooth out the load.
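The lockstep concern above can be sketched as follows. This is only an illustration of the idea of spreading collector start times across the interval; it is not how scollector schedules anything, and the collector names and the `staggered_offsets` helper are hypothetical.

```python
def staggered_offsets(collector_names, interval_seconds):
    """Spread collector start times evenly across one collection interval.

    Returns {name: seconds after the start of the interval}.
    """
    if not collector_names:
        return {}
    step = interval_seconds / len(collector_names)
    return {name: round(i * step, 3) for i, name in enumerate(collector_names)}


# Hypothetical collector names, for illustration only.
collectors = ["cpu", "disk", "network", "processes", "memory"]

# Lockstep: every collector fires at the same instant each interval.
lockstep = {name: 0.0 for name in collectors}

# Staggered: the same work, spread over a 15-second interval.
spread = staggered_offsets(collectors, 15)

print("lockstep:", lockstep)
print("staggered:", spread)
```

With staggering, each collector still runs once per interval, but the queries no longer all hit WMI at the same moment, which would be expected to flatten the periodic spike rather than reduce the total work.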
@kylebrandt I like the idea of scheduling distributed over time. I wonder if @kkossack could be experiencing some single collector being more expensive on his servers for some reason though. I can't think of a way to diagnose that except by using filters and trial and error.
As of https://github.com/bosun-monitor/bosun/commit/6fccddefdea3254dad01e4e7c424c5615fe9047b you can now see the wall time duration of individual collectors using scollector.collector.duration. Here is an example from the same 4 core vm:
Most collectors take an average of less than 1 second, although we do see spikes up to 10-15 seconds that correlate with our DSC runs. If you find any collectors that are expensive you can try disabling them in the code or use the -f filter to only run the ones you care about.
DSC also uses WMI for processing, so it may be that multiple processes using WMI cause higher CPU. You might want to try running scollector on a VM by itself (no other tools) to see if you still see the high CPU load. scollector is based on a custom wmi/ole package, so there may be some performance improvements we can make, or as Kyle mentioned you can change the default collection frequency using -freq="15"
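For readers who want to see the idea behind per-collector wall-time measurement, here is a minimal sketch. It does not reproduce the scollector implementation in the commit above; the function names and the two stand-in collectors are assumptions made for illustration.

```python
import time


def time_collectors(collectors):
    """Run each collector once and return {name: wall-clock seconds}."""
    durations = {}
    for name, collect in collectors.items():
        start = time.perf_counter()
        collect()
        durations[name] = time.perf_counter() - start
    return durations


def slowest(durations, n=3):
    """Return the n slowest collectors as (name, seconds), slowest first."""
    return sorted(durations.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Stand-in collectors (hypothetical): one cheap, one that waits 50 ms.
def fast_collector():
    sum(range(1000))


def slow_collector():
    time.sleep(0.05)


durations = time_collectors({"fast": fast_collector, "slow": slow_collector})
for name, seconds in slowest(durations):
    print(f"{name}: {seconds:.3f}s")
```

Note that this measures wall time, as the metric described above does, so a collector that spends its time waiting on another process (such as WmiPrvSE) shows up as slow even if its own CPU use is small. Measuring the collecting process's own CPU time would need something like `time.process_time()` instead.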
I'm seeing similar issues but with "unknown" status during the high cpu spikes. The spikes go from 1 minute up to 90 minutes on some machines. Have not found the cause yet.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
On Windows 2008 SP1 and Windows 2012 Data Center we are seeing the scollector-windows-amd64 process consuming about 40% Processor Time every 10 seconds (default settings). We are watching the process in perfmon. Has anyone else seen this? Is it expected? On OEL 6 we see single-digit CPU being consumed.