elastic / beats

:tropical_fish: Beats - Lightweight shippers for Elasticsearch & Logstash
https://www.elastic.co/products/beats

[Metricbeat] Add support for cpuinfo metricset #25471

Open anpag opened 3 years ago

anpag commented 3 years ago

Describe the enhancement: Add Sigar's cpuinfo implementation, which reports CPU usage in MHz, to Metricbeat: https://github.com/hyperic/sigar/blob/ad47dc3b494e9293d1f087aebb099bdba832de5e/go_bindings/gotoc/cpuInfo.go

Describe a specific use case for the enhancement or feature: The cgo bindings to libsigar in the official Sigar repo support collecting CPU usage in MHz. I wonder if it would be possible to add this to Metricbeat as an extra dataset in the system module. Tracking CPU usage in MHz is useful for monitoring CPU turbo boost.

Thanks in advance for taking the time to read this and thanks for all the hard work with the Beats!

elasticmachine commented 3 years ago

Pinging @elastic/integrations (Team:Integrations)

gingerwizard commented 3 years ago

I would also like to see this enhanced with the processor name and model, specifically the information from /proc/cpuinfo:

processor       : 23
vendor_id       : GenuineIntel
cpu family      : 6
model           : 85
model name      : Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
stepping        : 7
microcode       : 0x5003005
cpu MHz         : 3088.589
cache size      : 36608 KB
physical id     : 0
siblings        : 24
core id         : 11
cpu cores       : 12
apicid          : 23
initial apicid  : 23
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke
bugs            : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit
bogomips        : 4999.99
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:
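
As a rough illustration of how the fields requested here could be pulled out of that output, the sketch below parses text in the x86 Linux /proc/cpuinfo format. The `CPUInfo` type, its field names, and the `ParseCPUInfo` function are hypothetical names chosen for this example; they are not Metricbeat's schema or implementation.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// CPUInfo holds the per-processor fields requested in this thread.
// The type and its field names are hypothetical, not Metricbeat's schema.
type CPUInfo struct {
	Processor int
	Model     string
	ModelName string
	MHz       float64
}

// ParseCPUInfo parses text in the x86 Linux /proc/cpuinfo format and returns
// one entry per "processor" stanza. Keys it does not know are ignored.
func ParseCPUInfo(text string) ([]CPUInfo, error) {
	var cpus []CPUInfo
	for _, line := range strings.Split(text, "\n") {
		key, val, ok := strings.Cut(line, ":")
		if !ok {
			continue // blank line separating stanzas
		}
		key, val = strings.TrimSpace(key), strings.TrimSpace(val)
		if key == "processor" {
			id, err := strconv.Atoi(val)
			if err != nil {
				return nil, fmt.Errorf("bad processor id %q: %w", val, err)
			}
			cpus = append(cpus, CPUInfo{Processor: id})
			continue
		}
		if len(cpus) == 0 {
			continue // field seen before any "processor" line
		}
		cur := &cpus[len(cpus)-1]
		switch key {
		case "model":
			cur.Model = val
		case "model name":
			cur.ModelName = val
		case "cpu MHz":
			mhz, err := strconv.ParseFloat(val, 64)
			if err != nil {
				return nil, fmt.Errorf("bad cpu MHz %q: %w", val, err)
			}
			cur.MHz = mhz
		}
	}
	return cpus, nil
}

func main() {
	// A shortened stanza taken from the output above.
	sample := "processor\t: 23\nmodel\t\t: 85\n" +
		"model name\t: Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz\n" +
		"cpu MHz\t\t: 3088.589\n"
	cpus, err := ParseCPUInfo(sample)
	if err != nil {
		panic(err)
	}
	for _, c := range cpus {
		fmt.Printf("cpu %d: %s (model %s) at %.3f MHz\n",
			c.Processor, c.ModelName, c.Model, c.MHz)
	}
}
```

On a real Linux host the input would be the contents of /proc/cpuinfo, for example read with `os.ReadFile`. Other architectures use different keys, so this sketch only covers the x86 layout shown above.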

We see very high variance across CPU models in cloud providers. Consider, for example, the N1 machine type in GCE:

N1 machine types are Compute Engine's first-generation general-purpose machine types. The N1 machines are available on Skylake, Broadwell, Haswell, Sandy Bridge, and Ivy Bridge CPU platforms.

These platforms can differ widely in performance under some workloads. When we are monitoring ES, a single slow node can hold back the performance of the whole cluster.

@ebadyano

exekias commented 3 years ago

Thank you for the explanations on why this is important.

I'm wondering, would it be enough if we report total CPU MHz? I guess the current CPU usage in MHz can be calculated based on existing cpu usage metrics (%)?
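
One way to read this suggestion, as a minimal sketch: sum the per-core frequencies to get a total, then scale it by the existing usage percentage. The function names `TotalMHz` and `UsedMHz` and the example numbers are assumptions made for illustration; whether the total should use nominal or current (turbo) frequency is left open here.

```go
package main

import "fmt"

// TotalMHz sums per-core frequencies (in MHz) into a host-wide total.
// Hypothetical helper for illustration only.
func TotalMHz(perCoreMHz []float64) float64 {
	total := 0.0
	for _, mhz := range perCoreMHz {
		total += mhz
	}
	return total
}

// UsedMHz estimates CPU usage in MHz from a total frequency and a usage
// percentage in the range 0..100.
func UsedMHz(totalMHz, usagePct float64) float64 {
	return totalMHz * usagePct / 100
}

func main() {
	// Assumed example: 4 cores at 2500 MHz each, 25% overall usage.
	cores := []float64{2500, 2500, 2500, 2500}
	total := TotalMHz(cores)
	fmt.Printf("total: %.0f MHz, used: %.0f MHz\n", total, UsedMHz(total, 25))
}
```

This treats frequency as constant over the sampling interval, which would not hold while turbo boost is changing the clock.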

@fearful-symmetry any thoughts on this one?

fearful-symmetry commented 3 years ago

I think we had this conversation in the past, and it didn't happen because it wasn't metrics-y enough. That being said, I think stuff like MHz information is a good potential addition, maybe as an enhancement to system/cpu or system/core.

sorantis commented 3 years ago

@fearful-symmetry in addition to MHz information it looks like @gingerwizard is looking for these specific fields:

model           : 85
model name      : Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz

Can we also add them to system/cpu or system/core?

fearful-symmetry commented 3 years ago

@sorantis I think it would make sense to add that as well. For the sake of keeping track, so far we want to add the following fields to system/cpu :

elasticmachine commented 2 years ago

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

yuvielastic commented 2 years ago

@jlind23 @fearful-symmetry @sorantis I wanted to bump the urgency of making this information available in Metricbeat.

We see variance across CPU models in cloud providers, such as N2 on GCP and Edsv4/Ddv4 on Azure. Even within the same instance generation, performance varies because each provisioned instance is allocated a random CPU generation. We have started to hear concerns about predictability, as customers are seeing wide performance differences between two clusters. This information would therefore help us monitor CPU variance across hosts and gauge the magnitude of these issues as we scale our infrastructure in the cloud.

Please let me know when we can expect CPU info to be added to Metricbeat.

belimawr commented 2 years ago

Re-opening because it is only implemented on Linux.

botelastic[bot] commented 3 months ago

Hi! We just realized that we haven't looked into this issue in a while. We're sorry!

We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1:. Thank you for your contribution!