elastic / beats

:tropical_fish: Beats - Lightweight shippers for Elasticsearch & Logstash
https://www.elastic.co/products/beats

Investigate disable event normalization processing in Cloudfoundry input #34407

Open MichaelKatsoulis opened 1 year ago

MichaelKatsoulis commented 1 year ago

In this issue we will evaluate how much CPU utilisation the Filebeat Cloud Foundry input saves when event normalization is disabled, which reduces allocations while processing events. This was made possible by this PR.

The tests are performed with the latest Elasticsearch 8.6 deployed on Elastic Cloud. The PCF cluster at https://apps.sys.giz-2.cf-obs.elastic.dev will be used, and traffic will be generated by a log-generator app, each instance of which produces 100K events per minute. CPU usage will be compared between a Filebeat with event normalization disabled and the default Filebeat.

MichaelKatsoulis commented 1 year ago

In all cases Filebeat is deployed as an app on PCF with 1 GB of allocated memory. Since each Diego cell has a capacity of 64 GB of RAM and 16 CPUs, this translates to a Filebeat entitlement of (1/64)*16 = 0.25 CPUs per instance.
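The entitlement arithmetic above can be sketched as a small helper. This is only an illustration of the calculation as described in this comment (entitlement proportional to the app's share of the cell's memory); the function name `cpu_entitlement` is hypothetical and not part of any Cloud Foundry tooling.

```python
def cpu_entitlement(app_memory_gb: float, cell_memory_gb: float, cell_cpus: int) -> float:
    """Return the CPU entitlement of one app instance.

    Assumption (taken from the comment above): the entitlement is the
    app's share of the cell's memory multiplied by the cell's CPU count.
    """
    return (app_memory_gb / cell_memory_gb) * cell_cpus


# Values from this thread: 1 GB app on a Diego cell with 64 GB RAM and 16 CPUs.
entitlement = cpu_entitlement(app_memory_gb=1, cell_memory_gb=64, cell_cpus=16)
print(f"Entitlement per Filebeat instance: {entitlement} CPUs")
```

With the numbers from this thread the helper returns 0.25, matching the figure quoted above.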

A is the default Filebeat and B is Filebeat with normalization disabled. In each case Filebeat ran for 30 minutes, and we collected the average CPU utilisation of all instances during that period.

Test 1: 900K events per minute and 8 Filebeat instances deployed.
A. cf cpu shows an average of 148% of entitlement. This translates to 0.37 CPUs.
B. cf cpu shows an average of 137.5% of entitlement. This translates to 0.34 CPUs.

Verdict: Filebeat with normalization disabled uses ~8% less CPU.

Test 2: 1M events per minute and 8 Filebeat instances deployed.
A. cf cpu shows an average of 162% of entitlement. This translates to 0.405 CPUs.
B. cf cpu shows an average of 146% of entitlement. This translates to 0.365 CPUs.

Verdict: Filebeat with normalization disabled uses ~10% less CPU.
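For anyone who wants to re-derive the numbers, here is a minimal sketch that converts the reported "percent of entitlement" figures into CPUs and computes the relative reduction between A and B. It assumes the 0.25 CPU entitlement stated earlier in this thread; the helper names `to_cpus` and `relative_reduction` are hypothetical.

```python
ENTITLEMENT_CPUS = 0.25  # per-instance entitlement stated earlier in this thread


def to_cpus(percent_of_entitlement: float) -> float:
    """Convert an average 'percent of entitlement' reading into CPUs."""
    return percent_of_entitlement / 100 * ENTITLEMENT_CPUS


def relative_reduction(a_percent: float, b_percent: float) -> float:
    """Relative CPU reduction of B compared to A, in percent."""
    return (a_percent - b_percent) / a_percent * 100


# (A, B) averages reported in the two tests above.
results = {"Test 1": (148.0, 137.5), "Test 2": (162.0, 146.0)}

for name, (a, b) in results.items():
    print(
        f"{name}: A={to_cpus(a):.3f} CPUs, B={to_cpus(b):.3f} CPUs, "
        f"reduction={relative_reduction(a, b):.1f}%"
    )
```

Computed directly from the unrounded percentages, the reduction comes out at about 7.1% for Test 1 and about 9.9% for Test 2. The Test 1 figure is closer to 8% when the rounded values 0.37 and 0.34 are used instead.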

I was unable to add more traffic to the cluster; I may have hit a bottleneck at Elasticsearch. Still, the sample is good enough and shows a decrease in CPU utilisation.

Next step: Investigate how events stored in Elasticsearch are affected by disabling normalization.

cc @gizas