elastic / beats

:tropical_fish: Beats - Lightweight shippers for Elasticsearch & Logstash
https://www.elastic.co/products/beats

[Meta][GCS] - Improvements and addition of new features to the GCS input #41107

Open ShourieG opened 1 week ago

ShourieG commented 1 week ago

Describe the enhancement: This issue is to track various improvements and addition of new features to the GCS input.

Describe a specific use case for the enhancement or feature: As our GCS input is gradually adopted across more integrations, it needs to become more robust and efficient in terms of performance, failure tracking and scalability. This Meta issue tracks the changes needed to get there over time.

Improvements:

  1. Add metrics to the GCS input. Separate issue here
  2. Segregate batch_size from worker count (currently the worker count is also used as the batch_size to distribute jobs evenly).
  3. Segregate the cursor save operation from event publishing, and add support for detecting the Elasticsearch acknowledgement signal and using it to update the cursor state.
  4. Improve the documentation and explain the impact the polling operation has on scalability.
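Improvement 3 hinges on not persisting the cursor until the output has acknowledged the events. Because workers process objects concurrently, acknowledgements can arrive out of order, so one common approach is a contiguous watermark: the cursor only advances past an object once every object listed before it has also been acknowledged. The Go sketch below illustrates that idea under stated assumptions: `ackTracker`, `newAckTracker` and `Ack` are hypothetical names, it assumes exactly one acknowledgement per object, and it does not reflect how the Beats publisher pipeline actually delivers ACKs.

```go
package main

import (
	"sort"
	"sync"
)

// ackTracker is a hypothetical helper (not part of the Beats codebase) that
// advances a cursor only once every object up to it has been acknowledged.
type ackTracker struct {
	mu      sync.Mutex
	pending []string        // unacknowledged-prefix of object names, in listing order
	acked   map[string]bool // objects acknowledged out of order, waiting for predecessors
	cursor  string          // last object name that is safe to persist as state
}

// newAckTracker sorts the names lexicographically, matching the order in
// which GCS lists objects.
func newAckTracker(objects []string) *ackTracker {
	names := append([]string(nil), objects...)
	sort.Strings(names)
	return &ackTracker{pending: names, acked: make(map[string]bool)}
}

// Ack marks one object as acknowledged and returns the cursor value that can
// now be saved. The cursor only moves across a contiguous run of acknowledged
// objects, so an out-of-order ack never skips over unacknowledged data.
func (t *ackTracker) Ack(name string) string {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.acked[name] = true
	for len(t.pending) > 0 && t.acked[t.pending[0]] {
		t.cursor = t.pending[0]
		delete(t.acked, t.pending[0])
		t.pending = t.pending[1:]
	}
	return t.cursor
}
```

With objects `a`, `b` and `c`, acknowledging `b` first leaves the cursor empty; once `a` is acknowledged the cursor jumps straight to `b`. The `acked` map only holds out-of-order acknowledgements, so its size stays bounded by the number of objects in flight.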

New Features:

  1. Add support for filtering by prefix and glob expressions.
  2. Add support for state tracking via optional startOffset (user configurable with certain ordering limitations)
  3. Add support for GCS PubSub enabling horizontal scalability of the input.
  4. Add support for more content-types to the GCS input via content decoders.
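New features 1 and 2 both narrow the object listing. A minimal client-side sketch of combined prefix, glob and startOffset filtering in Go is shown below; `objectFilter` and its fields are hypothetical and are not existing GCS input options. It uses the standard library's `path.Match`, whose `*` does not cross `/`; GCS's server-side `matchGlob` additionally supports `**`, which this sketch does not. The startOffset check mirrors the GCS listing semantics, where names lexicographically equal to or after the offset are kept.

```go
package main

import (
	"path"
	"strings"
)

// objectFilter is a hypothetical filter config; the field names are
// illustrative only and do not correspond to real GCS input settings.
type objectFilter struct {
	Prefix      string   // only objects under this prefix are considered
	Globs       []string // path.Match patterns; empty means "match everything"
	StartOffset string   // skip objects whose names sort before this value
}

// keep reports whether an object name passes the prefix, startOffset and
// glob checks. It returns an error only for a malformed glob pattern.
func (f objectFilter) keep(name string) (bool, error) {
	if !strings.HasPrefix(name, f.Prefix) {
		return false, nil
	}
	if f.StartOffset != "" && name < f.StartOffset {
		return false, nil
	}
	if len(f.Globs) == 0 {
		return true, nil
	}
	for _, g := range f.Globs {
		ok, err := path.Match(g, name)
		if err != nil {
			return false, err // e.g. an unclosed '[' yields path.ErrBadPattern
		}
		if ok {
			return true, nil
		}
	}
	return false, nil
}
```

Doing the prefix and startOffset checks first is cheap and mirrors what the GCS listing API can already apply server-side, leaving glob matching for the remaining names.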

elasticmachine commented 1 week ago

Pinging @elastic/security-service-integrations (Team:Security-Service Integrations)

ShourieG commented 1 week ago

cc: @narph, @andrewkroh

ShourieG commented 1 week ago

@andrewkroh, @efd6 please feel free to expand this issue by suggesting improvements or additions you would like to see in the input going forward.