Closed olegsu closed 3 weeks ago
Pinging @elastic/elastic-agent (Team:Elastic-Agent)
Update from Sep 26: discussion about this proposal. Participants: @cmacknz, @andrewkroh, @aleksmaus, and @olegsu.
Action item: The Cloud Security team will run a POC to understand the feasibility and complexity of delivering this by the 8.17 release. The POC will focus on an HTTP JSON-based integration where the state object is mostly a timestamp.
Concern that was raised
2. The Okta entity analytics integration uses a custom implementation of a local bolt db as a state store, and transactions are made against that db. Changes here might be more complex.
Effectively, the state in this case is a snapshot of all the data fetched plus some state values, and it has to be fetched and updated "atomically".
As far as I can see in Filebeat, a similar approach to state is used for the other "entity analytics" inputs: active directory, azuread, and jamf, in addition to okta.
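To make the "atomic" requirement concrete, here is a minimal sketch of one way a snapshot and its state values could be kept consistent: store them as a single record and reject writes based on a stale read. Everything in it (`VersionedStore`, `VersionConflict`, `sync`, the record layout) is hypothetical and in-memory; it is not the bolt db implementation used by the entity analytics inputs. Elasticsearch's optimistic concurrency control (`if_seq_no` / `if_primary_term`) is one mechanism that could play the role of the version check, but whether it fits these inputs is exactly what the concern above leaves open.

```python
import copy
from typing import Dict, Optional, Tuple


class VersionConflict(Exception):
    """Raised when a write is based on a stale read."""


class VersionedStore:
    """Hypothetical in-memory store: one record plus a version number per key."""

    def __init__(self) -> None:
        self._records: Dict[str, Tuple[int, dict]] = {}

    def read(self, key: str) -> Tuple[int, Optional[dict]]:
        # Version 0 means "no record yet".
        version, record = self._records.get(key, (0, None))
        return version, copy.deepcopy(record)

    def write(self, key: str, record: dict, expected_version: int) -> int:
        current, _ = self._records.get(key, (0, None))
        if current != expected_version:
            raise VersionConflict(f"expected {expected_version}, found {current}")
        # Snapshot and state values are replaced together, in one step.
        self._records[key] = (current + 1, copy.deepcopy(record))
        return current + 1


def sync(store: VersionedStore, key: str, fetched_users: dict, new_cursor: str) -> int:
    """Merge newly fetched users into the snapshot and advance the cursor together."""
    version, record = store.read(key)
    record = record or {"users": {}, "cursor": None}
    record["users"].update(fetched_users)
    record["cursor"] = new_cursor
    return store.write(key, record, expected_version=version)
```

Because the snapshot and the cursor live in the same record, a reader never sees new users paired with an old cursor, and a writer that read an outdated version fails instead of silently overwriting newer state.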
The POC is in review https://github.com/elastic/security-team/issues/10714
Background
Currently, the Beats framework uses a state store that is based on the filesystem (libbeat/statestore). There are additional implementations, such as entityanalytics/kvstore and cursor.StateStore. This state store is used by Filebeat to ensure data is not ingested twice, which is critical for accurate data ingestion and processing.
Until now, the Elastic Agent has relied on persistent storage in two main environments:
What is Agentless Data Ingestion?
Agentless data ingestion allows users to collect data from cloud services, SaaS applications, and public APIs without needing to install or maintain agents. This approach reduces the complexity and overhead involved in managing agents, including version updates and continuous monitoring, and also eliminates the need for additional payments for agent-based operations.
By removing the need for Elastic Agent, users benefit from easier data ingestion while reducing the operational burden.
The challenge in agentless
For agentless deployments, particularly on serverless platforms and ESS, running Elastic Agent on Kubernetes is necessary. However, using a DaemonSet or StatefulSet is not feasible in this environment. Instead, Elastic Agent is run as a Kubernetes Deployment.
Initially, we considered mounting a persistent volume (NFS) to the Elastic Agent deployment. However, this approach has limitations, especially regarding the number of volumes that can be attached to a single node (39 volumes on EKS). The agentless approach, which focuses on security and workload isolation, requires that each agent policy run a single integration, which increases the need for a non-filesystem-based persistent layer.
Use case
Many of the integrations maintained by the Security Integration team depend on state management for optimal performance. State is essential to avoid the re-ingestion of already processed data, which would negatively impact customer billing by processing duplicates.
For example, an integration fetching data from a cloud API needs to store a cursor or checkpoint to know which data has already been ingested. Without this state, the integration risks retrieving and processing the same data repeatedly. This sheet outlines candidate integrations for running agentlessly, most of which require state to function efficiently.
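The cursor example above can be sketched as follows. This is a minimal illustration, not Filebeat code: `InMemoryStateStore`, `fetch_events`, `run_once`, the `EVENTS` list, and the `last_timestamp` field are all hypothetical names, and the in-memory dictionary stands in for whatever persistent backend (filesystem or Elasticsearch) holds the state.

```python
from typing import Dict, List, Optional


class InMemoryStateStore:
    """Hypothetical stand-in for a persistent state store."""

    def __init__(self) -> None:
        self._data: Dict[str, dict] = {}

    def get(self, key: str) -> Optional[dict]:
        return self._data.get(key)

    def set(self, key: str, value: dict) -> None:
        self._data[key] = dict(value)


# Fake "cloud API" data, for illustration only.
EVENTS = [
    {"id": "a", "timestamp": 1},
    {"id": "b", "timestamp": 2},
    {"id": "c", "timestamp": 3},
]


def fetch_events(since: int) -> List[dict]:
    """Return only events newer than the given timestamp."""
    return [e for e in EVENTS if e["timestamp"] > since]


def run_once(store: InMemoryStateStore, key: str = "demo-input") -> List[dict]:
    """One polling cycle: load the cursor, fetch newer events, save the new cursor."""
    state = store.get(key) or {"last_timestamp": 0}
    events = fetch_events(state["last_timestamp"])
    if events:
        # A real input would publish the events before advancing the cursor.
        store.set(key, {"last_timestamp": max(e["timestamp"] for e in events)})
    return events
```

With a store that survives between runs, a second call to `run_once` returns nothing new. If the store is lost, for example because the pod restarts without persistent storage, the same call starts again from timestamp 0 and fetches every event a second time.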
Proposal
We propose implementing a state store backed by Elasticsearch. Having an additional (and unified) statestore has been discussed in https://github.com/elastic/beats/issues/40748. In addition, Elasticsearch-Connector already uses the upstream ES to store configuration and state. By implementing Elasticsearch for the backend/statestore interface, we can unblock the release of more integrations and enhance the agentless experience.
References
Inform