Pinging @elastic/security-external-integrations (Team:Security-External Integrations)
One approach for doing the AID Master (host details) enrichment that I want to test is:
Create a separate input in the CrowdStrike FDR integration to read the AID Master data from S3. Allow usage of both SQS and S3 polling. Page 19 of this PDF indicates that there are separate SQS queues for AID Master and FDR data. https://www.crowdstrike.com/wp-content/uploads/2021/10/crowdstrike-falcon-data-replicator-fdr-sqs-technical-add-on-guide.pdf
With the Crowdstrike AID Master data being written to a Fleet-managed data stream, apply a "latest" transform to create an entity-centric index with one doc per unique AID. Transforms are already supported by Fleet packages.
Or, if Fleet allows, route Crowdstrike AID Master data to its own (regular) index. Use the AID value as the `_id` and update records. This avoids the need for a transform.
Periodically execute an enrich policy to update the enrich index based on data from the transform output. Fleet does not support enrich policies today, but if this approach works we can propose an enhancement. Fleet would need to periodically update the enrich index unless we had a feature like https://github.com/elastic/elasticsearch/pull/73407.
Add an enrich processor to the CrowdStrike pipeline to add AID Master data to FDR events at ingest time.
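For illustration, here is a rough sketch of those Elasticsearch-side pieces wired up by hand. All index, policy, pipeline, and field names, and the 30 minute schedule, are placeholders rather than anything the integration defines; the "latest" transform definition is omitted.

```go
// A minimal sketch of the steps Fleet cannot manage today: define an enrich
// policy over the transform's entity-centric index, add an enrich processor to
// an ingest pipeline, and re-execute the policy on a schedule.
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"strings"
	"time"
)

const es = "http://localhost:9200"

// put sends a PUT request with an optional JSON body to Elasticsearch.
func put(path, body string) error {
	var rdr io.Reader
	if body != "" {
		rdr = strings.NewReader(body)
	}
	req, err := http.NewRequest(http.MethodPut, es+path, rdr)
	if err != nil {
		return err
	}
	if body != "" {
		req.Header.Set("Content-Type", "application/json")
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode >= 300 {
		return fmt.Errorf("%s: %s", path, resp.Status)
	}
	return nil
}

func main() {
	// Enrich policy keyed on aid, reading from the "latest" transform's output index.
	if err := put("/_enrich/policy/crowdstrike-aidmaster", `{
	  "match": {
	    "indices": "crowdstrike-aidmaster-latest",
	    "match_field": "aid",
	    "enrich_fields": ["ComputerName", "MachineDomain", "OU", "SiteName"]
	  }
	}`); err != nil {
		log.Fatal(err)
	}

	// Ingest pipeline with an enrich processor to decorate FDR events at ingest time.
	if err := put("/_ingest/pipeline/crowdstrike-fdr-enrich", `{
	  "processors": [
	    {"enrich": {"policy_name": "crowdstrike-aidmaster", "field": "aid", "target_field": "aidmaster"}}
	  ]
	}`); err != nil {
		log.Fatal(err)
	}

	// Rebuild the enrich index periodically; this is the step that has no owner
	// in a Fleet-only setup today.
	for range time.Tick(30 * time.Minute) {
		if err := put("/_enrich/policy/crowdstrike-aidmaster/_execute", ""); err != nil {
			log.Println("execute enrich policy:", err)
		}
	}
}
```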
@r00tu53r Cribl recently released a Crowdstrike pack, which includes support for hostname enrichment. Going by their docs, Redis is required for enrichment. https://cribl.io/blog/cribl-pack-for-crowdstrike/#name
@jamiehynds thanks for sharing the link. Cribl indicates that its pack enriches events with AID data stored in Redis.
As Andrew suggested above, I could add a new input source to index AID data into a Fleet-managed data stream.
However, as mentioned in the linked issues, we've hit a blocker when adding this capability to the integration due to limitations in Kibana.
I think there are still several issues that will make it difficult or non-optimal to build a solution using only Elasticsearch features exposed through Fleet integrations. The biggest is that there is no way to coordinate the timing or ordering of data ingestion (AID Master and FDR events), latest transform execution, and enrich policy updates.
My recommendation is to explore a Beats-based solution that can store the AID metadata and apply the enrichment.
The questions I have are how often the AID Master data is written and whether it is delivered as a complete or partial update. Is there a single file that contains the AID Master data and is continuously re-written in S3?
I'm inclined to agree with this. The Elasticsearch approach depends on multiple components being tightly integrated, without a system well suited to coordinating them. In addition to this complication in the set-up, it is likely to bring difficulties in debugging if we have users with problems in the host enrichment process.
It's entirely unclear from this whether the AIDMaster write is a complete dump or a periodic set of update events.
Depending on what approach they take for these updates, I'm thinking of either a time-to-live heap cache of the metadata or, in the case of complete writes, a swap-the-world table of metadata. Both approaches would work in either case, but the swap-the-world approach is simpler to implement and is all that is needed when the writes are complete.
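To make the two options concrete, a rough sketch (shapes and names are assumptions, not a settled design):

```go
// Option 1 is a time-to-live cache that expires entries individually, suited
// to partial/update-style writes. Option 2 is a table that is replaced
// wholesale when a complete AID Master dump arrives.
package aidcache

import (
	"sync"
	"sync/atomic"
	"time"
)

// Option 1: time-to-live heap cache.
type ttlEntry struct {
	meta    map[string]string
	expires time.Time
}

type TTLCache struct {
	mu  sync.RWMutex
	ttl time.Duration
	m   map[string]ttlEntry // keyed by aid
}

func (c *TTLCache) Set(aid string, meta map[string]string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.m == nil {
		c.m = make(map[string]ttlEntry)
	}
	c.m[aid] = ttlEntry{meta: meta, expires: time.Now().Add(c.ttl)}
}

func (c *TTLCache) Get(aid string) (map[string]string, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	e, ok := c.m[aid]
	if !ok || time.Now().After(e.expires) {
		return nil, false
	}
	return e.meta, true
}

// Option 2: swap-the-world table. A refresh builds a new map and installs it
// atomically, so readers never block on writers.
type Table struct {
	v atomic.Value // holds map[string]map[string]string keyed by aid
}

func (t *Table) Swap(next map[string]map[string]string) { t.v.Store(next) }

func (t *Table) Get(aid string) (map[string]string, bool) {
	m, _ := t.v.Load().(map[string]map[string]string)
	meta, ok := m[aid]
	return meta, ok
}
```

The swap-the-world table never blocks readers during a refresh, which fits the throughput concerns below.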
Given the potential delays in reading data from the buckets and how this would impact on the ability of the beat processor to enrich the events with metadata, I think that we should recommend that the agent that does this work should be colocated on AWS to reduce time costs for collecting the data.
I have looked over potential approaches to this and I think that a processor akin to the `add_process_metadata` processor is probably the kind of thing we want here.
This processor has a backing cached table of metadata that it collects as needed and uses to decorate events as they pass through. The difference between that processor and what we want here is the cost of obtaining the data to fill the look-up table. This has flow-on effects on how the collector needs to work and on the advice that we give to users.
The design of the collector that I favour is one where the collector periodically pulls the most recent metadata from the remote store (s3 or sqs), triggered by the initial configuration of the processor and by calls to the processor to enrich an event, with a configurable cool-down period, and running in a separate goroutine to avoid delaying beat pipeline throughput. On completing the collection, the collector atomically replaces the world with the fresh data. Again, to avoid stalling throughput, if an enrichment is not available for an event, either a marker of missing data will be added or the event will be left unaltered.
We cannot safely use a long-running goroutine to perform the collection, since processors do not get handed any context that might be used to cancel them. This means that we need to go the route of triggering short-lived goroutines via the calls to the enrichment action. I don't believe that this will have a significant impact on freshness of data except in cases where the pipeline is only sporadically active. This is still an open question though.
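A sketch of that triggering logic, with illustrative names only:

```go
// Enrichment calls start a short-lived refresh goroutine when the cached data
// is older than the cool-down and no refresh is already in flight; the table is
// swapped atomically when the pull completes.
package aidenrich

import (
	"sync/atomic"
	"time"
)

// fetchAll stands in for the s3/sqs pull of the most recent AID Master dump.
type fetchAll func() (map[string]map[string]string, error)

type Enricher struct {
	table      atomic.Value // map[string]map[string]string keyed by aid
	lastFetch  atomic.Int64 // unix nanos of the last completed refresh
	refreshing atomic.Bool
	coolDown   time.Duration
	fetch      fetchAll
}

// Enrich decorates a single event, kicking off a background refresh if the
// cached table is older than the configured cool-down period.
func (e *Enricher) Enrich(event map[string]interface{}) {
	e.maybeRefresh()

	aid, _ := event["aid"].(string)
	table, _ := e.table.Load().(map[string]map[string]string)
	meta, ok := table[aid]
	if !ok {
		// Do not stall the pipeline waiting for data; mark and move on.
		event["aidmaster.missing"] = true
		return
	}
	for k, v := range meta {
		event["aidmaster."+k] = v
	}
}

func (e *Enricher) maybeRefresh() {
	last := time.Unix(0, e.lastFetch.Load())
	if time.Since(last) < e.coolDown {
		return
	}
	// Only one short-lived refresh goroutine at a time.
	if !e.refreshing.CompareAndSwap(false, true) {
		return
	}
	go func() {
		defer e.refreshing.Store(false)
		next, err := e.fetch()
		if err != nil {
			return // keep serving the previous table
		}
		e.table.Store(next) // swap the world
		e.lastFetch.Store(time.Now().UnixNano())
	}()
}
```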
Another issue is the latency between invocation/starting a collection and the availability of the data for enrichment. I think that we should provide advice to users that the agent running this input/processor should be colocated on AWS hardware in the same zone as their s3/sqs store to minimise network time costs.
The documentation available from Crowdstrike does not make clear what the behaviour is with regard to host data event dumps into the bucket, though it does feel like the approach is to dump the complete known state at intervals (we should try to confirm this). If that is the case, it does not seem to me like there is any real merit in the collector treating s3 and sqs differently in what it provides to the processor; if all the details get dumped as a batch, it does not make sense to trickle the events through to the processor rather than just making a swap-the-world changeover.
The collector will need to be a new s3/sqs package as the current aws input is significantly more complex than is needed for this and has assumptions about being a managed input. With this, we can make the processor take an interface type that has methods to start collection and to swap-the-world, and so make the processor more general than just for AWS stores.
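One possible shape for that interface (method names are assumptions):

```go
// The processor depends only on this contract, so the new s3/sqs collector is
// just one implementation and the processor stays general.
package aidenrich

// Collector is what the processor would be handed at construction time.
type Collector interface {
	// StartCollection kicks off an asynchronous pull of the most recent
	// AID Master data; it must not block the pipeline.
	StartCollection()

	// SwapWorld atomically installs the most recently collected data as the
	// current lookup table and returns it, keyed by aid.
	SwapWorld() map[string]map[string]string
}
```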
At this stage it looks like the configuration of the processor would include:
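As a purely hypothetical sketch of the kind of options implied above — source location, cool-down period, target fields, and missing-data behaviour; every name and tag here is a guess:

```go
package aidenrich

import "time"

type config struct {
	// Source of the AID Master data; one of these would be set.
	QueueURL  string `config:"queue_url"`  // SQS queue carrying AID Master notifications
	BucketARN string `config:"bucket_arn"` // or an S3 bucket to poll directly

	// Minimum interval between collections triggered by enrichment calls.
	CoolDown time.Duration `config:"cool_down"`

	// Event field holding the agent ID, and where to write the host metadata.
	Field       string `config:"field"`
	TargetField string `config:"target_field"`

	// Whether to tag events for which no metadata is available.
	MarkMissing bool `config:"mark_missing"`
}
```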
Completed by #8474
Crowdstrike Falcon Data Replicator (FDR) replicates log data from your CrowdStrike environment to an S3 bucket, to enable ingestion of log data for SIEMs and other security tools. While our FDR integration ingests this data, unfortunately, Crowdstrike does not include important information such as hostname or username as part of these events, rendering the events unusable without that context.
As an example, a `ProcessRollup2` event combines data from several sources into one event which describes a process that is running or has previously run on the host. A `UserSid` field is included with the `ProcessRollup2` event; the `UserSid` and `AuthenticationId` fields define the security context the process was created with. To determine details about this context, find a `UserIdentity` event with the same `aid` (Agent ID), `UserSid` and `AuthenticationId`. Looking at a `UserSid` can tell you the user a process is running as, but without also looking at the `AuthenticationId` you will not be able to determine the full security context information.

For hostname/computername you can correlate the `aid` (agent id) with the aid_master file. With FDR, in addition to the events listed in the Events Data Dictionary, Falcon Insight customers can optionally request these events:

- aid_master (hosts)
- managedassets
- notmanaged
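To make the correlation concrete, a tiny sketch with made-up values and simplified field names:

```go
// The security context comes from joining on aid + UserSid + AuthenticationId,
// and the hostname comes from joining aid against aid_master.
package main

import "fmt"

type identityKey struct {
	AID              string
	UserSid          string
	AuthenticationID string
}

func main() {
	// Lookup tables built from UserIdentity events and the aid_master file.
	users := map[identityKey]string{
		{AID: "a1b2c3", UserSid: "S-1-5-21-1111-2222-3333-1001", AuthenticationID: "0x3e7"}: "CORP\\alice",
	}
	hosts := map[string]string{"a1b2c3": "workstation-01"}

	// A ProcessRollup2 event carries the same three fields, so the join key is:
	k := identityKey{AID: "a1b2c3", UserSid: "S-1-5-21-1111-2222-3333-1001", AuthenticationID: "0x3e7"}
	fmt.Printf("user=%s host=%s\n", users[k], hosts[k.AID])
}
```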
While we do not have an elegant solution to enrich these events today with hostname/username, this issue is intended to track our progress on researching possible solutions/workarounds.