Open ruflin opened 9 months ago
I had a good conversation with @P1llus about this issue. Currently many integrations use event.original as the source for all processing and not the message field. To ensure event.original does not stick around after processing and use up lots of storage, the final pipeline has a remove processor that checks for the tag preserve_original_event and removes event.original if the tag is not set.
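As a rough illustration of the cleanup step described above, here is a minimal Python sketch. It is not the actual ingest pipeline definition: the function name finalize is made up for this example, and it assumes the tag lives in a top-level tags array on the document.

```python
from copy import deepcopy

# Tag name taken from the discussion above.
PRESERVE_TAG = "preserve_original_event"


def finalize(doc: dict) -> dict:
    """Return a copy of doc with event.original removed unless the tag is set.

    Hypothetical stand-in for the final pipeline's remove step.
    Assumption: the tag is stored in a top-level "tags" list.
    """
    out = deepcopy(doc)
    tags = out.get("tags") or []
    if PRESERVE_TAG not in tags:
        event = out.get("event")
        if isinstance(event, dict):
            # Drop the raw copy to save storage; ignore if already absent.
            event.pop("original", None)
    return out


tagged = {
    "tags": ["preserve_original_event"],
    "message": "raw line",
    "event": {"original": "raw line"},
}
untagged = {"message": "raw line", "event": {"original": "raw line"}}

kept = finalize(tagged)       # event.original survives
dropped = finalize(untagged)  # event.original is removed
```

With this shape, whether the raw event is stored depends only on the tag, not on which field the pipeline used for processing.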
The above could mean that 2 config options might be needed:
I'm questioning whether event.original should be used as the source for processing instead of message, but it seems it is currently the default in many integrations and, as @P1llus mentioned, there are also advantages: it could be used for reindexing if needed and the same pipeline would still work.
@P1llus In the scenarios where the original event is not in message, how do integrations handle this at the moment? Do they pick the field where it is and put it into event.original?
Pinging @elastic/es-data-management (Team:Data Management)
event.original is an ECS field that can be useful in many scenarios, especially in the security context. Currently many integrations add it as part of their ingest pipeline. In Fleet, there is also an option to opt into keeping the field, but it needs to be implemented in each integration. For more details on this see https://github.com/elastic/integrations/issues/4733
There are several problems with the current approach:
Instead of having to repeat the same logic in many places, I propose adding a setting to data streams that controls whether the field should be added, something like:
This means the integration does not decide whether event.original is captured; instead, it is set on the data stream. Many integrations can be used for observability or security. If the use case is security, the setting event.original can be turned on for all datasets without having to modify any integrations.
In the scenario where data is routed, this would also ensure event.original contains the data before it was routed, provided event.original: true is set on the data stream that triggers the routing.
Expected behaviour
The behaviour of the setting would be as follows:
If event.original does not exist, message is copied to event.original as the first action, before any ingest pipeline is applied.
If event.original already exists, nothing is done.
Change in integrations
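The expected behaviour of the setting described above can be sketched as follows. This is only an illustration in Python, not a proposed implementation: capture_original and its enabled parameter are hypothetical names standing in for the data stream setting, and doing nothing when message is missing is an assumption the proposal does not spell out.

```python
from copy import deepcopy


def capture_original(doc: dict, enabled: bool) -> dict:
    """Copy message to event.original before any pipeline runs.

    "enabled" is a hypothetical stand-in for the proposed data stream setting.
    Assumption: if message is absent, the document is left unchanged.
    """
    out = deepcopy(doc)
    if not enabled:
        return out
    if "message" not in out:
        return out
    event = out.setdefault("event", {})
    # Never overwrite an event.original that is already present,
    # e.g. one captured before the document was routed.
    if "original" not in event:
        event["original"] = out["message"]
    return out


fresh = capture_original({"message": "raw line"}, enabled=True)
existing = capture_original(
    {"message": "parsed", "event": {"original": "raw line"}}, enabled=True
)
off = capture_original({"message": "raw line"}, enabled=False)
```

Because the copy happens before any pipeline runs and never overwrites an existing value, later processing can keep reading and rewriting message without affecting the captured original.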
It seems that at the moment, where integrations add event.original manually (1, 2), they rename message to event.original and then all the processing happens on event.original. I'm proposing to change this to keep all the processing on message, as integrations would now always have to assume event.original might not be around.
Questions
Does event.original work in combination with TSDB / synthetic source?
Links