Open philippri opened 5 years ago
I think this is due to "the stream being a child of the message" in the data flow.
Messages are assigned to streams by a meta field containing the stream IDs in each message and Graylog simply applies a filter on these IDs if only a specific stream should be returned. This causes the updates made by any pipeline function, which are applied based on a message ID, to update the message content without regard to the different streams.
The correct way to decouple changes made by different streams/pipelines would be to use the clone_message([message: Message]) function to create a new message, so that the changes are made on separate messages for their respective streams.
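To illustrate the reasoning, here is a toy Python model of the behaviour as I understand it. It is not Graylog code: the `Message` class, the `search` and `anonymize` helpers, the stream names `raw` and `anon`, and the `src_ip` field are all made up for this sketch. It only assumes what is described above, namely that a message carries a set of stream IDs and that a stream view is a filter on that set.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Message:
    # Hypothetical stand-in for a log message; not Graylog's real class.
    msg_id: str
    fields: dict
    streams: set = field(default_factory=set)  # the "stream IDs" meta field


def search(store, stream_id):
    """A stream view is just a filter on the stream-ID meta field."""
    return [m for m in store if stream_id in m.streams]


def anonymize(message):
    """Stand-in for a pipeline rule that rewrites a field on the message."""
    message.fields["src_ip"] = "0.0.0.0"


# Case 1: one message assigned to two streams.
# The pipeline attached to "anon" changes the only copy that exists.
shared = Message("m1", {"src_ip": "203.0.113.7"}, {"raw", "anon"})
store_shared = [shared]
anonymize(search(store_shared, "anon")[0])
raw_view_shared = search(store_shared, "raw")[0].fields["src_ip"]

# Case 2: clone first, then route the clone to "anon" and change only the clone.
original = Message("m2", {"src_ip": "203.0.113.7"}, {"raw"})
clone = copy.deepcopy(original)
clone.msg_id = "m2-clone"
clone.streams = {"anon"}
anonymize(clone)
store_cloned = [original, clone]
raw_view_cloned = search(store_cloned, "raw")[0].fields["src_ip"]
anon_view_cloned = search(store_cloned, "anon")[0].fields["src_ip"]

print(raw_view_shared, raw_view_cloned, anon_view_cloned)
```

In the first case the raw view sees the anonymized value too, because both streams point at the same message. In the second case the raw view keeps the original value, which is the effect cloning is meant to achieve.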
Feel free to correct me if I'm wrong. But I think this is the issue at hand here.
Greetings, Philipp
I managed to implement different views of the same data using the mechanisms @DerPhlipsi suggested, thanks again for that. If interested, please have a look at https://community.graylog.org/t/anonymized-and-raw-views-of-same-logs-in-different-streams-possible/ for details. I am not closing this issue, though, as what @jalogisch told me over at the community page makes me think the behaviour described in the above issue is not intended.
This bug is still valid (with 2.4.6, and because the issue is still open, it might be valid in 2.5.x and 3.x).
Pipelines have global impact instead of stream specific impact.
I tried to create different views of the same log data by creating two streams assigned to two different index sets. When manipulating one of these streams using a processing pipeline, data in the other stream is being manipulated, too. The pipeline seems to ignore that it is connected to a single stream and processes all versions of a message in any available stream. This bug report was written as advised over at: https://community.graylog.org/t/anonymized-and-raw-views-of-same-logs-in-different-streams-possible/
Expected Behavior
I expected the processing pipeline to only affect the stream it is connected to, especially given a stream in a separate index set.
Current Behavior
Instead of only manipulating the log data in the stream the pipeline is connected to, it affects all copies of the events in all index sets.
Steps to Reproduce
Context
I am trying to create two views of the log data to set up a system that is GDPR compliant. The views of the logs meant for day-to-day use should be anonymized, while the raw data remains available separately in case tracking down an attacker or similar measures are necessary.
Your Environment
Message Processor Configuration: