fkie-cad / Logprep

Log data pre-processing, generation and shipping in Python
https://logprep.readthedocs.io/en/latest/
GNU Lesser General Public License v2.1

Suggestion for a new processor: Replacer #669

Open Malutthias opened 1 week ago

Malutthias commented 1 week ago

Based on Issue #643, we agreed to implement a new processor that performs replacements in logs.

The following requirements should be met; every point listed here is open for further discussion:

Pattern Matching: The replacer processor should support various pattern matching techniques to identify and replace specific patterns or strings in the logs. This typically involves regular expressions or predefined string patterns, or in some cases fuzzy matching.

Customizable Replacement Rules: It should allow users to define and customize replacement rules. This could include simple string replacements, complex regex-based substitutions, or conditional replacements based on specific log content, e.g. severity. Possible replacements with hashes are open for discussion. If random or otherwise non-deterministic replacements are considered at all, a history should be kept for possibly needed rollbacks.

Rule Prioritization: If wildcard rules (like test.*) and specific rules (like test.subfield) exist, there should be clear prioritization. Typically, specific rules take precedence over wildcard rules (see the sketch after this list).

Granularity in Replacement Operations: The processor should allow users to apply replacements at different granularities.
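
A minimal Python sketch of the prioritization point, assuming rules are keyed by field patterns; the function name and rule structure here are hypothetical, not the final Logprep interface:

```python
# Hypothetical sketch: specific field rules take precedence over wildcard rules.
from fnmatch import fnmatchcase

def select_rule(field: str, rules: dict[str, str]) -> str | None:
    """Return the replacement for `field`, preferring the most specific pattern."""
    if field in rules:  # an exact rule always wins
        return rules[field]
    # among wildcard rules, prefer the longest (most specific) matching pattern
    matching = [pattern for pattern in rules if fnmatchcase(field, pattern)]
    if not matching:
        return None
    return rules[max(matching, key=len)]

rules = {"test.*": "wildcard replacement", "test.subfield": "specific replacement"}
assert select_rule("test.subfield", rules) == "specific replacement"
assert select_rule("test.other", rules) == "wildcard replacement"
```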

ekneg54 commented 1 week ago

How about a simple solution for a first attempt:

event:

```json
{
  "target": {
    "field": "This is the message and this should be replaced"
  }
}
```

rule format:

```yaml
filter: message
replacer:
  mapping:
    target.field: "This is the message and %{ this is the replacement string }"
```

results in:

```json
{
  "target": {
    "field": "This is the message and this is the replacement string"
  }
}
```

I suggest avoiding regex entirely, because it is always slow. If we can solve 80% of the problem with simple string operations, that should be the way to go.
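
For illustration, a minimal sketch of how the %{ ... } syntax could be evaluated with plain string operations; the parsing rules here are an assumption derived from the example above, not a fixed design:

```python
# Assumed semantics: the text outside %{ ... } is a literal context that must
# match the field value; the text inside %{ ... } replaces whatever stands in
# its place. No regex involved, only find/slice/startswith/endswith.
def apply_template(value: str, template: str) -> str | None:
    start = template.find("%{")
    if start == -1:
        return None  # no %{ ... } group, nothing to replace
    end = template.find("}", start)
    if end == -1:
        return None
    prefix = template[:start]
    replacement = template[start + 2 : end].strip()
    suffix = template[end + 1 :]
    if value.startswith(prefix) and value.endswith(suffix):
        return prefix + replacement + suffix
    return None  # the literal context does not match, leave the field alone

assert apply_template(
    "This is the message and this should be replaced",
    "This is the message and %{ this is the replacement string }",
) == "This is the message and this is the replacement string"
```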

If you want to replace a string in every field, you can add the other fields with their replacements to the mapping field of the rule, like:

```yaml
filter: message
replacer:
  mapping:
    target.field: "This is the message and %{ this is the replacement string }"
    other.field: "This is another field with another %{ the replacement string }"
```
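
A hypothetical sketch of applying such a mapping to a nested event; the dotted-path helpers are illustrative only (Logprep ships its own field access utilities), and `apply_template` is the function from the sketch above:

```python
# Illustrative helpers for dotted field paths; these names are assumptions.
def get_dotted(event: dict, path: str):
    current = event
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current

def set_dotted(event: dict, path: str, value) -> None:
    current = event
    *parents, last = path.split(".")
    for key in parents:
        current = current.setdefault(key, {})
    current[last] = value

def apply_mapping(event: dict, mapping: dict[str, str]) -> None:
    for path, template in mapping.items():
        value = get_dotted(event, path)
        if isinstance(value, str):
            replaced = apply_template(value, template)  # see the sketch above
            if replaced is not None:
                set_dotted(event, path, replaced)
```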

The advantage of this solution is that the interface is already familiar to the user, because it is like the one in the dissector. The interface of the processor aligns with the field_manager, so it is easily implemented in Logprep and in the supporting tool chain like the fda. Because the user should avoid editing yaml files to configure rules, all necessary simplifications could be done by the fda user interface, which would then generate the corresponding rule in Logprep. Things like "I want to replace the same string in all subfields" (but you have to know all subfields ;) )

But yes, you have to add all fields and their replacements to the mapping if you want to write your yaml files yourself.

Let me know if you are fine with it.

Additionally, we should avoid rules that work on all subfields of an event, because this creates a potential denial-of-service surface: if an attacker nests events arbitrarily deep, this leads to an everlasting loop or recursion, in my opinion. The same goes for global replacements, because nobody knows how many fields an event will have. This would also result in very poor performance, because traversing all fields of a dict without knowing its size is really slow and, as already said, dangerous.
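
To illustrate the concern: a naive walk over all subfields has no natural stopping point on adversarial input. If such rules were ever allowed, a hard depth limit would be one possible guard; the limit and the function below are assumptions, not an endorsed design:

```python
# Illustrative only: a depth-limited traversal over event fields, so that
# deeply nested (or self-referencing) events fail fast instead of exhausting
# the stack or running effectively forever.
def walk_fields(event: dict, max_depth: int = 32, _depth: int = 0):
    """Yield (field, value) pairs, refusing to descend past max_depth."""
    if _depth > max_depth:
        raise ValueError("event nesting exceeds the allowed depth")
    for key, value in event.items():
        if isinstance(value, dict):
            yield from walk_fields(value, max_depth, _depth + 1)
        else:
            yield key, value
```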