Graylog2 / graylog2-server

Free and open log management
https://www.graylog.org

clone_message creates endless loop #5759

Closed stockf closed 1 year ago

stockf commented 5 years ago

In Graylog 3.0, using a rule that contains the clone_message() function creates an endless loop, as described in #4722.

#5284 is related, but I believe this is a bug in the code, not in the documentation.

Expected Behavior

The debug output is displayed once -> the rule is evaluated just once

Current Behavior

The debug output gets repeated over and over until Graylog becomes completely unusable

Possible Solution

It seems that clone_message() makes the cloned message immediately enter the same stream again, so it is evaluated in an infinite loop.

Steps to Reproduce (for bugs)

  1. Create the following rule:
    rule "clone"
    when
        true
    then
        debug("cloning message");
        clone_message();
    end
  2. Connect it to a stream via a pipeline.
  3. Then send a message to that stream, or use the Simulator.

Context

We are trying to forward a long message to a different stream and then shorten the message field for display in the original stream. Since we want the shortened field only in the original message and not in the forwarded one, we forward a cloned message.

Your Environment

kroepke commented 5 years ago

Currently the interpreter creates an exact copy of the message, but that copy receives a new ID. From the interpreter's view, that message has not yet been processed on the streams it is routed to, so it flows through the pipeline as a newly seen message. The intention of clone_message is that the rule then proceeds to change the cloned message, leading to another path through the pipelines.
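The mechanism can be sketched in a few lines of Python. This is a rough model, not Graylog code; all names here are illustrative, and the point is only that a clone with a fresh ID always looks "unseen" to the pipeline:

```python
import itertools

# Fresh IDs for clones; the "already processed" check is keyed on the ID.
_ids = itertools.count()

def clone_message(message):
    # Exact copy of all fields, but with a brand-new ID.
    copy = dict(message)
    copy["id"] = next(_ids)
    return copy

def run_pipeline(initial, max_processed=5):
    """Process messages; each rule run enqueues a clone, so we must cap it."""
    queue = [initial]
    processed_ids = []
    while queue and len(processed_ids) < max_processed:
        msg = queue.pop(0)
        processed_ids.append(msg["id"])
        queue.append(clone_message(msg))  # clone re-enters the same stream
    # Without the cap, the queue never drains: one message in, one clone out.
    return processed_ids
```

Each processed message puts exactly one new-looking message back on the queue, which is why the real pipeline never terminates unless the rule condition distinguishes clones from originals.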

sonnylaskar commented 5 years ago

What is the solution to this issue? What workaround can be done to send a message from one stream to another via pipeline?

stockf commented 5 years ago

As far as I remember, one workaround is to add a field, e.g. "do_not_process", before calling clone_message, and to add a condition at the beginning that checks for the field. Like this (untested):

rule "clone only once"
when
    $message.do_not_process == false
then
    set_field("do_not_process", true);
    debug("cloning message");
    clone_message();
end

We ended up not using clone_message at all.

sonnylaskar commented 5 years ago

That didn't work. On adding clone_message, the throughput on that rule increased exponentially, leading to 100% heap usage, and Graylog crashed.

@graylog2 Team, this seems to be a serious bug.

stockf commented 5 years ago

Maybe you have to use has_field; I really haven't written a rule in months. Try testing it without actually cloning the message and use the simulator to debug. The bug has existed for more than a year, so I doubt the maintainers will fix it now.

sonnylaskar commented 5 years ago

I have the below rule connected to Stream1 and writing the message to Stream2:

rule "sampleRule"
when
    to_string($message.field) == "somevalue"
then
    let msg = clone_message();
    route_to_stream("Stream2", "", msg);
end

Is there anything wrong that you see?

stockf commented 5 years ago

The moment you call clone_message() it will put the cloned message through the same pipeline. So you need to alter the message first. Maybe like this:

rule "sampleRule"
when
    NOT has_field("cloned_message")
then
    set_field("cloned_message", true);
    let msg = clone_message();
    remove_field(msg, "cloned_message");
    route_to_stream("Stream2", "", msg);
end

Still guessing, though.

sonnylaskar commented 5 years ago

I tried your rule. But the same issue occurred.

moesterheld commented 2 years ago

To clarify the issue and provide potential solutions:

clone_message will always add the cloned message to the original message's streams. This can potentially cause an endless loop if this is not taken into account in the rules.

Workarounds

Here are two possible, verified workarounds:

  1. remove from original stream, route to new stream
    rule "cloneMessage"
    when
        true
    then
        let newmsg = clone_message();
        remove_from_stream(name: "Original Stream", message: newmsg);
        route_to_stream(name: "New Stream", message: newmsg);
    end
  2. add identifier to cloned message
    rule "cloneMessage"
    when
        not has_field("cloned")
    then
        let newmsg = clone_message();
        set_field(field: "cloned", value: "true", message: newmsg);
    end

Making clone_message more intuitive

The fact that cloned messages are added to the original message's streams is undocumented and unfortunate, since it easily triggers this issue.

One possibility would be to change clone_message so that it does not copy the original message's streams. However, to keep backwards compatibility and avoid breaking existing rules, the default behavior probably should not change.

In addition to more comprehensive documentation, an extra parameter on the function would make the behavior clearer: either a boolean addToOriginalStreams, defaulting to true, or a string streams containing a comma-separated list of streams to add the cloned message to.

Avoiding endless loops

Avoiding endless loops is trickier, but it should be addressed to keep Graylog from becoming unresponsive. Endless loops can be triggered either by clone_message (relatively easily) or by create_message (by accident) if the created or cloned message is sent back into the original pipeline without being treated differently from original messages. There are a couple of ways to address this:

1. Not processing generated messages in the rules which created them

This could be handled by adding context information to the generated messages and evaluating it in the processor. One drawback is that this could break existing rules that have implemented, e.g., one of the workarounds above.
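As a rough illustration of this approach, the sketch below tags each generated message with the name of the rule that created it and skips that rule on reprocessing. All names (the `_generated_by_rule` field, the function names) are hypothetical, not Graylog internals:

```python
# Hypothetical sketch of approach 1: tag generated messages with the rule
# that created them, and skip that rule when the clone comes back around.
def should_run_rule(message, rule_name):
    # Skip the rule for messages it generated itself.
    return message.get("_generated_by_rule") != rule_name

def clone_rule(message, rule_name="cloneMessage"):
    """Run the clone rule at most once per original message."""
    if not should_run_rule(message, rule_name):
        return None  # generated message: do not clone again
    copy = dict(message)
    copy["_generated_by_rule"] = rule_name  # context info on the clone only
    return copy
```

An original message is cloned once; feeding the clone back through the same rule produces no further clone, which breaks the loop without requiring the rule author to manage guard fields by hand.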

2. Break the loop after a fixed number of generation cycles

If messages are generated during pipeline execution, they are sent into the processing pipeline after the original messages have been processed. This can easily be detected, and a counter could be used to stop processing generated messages after a certain number of "generation cycles". One problem is choosing a sensible default for this value: in theory, rules could be used to, e.g., recursively generate messages, in which case the number would have to be relatively high, but in most cases a low number would probably suffice. The number could also be made configurable, but a default for existing installations still needs to be chosen.

Another topic to discuss is what to do with the generated messages when the loop is broken. Should they be stored, or should a processing error be thrown and all messages (or all generated messages except the original) be rejected? How should the user be informed: via a system notification or a simple log warning? A system notification should only be shown for the first message that causes the issue, until it is cleared by the user, if possible.
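The counting idea can be sketched as follows. This is illustrative only, not Graylog code; the cycle limit, the `generation` field, and all function names are assumptions:

```python
# Sketch of approach 2: break the loop after a fixed number of generation
# cycles. Each cycle processes a batch; the clones form the next batch.
MAX_GENERATION_CYCLES = 3  # would be configurable in a real implementation

def clone_with_generation(message):
    copy = dict(message)
    # Count how many generations of cloning produced this message.
    copy["generation"] = message.get("generation", 0) + 1
    return copy

def process_batches(batch):
    """Process batches of messages; a rule that always clones feeds the next cycle."""
    processed, dropped = [], []
    while batch:
        next_batch = []
        for msg in batch:
            processed.append(msg)
            generated = clone_with_generation(msg)  # rule unconditionally clones
            if generated["generation"] > MAX_GENERATION_CYCLES:
                dropped.append(generated)  # the loop is broken here
            else:
                next_batch.append(generated)
        batch = next_batch
    return processed, dropped
```

Even with a rule that clones unconditionally, processing terminates: only the first MAX_GENERATION_CYCLES generations are processed, and the over-limit clone is set aside for whatever error handling is decided on.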

I have already discussed these possibilities with @thll, but it would be great if we could decide which approach we would like to take.