fluent / fluent-bit

Fast and Lightweight Logs and Metrics processor for Linux, BSD, OSX and Windows
https://fluentbit.io
Apache License 2.0

flush_timeout of multiline parser does not reset the state machine #9109

Open pmeier opened 4 months ago

pmeier commented 4 months ago

Bug Report

To Reproduce

We are running a configuration where we are looking for a trigger (bar) in the log message and keep appending messages until we see the trigger again.

fluent-bit.conf

[SERVICE]
    flush         1
    log_level     error
    parsers_file  parsers.conf

[INPUT]
    Name              tail
    Path              test.log
    Read_from_Head    True
    multiline.parser  bar

[OUTPUT]
    Name    stdout
    Match   *

parsers.conf

[MULTILINE_PARSER]
    name           bar
    type           regex
    flush_timeout  1000
    # rules |   state name  | regex pattern     | next state
    # ------|---------------|-------------------|-----------
    rule      "start_state"   "/.*?bar.*/"        "cont"
    rule      "cont"          "/^(?!.*bar).*$/"   "cont"
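
As a quick sanity check of the two rules, the patterns can be tried against the sample lines. This is only a sketch: Python's re module stands in for Fluent Bit's own regex engine (an assumption that should hold for patterns this simple), and classify is a hypothetical helper, not part of Fluent Bit.

```python
import re

# Patterns copied from the parsers.conf rules above (delimiting slashes removed).
START_STATE = re.compile(r".*?bar.*")
CONT = re.compile(r"^(?!.*bar).*$")


def classify(line: str) -> str:
    """Return the name of the rule whose pattern the line satisfies."""
    if START_STATE.match(line):
        return "start_state"
    if CONT.match(line):
        return "cont"
    return "none"


for line in ["foo 0", "bar 1", "foo 2", "foo 3"]:
    print(f"{line!r}: {classify(line)}")
```

The two patterns are complementary: a line containing bar satisfies only the start_state rule, and any other line satisfies only the cont rule.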

test.log

foo 0
bar 1
foo 2
foo 3

Run

$ fluent-bit -c fluent-bit.conf & sleep 3 && \
  echo -e "foo 4\nfoo 5" >> test.log && sleep 3 && \
  echo -e "foo 6\nbar 7" >> test.log && sleep 3 && \
  echo -e "foo 8\nfoo 9" >> test.log && sleep 3 && \
  kill $!
[...]
[0] tail.0: [[1721338326.383185852, {}], {"log"=>"foo 0
"}]
[0] tail.0: [[1721338326.383223193, {}], {"log"=>"bar 1
foo 2
foo 3
"}]
[0] tail.0: [[1721338326.383223193, {}], {"log"=>"foo 4
foo 5
"}]
[0] tail.0: [[1721338326.383223193, {}], {"log"=>"foo 6
"}]
[1] tail.0: [[1721338332.379612465, {}], {"log"=>"bar 7
"}]
[0] tail.0: [[1721338332.379612465, {}], {"log"=>"foo 8
foo 9
"}]
[...]

Grouping foo 2 and foo 3 with bar 1 is correct. But after the flush timeout has triggered, the state machine is still stuck in the cont state. So when we append foo 4 and foo 5 to the file, they are merged together although the parser never saw the bar trigger again. foo 6 is then correctly emitted as a single record, because it is followed by the bar 7 trigger. But after that, foo 8 and foo 9 are again erroneously merged together.

Expected behavior

foo 4 and foo 5, as well as foo 8 and foo 9, should each be emitted as a separate single-line record and not be merged.
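
The difference between the observed and the expected behavior can be illustrated with a toy model of the two-rule parser. This is a minimal sketch under stated assumptions, not Fluent Bit's implementation: ToyMultiline, its methods, and the reset_on_timeout switch are hypothetical names, and every batch of lines is assumed to be followed by an idle period longer than flush_timeout.

```python
import re

START = re.compile(r".*?bar.*")      # "start_state" rule from parsers.conf
CONT = re.compile(r"^(?!.*bar).*$")  # "cont" rule from parsers.conf


class ToyMultiline:
    """Toy model of the two-rule parser; not Fluent Bit's implementation."""

    def __init__(self, reset_on_timeout: bool):
        self.reset_on_timeout = reset_on_timeout  # hypothetical switch
        self.in_cont = False  # False: waiting for start_state, True: in "cont"
        self.buffer = []      # lines of the group being assembled
        self.records = []     # records emitted so far

    def _flush(self):
        if self.buffer:
            self.records.append("\n".join(self.buffer))
            self.buffer = []

    def feed(self, line: str):
        if self.in_cont and CONT.match(line):
            self.buffer.append(line)   # continuation of the open group
            return
        self._flush()                  # a non-continuation line ends the group
        if START.match(line):
            self.buffer = [line]       # the trigger opens a new group
            self.in_cont = True
        else:
            self.in_cont = False
            self.records.append(line)  # unmatched line becomes its own record

    def timeout(self):
        """Model flush_timeout expiring while no new input arrives."""
        self._flush()
        if self.reset_on_timeout:
            self.in_cont = False       # the reset this issue asks for


def run(reset_on_timeout: bool):
    ml = ToyMultiline(reset_on_timeout)
    # Each inner list is one batch of appended lines, followed by an idle
    # period longer than flush_timeout, as in the reproduction steps.
    batches = [
        ["foo 0", "bar 1", "foo 2", "foo 3"],
        ["foo 4", "foo 5"],
        ["foo 6", "bar 7"],
        ["foo 8", "foo 9"],
    ]
    for batch in batches:
        for line in batch:
            ml.feed(line)
        ml.timeout()
    return ml.records


print(run(reset_on_timeout=False))  # state kept across the timeout
print(run(reset_on_timeout=True))   # state reset on the timeout
```

In this model, keeping the state across the timeout merges foo 4 with foo 5 and foo 8 with foo 9, matching the grouping in the output above, while resetting it emits each of those lines as its own record.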

Your Environment

Additional context

As the whole point of the multiline parser is to merge multiple records, I see no reason not to reset the state machine as soon as flush_timeout emits a record.

pmeier commented 4 months ago

I'm certainly no expert on the code base, but IMO the reset needs to happen in

https://github.com/fluent/fluent-bit/blob/574a69af744535b6e016965f02eef9f739a5df1e/src/multiline/flb_ml.c#L1356-L1359

and according to this comment

https://github.com/fluent/fluent-bit/blob/574a69af744535b6e016965f02eef9f739a5df1e/src/multiline/flb_ml.c#L1469-L1474

probably has to be guarded with if (forced_flush) {...}.

github-actions[bot] commented 1 month ago

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the exempt-stale label.

pmeier commented 1 month ago

Still relevant.