The documentation of the filter_multiline plugin strongly recommends using the multiline support of the input_tail plugin where applicable. This is done via the multiline.parser option. Here is how this looks for a reduced version of the official example.

configuration files
#### `fluent-bit.conf`
```
[SERVICE]
    flush        1
    log_level    info
    parsers_file parsers_multiline.conf

[INPUT]
    name             tail
    path             test.log
    read_from_head   true
    multiline.parser multiline-regex-test

[OUTPUT]
    name  stdout
    match *
```
#### `parsers_multiline.conf`
```
[MULTILINE_PARSER]
    name          multiline-regex-test
    type          regex
    flush_timeout 1000
    #
    # Regex rules for multiline parsing
    # ---------------------------------
    #
    # configuration hints:
    #
    #  - first state always has the name: start_state
    #  - every field in the rule must be inside double quotes
    #
    # rules | state name    | regex pattern                   | next state name
    # ------|---------------|---------------------------------|----------------
    rule      "start_state"   "/(Dec \d+ \d+\:\d+\:\d+)(.*)/"   "cont"
    rule      "cont"          "/^\s+at.*/"                      "cont"
```
#### `test.log`
```
single line...
Dec 14 06:41:08 Exception in thread "main" java.lang.RuntimeException: Something has gone wrong, aborting!
    at com.myproject.module.MyProject.badMethod(MyProject.java:22)
    at com.myproject.module.MyProject.oneMoreMethod(MyProject.java:18)
    at com.myproject.module.MyProject.anotherMethod(MyProject.java:14)
    at com.myproject.module.MyProject.someMethod(MyProject.java:10)
    at com.myproject.module.MyProject.main(MyProject.java:6)
another line...
```
#### output
```
[0] tail.0: [[1721287650.926202485, {}], {"log"=>"single line...
"}]
[0] tail.0: [[1721287650.926237461, {}], {"log"=>"Dec 14 06:41:08 Exception in thread "main" java.lang.RuntimeException: Something has gone wrong, aborting!
    at com.myproject.module.MyProject.badMethod(MyProject.java:22)
    at com.myproject.module.MyProject.oneMoreMethod(MyProject.java:18)
    at com.myproject.module.MyProject.anotherMethod(MyProject.java:14)
    at com.myproject.module.MyProject.someMethod(MyProject.java:10)
    at com.myproject.module.MyProject.main(MyProject.java:6)
"}]
```
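For illustration, the two regex rules above can be sketched as a small state machine in Python. This is a simplified re-implementation, not Fluent Bit's actual code; in particular it flushes a trailing buffer immediately instead of holding it until `flush_timeout`:

```python
import re

# Simplified sketch of the multiline state machine defined in
# parsers_multiline.conf: state name -> (pattern, next state).
RULES = {
    "start_state": (re.compile(r"(Dec \d+ \d+:\d+:\d+)(.*)"), "cont"),
    "cont": (re.compile(r"^\s+at.*"), "cont"),
}

def concatenate(lines):
    """Group lines into records according to the start_state/cont rules."""
    records, buffer, state = [], [], "start_state"
    for line in lines:
        if state != "start_state":
            pattern, next_state = RULES[state]
            if pattern.match(line):
                buffer.append(line)  # continuation line: keep accumulating
                state = next_state
                continue
            records.append("\n".join(buffer))  # rule broke: flush the buffer
            buffer, state = [], "start_state"
        pattern, next_state = RULES["start_state"]
        if pattern.match(line):
            buffer, state = [line], next_state  # a new multiline record starts
        else:
            records.append(line)  # plain single-line record
    if buffer:
        # Real Fluent Bit holds this buffer until flush_timeout expires,
        # which is where issue #8623 comes in.
        records.append("\n".join(buffer))
    return records
```

Running this over the lines of `test.log` yields three records: the two single lines and one concatenated exception record.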
The multiline parsing works fine here, although the last log line (`another line...`) is swallowed (see #8623). I'm also wondering why both records get the same index, i.e. `[0]`.
It breaks down if one needs to apply a parser before the multiline parser. Per the documentation, this should be configured via the `parser` and `key_content` options on the multiline parser itself.
configuration files
#### `fluent-bit.conf`
```
[SERVICE]
    flush        1
    log_level    info
    parsers_file parsers_multiline.conf

[INPUT]
    name             tail
    path             test_docker.log
    read_from_head   true
    multiline.parser multiline-regex-test

[OUTPUT]
    name  stdout
    match *
```
#### `parsers_multiline.conf`
```
[PARSER]
    Name        docker
    Format      json
    Time_Key    time
    Time_Format %Y-%m-%dT%H:%M:%S.%LZ

[MULTILINE_PARSER]
    name          multiline-regex-test
    type          regex
    flush_timeout 1000
    parser        docker
    key_content   log
    #
    # Regex rules for multiline parsing
    # ---------------------------------
    #
    # configuration hints:
    #
    #  - first state always has the name: start_state
    #  - every field in the rule must be inside double quotes
    #
    # rules | state name    | regex pattern                   | next state name
    # ------|---------------|---------------------------------|----------------
    rule      "start_state"   "/(Dec \d+ \d+\:\d+\:\d+)(.*)/"   "cont"
    rule      "cont"          "/^\s+at.*/"                      "cont"
```
I'm using the documented example parser for Docker logs and simply wrapped each line of test.log in the Docker JSON log format:
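The wrapping can be done along these lines (a hypothetical helper; the file names and the fixed timestamp are assumptions, not the exact script used):

```python
import json

def to_docker_line(line, time="2024-07-17T14:24:00.000000Z"):
    # Wrap one raw log line in the Docker JSON log format
    # (one JSON object per line, as the `docker` parser expects).
    return json.dumps({"log": line + "\n", "stream": "stdout", "time": time})

# Usage (paths are assumptions):
# with open("test.log") as src, open("test_docker.log", "w") as dst:
#     for line in src:
#         dst.write(to_docker_line(line.rstrip("\n")) + "\n")
```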
#### output
```
[0] tail.0: [[1721288427.742631829, {}], {"log"=>"{"log": "single line...\n", "stream": "stdout", "time": "2024-07-17T14:24:00.962740Z"}"}]
[1] tail.0: [[1721226240.962777000, {}], {"log"=>"Dec 14 06:41:08 Exception in thread "main" java.lang.RuntimeException: Something has gone wrong, aborting!
    at com.myproject.module.MyProject.badMethod(MyProject.java:22)
    at com.myproject.module.MyProject.oneMoreMethod(MyProject.java:18)
    at com.myproject.module.MyProject.anotherMethod(MyProject.java:14)
    at com.myproject.module.MyProject.someMethod(MyProject.java:10)
    at com.myproject.module.MyProject.main(MyProject.java:6)
", "stream"=>"stdout"}]
[2] tail.0: [[1721226240.962777000, {}], {"log"=>"{"log": "another line...", "stream": "stdout", "time": "2024-07-17T14:24:00.962825Z"}"}]
```
So the multiline parsing still works, but for some reason the single-line records end up with the whole raw input record nested under the `log` key instead of the parsed fields.
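To make the difference concrete, here is a small Python sketch of what the docker parser should produce for a single-line record versus what the tail input actually emits (the raw string is taken from the output above):

```python
import json

# One raw line of test_docker.log, as read by the tail input:
raw = ('{"log": "single line...\\n", "stream": "stdout", '
       '"time": "2024-07-17T14:24:00.962740Z"}')

# Expected: the docker parser decodes the JSON, so "log" holds only the
# container's stdout line (and "time" becomes the record timestamp).
expected = json.loads(raw)
assert expected["log"] == "single line...\n"

# Observed for single-line records: the undecoded JSON string is
# nested under "log" instead.
observed = {"log": raw}
assert json.loads(observed["log"]) == expected
```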
Curiously, if I instead set the parser on the input_tail plugin and add a filter_multiline plugin, everything works fine:
configuration files
#### `fluent-bit.conf`
```
[SERVICE]
    flush        1
    log_level    info
    parsers_file parsers_multiline.conf

[INPUT]
    name           tail
    path           test_docker.log
    read_from_head true
    parser         docker

[FILTER]
    name                  multiline
    match                 *
    multiline.key_content log
    multiline.parser      multiline-regex-test

[OUTPUT]
    name  stdout
    match *
```
#### `parsers_multiline.conf`
```
[PARSER]
    Name        docker
    Format      json
    Time_Key    time
    Time_Format %Y-%m-%dT%H:%M:%S.%LZ

[MULTILINE_PARSER]
    name multiline-regex-test
    type regex
    #
    # Regex rules for multiline parsing
    # ---------------------------------
    #
    # configuration hints:
    #
    #  - first state always has the name: start_state
    #  - every field in the rule must be inside double quotes
    #
    # rules | state name    | regex pattern                   | next state name
    # ------|---------------|---------------------------------|----------------
    rule      "start_state"   "/(Dec \d+ \d+\:\d+\:\d+)(.*)/"   "cont"
    rule      "cont"          "/^\s+at.*/"                      "cont"
```
#### output
```
[0] tail.0: [[1721226240.962740000, {}], {"log"=>"single line...
", "stream"=>"stdout"}]
[1] tail.0: [[1721226240.962777000, {}], {"log"=>"Dec 14 06:41:08 Exception in thread "main" java.lang.RuntimeException: Something has gone wrong, aborting!
    at com.myproject.module.MyProject.badMethod(MyProject.java:22)
    at com.myproject.module.MyProject.oneMoreMethod(MyProject.java:18)
    at com.myproject.module.MyProject.anotherMethod(MyProject.java:14)
    at com.myproject.module.MyProject.someMethod(MyProject.java:10)
    at com.myproject.module.MyProject.main(MyProject.java:6)
", "stream"=>"stdout"}]
[2] tail.0: [[1721226240.962825000, {}], {"log"=>"another line...", "stream"=>"stdout"}]
```
#### Expected behavior
Setting a multiline parser together with `parser` on the input_tail plugin should behave exactly like setting only a parser and applying a filter_multiline plugin afterwards.
#### Your Environment
- Version used: 3.0.6 / 3.1.0
- Configuration: see above
- Environment name and version (e.g. Kubernetes? What version?): Kubernetes