bug: custom MultilineParser does not concatenate

Describe the issue

MultilineParser support was added in 2.8.0, and the related controller bug was finally fixed with the release of 3.0.0.

This is not currently working as expected.

To Reproduce

apiVersion: fluentbit.fluent.io/v1alpha2
kind: ClusterInput
metadata:
  labels:
    fluentbit.fluent.io/component: logging
    fluentbit.fluent.io/enabled: "true"
  name: multiline-tail
spec:
  tail:
    path: /var/log/containers/spring-test*.log
    multilineParser: combined-multiline
    readFromHead: true
    tag: kube.*

---
apiVersion: fluentbit.fluent.io/v1alpha2
kind: MultilineParser
metadata:
  name: combined-multiline
  namespace: test
  labels:
    fluentbit.fluent.io/enabled: "true"
spec:
  type: regex
  flushTimeout: 1000
  rules:
    - start: "start_state"
      regex: '/^(?<time>\d{4}-\d{1,2}-\d{1,2}.*\d{1,2}:\d{1,2}:\d{1,2})(?<message>.*)/'
      next: "empty_line"
    - start: "empty_line"
      regex: '/^$/'
      next: "cont"
    - start: "cont"
      regex: '/com.mongodb.MongoSocketOpenException: Exception opening socket/'
      next: "cont"
    - start: "cont"
      regex: '/^\s+at\s.*/'
      next: "cont"
    - start: "cont"
      regex: '/^Caused by: /'
      next: "cont"
    - start: "cont"
      regex: '/... 3 common frames omitted/'
      next: "cont"

Sample log

Result: single-line log records in the output.

Expected behavior

fluent-bit.conf

[Service]
    Http_Server    true
    Parsers_File    /fluent-bit/etc/parsers.conf
    Parsers_File    /fluent-bit/config/parsers_multiline.conf

[INPUT]
    name             tail
    path             /fluent-bit/logs/sample.log
    read_from_head   true
    multiline.parser combined-multiline

[OUTPUT]
    name             stdout
    match            *

[MULTILINE_PARSER]
    Name    combined-multiline
    Type    regex
    Flush_Timeout    1000
    Rule    "start_state" "/^(?<time>\d{4}-\d{1,2}-\d{1,2}.*\d{1,2}:\d{1,2}:\d{1,2})(?<message>.*)/" "empty_line"
    Rule    "empty_line" "/^$/" "cont"
    Rule    "cont" "/com.mongodb.MongoSocketOpenException: Exception opening socket/" "cont"
    Rule    "cont" "/^\s+at\s.*/" "cont"
    Rule    "cont" "/^Caused by: /" "cont"
    Rule    "cont" "/... 3 common frames omitted/" "cont"

sample.log

2024-08-15 13:57:55.741  INFO 1 --- [localhost:27017] org.mongodb.driver.cluster               : Exception in monitor thread while connecting to server localhost:27017

com.mongodb.MongoSocketOpenException: Exception opening socket
    at com.mongodb.internal.connection.SocketStream.open(SocketStream.java:67) ~[mongodb-driver-core-3.8.2.jar!/:na]
    at com.mongodb.internal.connection.InternalStreamConnection.open(InternalStreamConnection.java:126) ~[mongodb-driver-core-3.8.2.jar!/:na]
    at com.mongodb.internal.connection.DefaultServerMonitor$ServerMonitorRunnable.run(DefaultServerMonitor.java:117) ~[mongodb-driver-core-3.8.2.jar!/:na]
    at java.base/java.lang.Thread.run(Thread.java:834) ~[na:na]
Caused by: java.net.ConnectException: Connection refused (Connection refused)
    at java.base/java.net.PlainSocketImpl.socketConnect(Native Method) ~[na:na]
    at java.base/java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:399) ~[na:na]
    at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:242) ~[na:na]
    at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:224) ~[na:na]
    at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:403) ~[na:na]
    at java.base/java.net.Socket.connect(Socket.java:591) ~[na:na]
    at com.mongodb.internal.connection.SocketStreamHelper.initialize(SocketStreamHelper.java:64) ~[mongodb-driver-core-3.8.2.jar!/:na]
    at com.mongodb.internal.connection.SocketStream.open(SocketStream.java:62) ~[mongodb-driver-core-3.8.2.jar!/:na]
    ... 3 common frames omitted

2024-08-15 13:57:56.627  INFO 1 --- [           main] o.s.s.concurrent.ThreadPoolTaskExecutor  : Initializing ExecutorService 'applicationTaskExecutor'

These settings work as expected when tested with plain fluent-bit:

[0] tail.0: [[1723748521.336166386, {}], {"log"=>"2024-08-15 13:57:55.741  INFO 1 --- [localhost:27017] org.mongodb.driver.cluster               : Exception in monitor thread while connecting to server localhost:27017

com.mongodb.MongoSocketOpenException: Exception opening socket
    at com.mongodb.internal.connection.SocketStream.open(SocketStream.java:67) ~[mongodb-driver-core-3.8.2.jar!/:na]
    at com.mongodb.internal.connection.InternalStreamConnection.open(InternalStreamConnection.java:126) ~[mongodb-driver-core-3.8.2.jar!/:na]
    at com.mongodb.internal.connection.DefaultServerMonitor$ServerMonitorRunnable.run(DefaultServerMonitor.java:117) ~[mongodb-driver-core-3.8.2.jar!/:na]
    at java.base/java.lang.Thread.run(Thread.java:834) ~[na:na]
Caused by: java.net.ConnectException: Connection refused (Connection refused)
    at java.base/java.net.PlainSocketImpl.socketConnect(Native Method) ~[na:na]
    at java.base/java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:399) ~[na:na]
    at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:242) ~[na:na]
    at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:224) ~[na:na]
    at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:403) ~[na:na]
    at java.base/java.net.Socket.connect(Socket.java:591) ~[na:na]
    at com.mongodb.internal.connection.SocketStreamHelper.initialize(SocketStreamHelper.java:64) ~[mongodb-driver-core-3.8.2.jar!/:na]
    at com.mongodb.internal.connection.SocketStream.open(SocketStream.java:62) ~[mongodb-driver-core-3.8.2.jar!/:na]
    ... 3 common frames omitted
"}]
[0] tail.0: [[1723748521.339177219, {}], {"log"=>"2024-08-15 13:57:56.627  INFO 1 --- [           main] o.s.s.concurrent.ThreadPoolTaskExecutor  : Initializing ExecutorService 'applicationTaskExecutor'}]

The lines were concatenated properly. I expected the same result from the fluent-operator settings.
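The concatenation seen in the standalone test can be modelled with a small state machine. The following is a simplified sketch in Python, not Fluent Bit's actual implementation: the helper names `next_state` and `concatenate` are made up for illustration, named groups are rewritten to Python's `(?P<name>...)` syntax, and a line that matches neither a continuation rule nor `start_state` is simply emitted as its own record. The sample lines are an excerpt of sample.log above.

```python
import re

# Rules transcribed from the MULTILINE_PARSER above: (state, pattern, next state).
# Only the named-group syntax is adapted for Python's `re` module.
RULES = [
    ("start_state",
     r"^(?P<time>\d{4}-\d{1,2}-\d{1,2}.*\d{1,2}:\d{1,2}:\d{1,2})(?P<message>.*)",
     "empty_line"),
    ("empty_line", r"^$", "cont"),
    ("cont", r"com.mongodb.MongoSocketOpenException: Exception opening socket", "cont"),
    ("cont", r"^\s+at\s.*", "cont"),
    ("cont", r"^Caused by: ", "cont"),
    ("cont", r"... 3 common frames omitted", "cont"),
]


def next_state(state, line):
    """Return the next state if any rule for `state` matches `line`, else None."""
    for name, pattern, nxt in RULES:
        if name == state and re.search(pattern, line):
            return nxt
    return None


def concatenate(lines):
    """Group lines into records by walking the rule states (simplified model)."""
    records, buffer, state = [], [], None
    for line in lines:
        nxt = next_state(state, line) if state else None
        if nxt is not None:
            # The line continues the record currently being built.
            buffer.append(line)
            state = nxt
            continue
        # Not a continuation: flush the pending record, then try start_state.
        if buffer:
            records.append("\n".join(buffer))
            buffer = []
        state = next_state("start_state", line)
        if state is not None:
            buffer = [line]
        else:
            records.append(line)
    if buffer:
        records.append("\n".join(buffer))
    return records


SAMPLE = [
    "2024-08-15 13:57:55.741  INFO 1 --- [localhost:27017] org.mongodb.driver.cluster               : Exception in monitor thread while connecting to server localhost:27017",
    "",
    "com.mongodb.MongoSocketOpenException: Exception opening socket",
    "    at java.base/java.lang.Thread.run(Thread.java:834) ~[na:na]",
    "Caused by: java.net.ConnectException: Connection refused (Connection refused)",
    "    at java.base/java.net.Socket.connect(Socket.java:591) ~[na:na]",
    "    ... 3 common frames omitted",
    "",
    "2024-08-15 13:57:56.627  INFO 1 --- [           main] o.s.s.concurrent.ThreadPoolTaskExecutor  : Initializing ExecutorService 'applicationTaskExecutor'",
]

records = concatenate(SAMPLE)
for record in records:
    print(repr(record))
```

In this model the first seven sample lines end up in one record and the second timestamped line starts a new one, which is the grouping expected from the operator-rendered configuration as well.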

Your Environment

- Fluent Operator version: 3.0.0
- Container Runtime: containerd
- Operating system: AmazonLinux
- Kernel version: 5.10.219-208.866.amzn2.x86_64

How did you install fluent operator?

helm

Additional context

No response

OK, it was related to the resource type and my expectations. But it raises a question: why is there no CRD for inputs? https://github.com/fluent/fluent-operator/tree/master/charts/fluent-operator/charts/fluent-bit-crds/crds

The issue above was caused by another cluster-level Input invalidating it.

I think this is a bit flawed:

I cannot create namespace-scoped inputs, and a MultilineParser is not recognized as a parser if you want to use it in a multiline Filter.

So it can only be used as the multilineParser argument of a ClusterInput, and for that you need to create a ClusterMultilineParser alongside the ClusterInput; otherwise it won't see the namespace-UID-prefixed resource.

Or I could possibly be missing something here.

ClusterInput and ClusterMultilineParser are not ideal for non-privileged (namespace-scoped) users.

@elsnepal, according to the Fluent Bit documentation, a MultilineParser can be referenced in a parser filter.

Parser | Specify the parser name to interpret the field. Multiple Parser entries are allowed (one per line).

https://docs.fluentbit.io/manual/pipeline/filters/parser

Neither filter -> parser nor filter -> multiline seems to work.

  [Filter]
      Name    parser
      Match    kube.var.log.containers.spring**.log
      Key_Name    message
      Parser    combined-multiline
      Preserve_Key    false
      Reserve_Data    true

parsers_multiline.conf: |
  [MULTILINE_PARSER]
      Name    combined-multiline
      Type    regex
      Key_Content    message
      Flush_Timeout    1000
      Rule    "start_state" "/^(?<time>\d{4}-\d{1,2}-\d{1,2}.*\d{1,2}:\d{1,2}:\d{1,2})(?<message>.*)$/" "empty-line"
      Rule    "empty-line" "/^$/" "com"
      Rule    "com" "/^(?<message>com.mongodb.*)$/" "cont"
      Rule    "cont" "/^(?<message>\s+at.*)$/" "caused"

Or with

  filters:
  - multiline:
      buffer: false
      emitterMemBufLimit: 10
      emitterType: memory
      flushMs: 2000
      keyContent: message
      parser: combined-multiline
  match: kube.var.log.containers.spring**.log

This results in:

[2024/08/17 13:16:36] [error] [filter:parser:parser.0] requested parser 'combined-multiline' not found
[2024/08/17 13:16:36] [error] [filter:parser:parser.0] Invalid 'parser'
[2024/08/17 13:16:36] [error] Failed initialize filter parser.0

Another noticeable issue here: MultilineParser and ClusterMultilineParser are rendered as:

  [MULTILINE_PARSER]
      Name    combined-multiline-namespaced
      Type    regex
      Key_Content    message
      Flush_Timeout    1000
.......

  [MULTILINE_PARSER]
      Name    combined-multiline
      Type    regex
      Key_Content    message
      Flush_Timeout    1000

MultilineParser should be namespaced with a UID, as Parser is.
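As an illustration of what such namespacing could look like: the prefix the operator puts on re-emitted tags for the namespace test in the rendered rewrite_tag rule later in this thread, 098f6bcd4621d373cade4e832627b4f6, is the MD5 hex digest of the string "test". The Python sketch below assumes a namespaced MultilineParser name could be derived from the same digest. The helper `namespaced_parser_name` and its `<name>-<digest>` format are hypothetical and are not taken from the operator's code.

```python
import hashlib


def namespace_digest(namespace: str) -> str:
    """MD5 hex digest of a namespace name."""
    return hashlib.md5(namespace.encode("utf-8")).hexdigest()


def namespaced_parser_name(name: str, namespace: str) -> str:
    """Hypothetical naming scheme for a namespaced parser: '<name>-<digest>'."""
    return f"{name}-{namespace_digest(namespace)}"


print(namespace_digest("test"))
print(namespaced_parser_name("combined-multiline", "test"))
```

With a scheme like this, two namespaces could each define a parser called combined-multiline without the rendered [MULTILINE_PARSER] sections colliding.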

Also, that documentation line does not say "multiline" as such, only that multiple Parser entries are allowed. Multiple parsers work, but a multiline parser does not.

Yes, my bad, I misread that sentence.

@elsnepal I made some modifications, and the configuration now renders as shown below. You have to declare namespaceFluentBitCfgSelector in your FluentBit resource and have a FluentBitConfig manifest:

---
apiVersion: fluentbit.fluent.io/v1alpha2
kind: FluentBit
metadata:
  name: fluentbit-sample
  namespace: fluent
  labels:
    app: fluent-bit
spec:
  image: ghcr.io/fluent/fluent-operator/fluent-bit:v3.1.5-debug
  imagePullPolicy: IfNotPresent
  positionDB:
    hostPath:
      path: /var/lib/fluent-bit/
  fluentBitConfigName: fluentbitconfig-sample
  namespaceFluentBitCfgSelector:
    matchExpressions:
      - key: fluentbit.fluent.io/enabled
        operator: In
        values: ["true"]
  disableLogVolumes: false
---
apiVersion: fluentbit.fluent.io/v1alpha2
kind: ClusterFluentBitConfig
metadata:
  name: fluentbitconfig-sample
  namespace: fluent
spec:
  service:
    daemon: false
    logLevel: info
    parsersFile: parsers.conf
    parsersFiles: 
      - /fluent-bit/config/parsers_multiline.conf
  inputSelector:
    matchExpressions:
      - key: fluentbit.fluent.io/enabled
        operator: In
        values: ["true"]
  filterSelector:
    matchExpressions:
      - key: fluentbit.fluent.io/enabled
        operator: In
        values: ["true"]
  outputSelector:
    matchExpressions:
      - key: fluentbit.fluent.io/enabled
        operator: In
        values: ["true"]
  multilineParserSelector:
    matchExpressions:
      - key: fluentbit.fluent.io/enabled
        operator: In
        values: ["true"]
---
apiVersion: fluentbit.fluent.io/v1alpha2
kind: FluentBitConfig
metadata:
  labels:
    fluentbit.fluent.io/enabled: "true"
  name: fluentbitconfig-sample-namespace-test
  namespace: test
spec:
  service:
    daemon: false
    logLevel: info
  multilineParserSelector:
    matchExpressions:
      - key: fluentbit.fluent.io/enabled
        operator: In
        values: ["true"]
---
apiVersion: fluentbit.fluent.io/v1alpha2
kind: ClusterInput
metadata:
  labels:
    fluentbit.fluent.io/component: logging
    fluentbit.fluent.io/enabled: "true"
  name: multiline-tail
spec:
  tail:
    path: /var/log/containers/spring-test*.log
    multilineParser: combined-multiline
    readFromHead: true
    tag: kube.*
---
apiVersion: fluentbit.fluent.io/v1alpha2
kind: MultilineParser
metadata:
  name: combined-multiline
  namespace: test
  labels:
    fluentbit.fluent.io/enabled: "true"
spec:
  type: regex
  flushTimeout: 1000
  rules:
    - start: "start_state"
      regex: '/^(?<time>\d{4}-\d{1,2}-\d{1,2}.*\d{1,2}:\d{1,2}:\d{1,2})(?<message>.*)/'
      next: "empty_line"
    - start: "empty_line"
      regex: '/^$/'
      next: "cont"
    - start: "cont"
      regex: '/com.mongodb.MongoSocketOpenException: Exception opening socket/'
      next: "cont"
    - start: "cont"
      regex: '/^\s+at\s.*/'
      next: "cont"
    - start: "cont"
      regex: '/^Caused by: /'
      next: "cont"
    - start: "cont"
      regex: '/... 3 common frames omitted/'
      next: "cont"
---
apiVersion: fluentbit.fluent.io/v1alpha2
kind: ClusterOutput
metadata:
  name: output-sample
  namespace: fluent
  labels:
    fluentbit.fluent.io/enabled: "true"
spec:
  match: kube.*
  stdout: {}

fluent-bit.conf: |
  [Service]
      Daemon    false
      Log_Level    info
      Parsers_File    /fluent-bit/etc/parsers.conf
      Parsers_File    /fluent-bit/config/parsers_multiline.conf
  [Input]
      Name    tail
      Path    /var/log/containers/spring-test*.log
      Read_from_Head    true
      Tag    kube.*
      multiline.parser    combined-multiline
  [Filter]
      Name    rewrite_tag
      Match    kube.*
      Rule    $kubernetes['namespace_name'] ^(test)$ 098f6bcd4621d373cade4e832627b4f6.$TAG false
      Emitter_Name    re_emitted_098f6bcd4621d373cade4e832627b4f6
  [Output]
      Name    stdout
      Match    kube.*
parsers.conf: ""
parsers_multiline.conf: |
  [MULTILINE_PARSER]
      Name    combined-multiline
      Type    regex
      Flush_Timeout    1000
      Rule    "start_state" "/^(?<time>\d{4}-\d{1,2}-\d{1,2}.*\d{1,2}:\d{1,2}:\d{1,2})(?<message>.*)/" "empty_line"
      Rule    "empty_line" "/^$/" "cont"
      Rule    "cont" "/com.mongodb.MongoSocketOpenException: Exception opening socket/" "cont"
      Rule    "cont" "/^\s+at\s.*/" "cont"
      Rule    "cont" "/^Caused by: /" "cont"
      Rule    "cont" "/... 3 common frames omitted/" "cont"

Thank you @cw-Guo. Those settings were already in place, but I suspect the CRDs didn't get updated somewhere down the line.

After a clean install, combined-multiline works:

apiVersion: fluentbit.fluent.io/v1alpha2
kind: ClusterInput
metadata:
  labels:
    fluentbit.fluent.io/component: logging
    fluentbit.fluent.io/enabled: "true"
  name: multiline-tail
spec:
  tail:
    path: /var/log/containers/spring-test*.log
    multilineParser: combined-multiline
    readFromHead: true
    tag: kube.*

This makes the configuration see the multilineParser, but I still think there are issues with this setup: having a ClusterInput and a namespace-level MULTILINE_PARSER that eventually becomes a cluster-wide parser is a bit of a hack.

Shouldn't it be an Input, Filter, or Parser that uses the MultilineParser? Otherwise a namespace operator has to create a ClusterInput, which is a cluster-wide resource.

To make things harder for the parser, the ClusterInput is also a poor fit because the cri parser is left out, so the lines do not quite match the pattern and you have to cater for the timestamp, stream, and logtag on every line:

"message"=>"2024-08-20T08:41:26.722313966Z stdout F com.mongodb.MongoSocketOpenException: Exception opening socket"
"message"=>"2024-08-20T08:41:26.722319616Z stdout F     at com.mongodb.internal.connection.SocketStream.open

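The effect of the per-line CRI prefix on the rules can be checked directly. The Python sketch below takes the `start_state` pattern and the stack-frame pattern from the MultilineParser in this issue (named groups rewritten to Python's `(?P<name>...)` syntax) and tests them against the second raw line quoted above. `CRI_LINE` is an assumed description of the `<timestamp> <stream> <logtag> <content>` layout visible in those lines, and `strip_cri_prefix` is a hypothetical helper, not a Fluent Bit API.

```python
import re

# Patterns from the MultilineParser rules in this issue.
START = r"^(?P<time>\d{4}-\d{1,2}-\d{1,2}.*\d{1,2}:\d{1,2}:\d{1,2})(?P<message>.*)"
AT_FRAME = r"^\s+at\s.*"

# Assumed layout of a raw CRI line: "<timestamp> <stream> <logtag> <content>".
CRI_LINE = re.compile(
    r"^(?P<time>\S+) (?P<stream>stdout|stderr) (?P<logtag>\S*) (?P<message>.*)$"
)


def strip_cri_prefix(line: str) -> str:
    """Return the content after the CRI prefix, or the line unchanged."""
    match = CRI_LINE.match(line)
    return match.group("message") if match else line


raw_frame = (
    "2024-08-20T08:41:26.722319616Z stdout F"
    "     at com.mongodb.internal.connection.SocketStream.open"
)
stripped_frame = strip_cri_prefix(raw_frame)

# With the prefix, the stack frame looks like the start of a new record
# and no longer matches the continuation rule.
print(bool(re.search(START, raw_frame)), bool(re.search(AT_FRAME, raw_frame)))

# Without the prefix, it is recognized as a continuation line again.
print(bool(re.search(START, stripped_frame)), bool(re.search(AT_FRAME, stripped_frame)))
```

Under these assumptions every raw line matches `start_state` because of its leading timestamp, which would be consistent with each line being emitted as a separate record unless the prefix is parsed off before the multiline rules run.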
The resolution here would be to allow the multiline filter type to make use of a multilineParser:

[FILTER]
    Name multiline
    Match *
    multiline.key_content log
    multiline.parser supertest

[MULTILINE_PARSER]
    name          supertest
    type          regex
    flush_timeout 500
    #
    # Regex rules for multiline parsing
    # ---------------------------------
    #
    # configuration hints:
    #
    #  - first state always has the name: start_state
    #  - every field in the rule must be inside double quotes
    #
    # rules |   state name    | regex pattern                             | next state
    # ------|-----------------|--------------------------------------------------------
    rule      "start_state"      "/^(\d+-\d+-\d+ \d+:\d+:\d+\.\d+)(.*)$/"  "empty_row"
    rule      "empty_row"        "/^$/"                                    "error_row"
    rule      "error_row"        "/^.*$/"                                  "stacktrace"
    rule      "stacktrace"       "/^(\s*at .*|)$/"                            "stacktrace"

This is how it works with fluent-bit: https://github.com/fluent/fluent-bit/discussions/5430

This works as expected; the issue was with the CRDs not being updated.
