fluent/fluent-bit

Fluentbit multiline parsing is not working as intended #8871

Open · Asmokin11 opened 1 month ago

Asmokin11 commented 1 month ago

Bug Report

Describe the bug

I have implemented multiline logging in our GKE cluster. The log parsing is correct most of the time, but every now and then (roughly 4-5 times in 3 hours) I see logs in Cloud Logging that are not parsed as a multiline log line. When a log line is not correctly parsed, all subsequent lines are skipped until the parser matches the "start_state" again.

To Reproduce

Expected behavior

The log shared above gets parsed into two sections. The first section contains:

2024-05-24 14:43:06,804 [] [INFO] [org.<id>] [default task-60] - REQ_IN
    Address: <id>
    HttpMethod: GET
    ExchangeId: <id>

The second section contains:

Headers: {<text>

Screenshots

As seen in the attached screenshot, which covers the last 3 hours, these occurrences are few but still present. On some occasions the log lines coming after the first orphaned log get skipped, and processing starts again from the next new log line.

(screenshot: fluentbit-log)

Your Environment

fluentbit.conf:

[SERVICE]
    Flush                     10
    Grace                     10
    Log_Level                 trace
    Log_File                  /var/log/fluentbit_custom.log
    storage.path              /var/log/flb-storage/
    storage.max_chunks_up     16384
    storage.backlog.mem_limit 10M
    storage.sync              normal
    Daemon                    off
    Parsers_File              parsers.conf
    HTTP_Server               On
    HTTP_Listen               0.0.0.0
    HTTP_Port                 12020

[INPUT]
    Name             tail
    Alias            kube_containers
    Tag              kube.*
    Path             /var/log/containers/*.log
    Exclude_Path     <path>
    DB               /var/run/flb_kube_custom.db
    Skip_Long_Lines  off
    Skip_Empty_Lines off
    Buffer_Max_Size  20MB
    Mem_Buf_Limit    50MB
    storage.type     filesystem
    Refresh_Interval 60
    DB.Sync          normal
    read_from_head   true
    parser           containerd_default 

[FILTER]
    Name                kubernetes
    Match               kube.*
    Kube_URL            https://${KUBERNETES_SERVICE_HOST}:${KUBERNETES_SERVICE_PORT}
    Kube_CA_File        /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    Kube_Token_File     /var/run/secrets/kubernetes.io/serviceaccount/token
    Kube_Tag_Prefix     kube.var.log.containers.
    Use_Kubelet         true
    Buffer_Size         0

[FILTER]
    Name      lua
    Match     *
    script    filters.lua
    call      cb_copy_kubernetes_labels

[FILTER]
    name                  multiline
    Match_Regex           kube\.(.*)(routingservice|backoffice|isnew)(.*)
    multiline.key_content log
    multiline.parser      multiline_parser_1

[FILTER]
    name                  multiline
    match                 kube.*reporting*
    multiline.key_content log
    multiline.parser      multiline_parser_2

[FILTER]
    name       parser
    match      kube.*routingservice*
    key_name   log
    parser     parser_1

[FILTER]
    name       parser
    match      kube.*backoffice*
    key_name   log
    parser     parser_1

[FILTER]
    name       parser
    match      kube.*cryptoserver*
    key_name   log
    parser     parser_2

[FILTER]
    name       parser
    match      kube.*isnew*
    key_name   log
    parser     parser_1

[FILTER]
    name       parser
    match      kube.*gossip*
    key_name   log
    parser     parser_2

[FILTER]
    name       parser
    match      kube.*reporting*
    key_name   log
    parser     parser_3

[FILTER]
    name       parser
    match      kube.*external-dns*
    key_name   log
    parser     parser_4

[FILTER]
    name       parser
    match      kube.*
    key_name   log
    parser     default

[OUTPUT]
    Name                  stackdriver
    Match                 *
    severity_key          severity
    k8s_cluster_name      <cluster-id>
    k8s_cluster_location  <region>
    resource              k8s_container
    labels_key            labels
    tag_prefix            kube.var.log.containers.
    workers               0
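
One detail worth double-checking in the chain of parser filters above: by default the parser filter keeps only the fields produced by the parser and drops everything else in the record (including the metadata added by the kubernetes filter) unless Reserve_Data is enabled. A sketch of the relevant options, shown on the catch-all filter for illustration:

[FILTER]
    name         parser
    match        kube.*
    key_name     log
    parser       default
    Reserve_Data On

Reserve_Data On keeps the other fields of the original record; adding Preserve_Key On would also keep the original log key.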

parsers.conf:

[PARSER]
    Name      containerd_default
    Format    regex
    Regex     ^(?<time>[^ ]+) (?<stream>stdout|stderr) (?<logtag>[^ ]*)(?<log>.*)$

[PARSER]
    Name      default
    Format    regex
    Regex     ^(?<message>(.)*)$

[PARSER]
    Name      parser_1
    Format    regex
    Regex     /^\s?(?<time>([0-9, ,\-,\,,.,:]){19,23})[ ,|,;]*\[\]\s+\[(?<severity>([A-Z,a-z])*)\][ ,|,;]*\s+\[(?<source>([^\]]*))\](?<message>(.)*)/m

[PARSER]
    Name      parser_2
    Format    regex
    Regex     ^\s(?<time>([0-9, ,\-,\,,.,:]){19,23})[ ,|,;]*(?<severity>([A-Z,a-z])*)[ ,|,;]*(?<source>([A-Z,a-z,.,0-9,\_])*)[ ,|,;]*(?<message>(.)*)$

[PARSER]
    Name      parser_3
    Format    regex
    Regex     /\s*(?:\x1B\[0?m)?(?:\x1B\[\d+m)?(?<time>(\d{2}:\d{2}:\d{2},\d{3}))\s+(?<severity>(\w+))\s+\[(?<source>([^\]]*))\][ ,|,;]*(?<message>(.)*)/m

[PARSER]
    Name      parser_4
    Format    regex
    Regex     ^\stime="(?<time>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z)"\slevel=(?<severity>\w+)\smsg="(?<message>(.)*)"$

[MULTILINE_PARSER]
    name          multiline_parser_1
    type          regex
    flush_timeout 20000
    #
    # rules |   state name  | regex pattern                                              | next state
    # ------|---------------|------------------------------------------------------------|------------
    rule      "start_state"   "^\s*\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}"                "cont"
    rule      "cont"          "^(?!\s*\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}).+$"         "cont"
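    #
    # (flush_timeout above is in milliseconds, so 20000 is 20 seconds.)
    #
    # For illustration, the sample lines from the bug description walk through
    # these rules as follows:
    #
    #   2024-05-24 14:43:06,804 [] [INFO] [org.<id>] [default task-60] - REQ_IN   -> start_state (opens a new group)
    #       Address: <id>                                                         -> cont (no leading timestamp, appended)
    #       HttpMethod: GET                                                       -> cont
    #
    # Note that a completely empty line matches neither rule, since the "cont"
    # pattern requires at least one character (".+"), and Skip_Empty_Lines is
    # off in the tail input, so empty container log lines do reach this state
    # machine.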

[MULTILINE_PARSER]
    name          multiline_parser_2
    type          regex
    flush_timeout 20000
    #
    # rules |   state name  | regex pattern                                                                          | next state
    # ------|---------------|----------------------------------------------------------------------------------------|------------
    rule      "start_state"   "^\s*(?:\x1B\[0?m)?(?:\x1B\[\d+m)?(\d{2}:\d{2}:\d{2},\d{3})\s+(\w+)\s+\[([^\]]*)\]"      "cont"
    rule      "cont"          "^(\s+([^\[\d]*?)\s+.*)|^\s*$|^(?!.*\[\d+m[\d:m]+\,\d+).*$"                              "cont"
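    #
    # For illustration, a hypothetical line the start_state above would match
    # (the two ANSI-escape groups are optional):
    #
    #   14:43:06,804 INFO  [org.example.Reporting] starting job
    #
    # An indented continuation such as "    at java.lang.Thread.run(...)" is
    # absorbed by the first alternative of the "cont" rule, and the "^\s*$"
    # alternative lets this parser (unlike multiline_parser_1) absorb empty
    # lines as well.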

Additional context

On most occasions the parsers parse the logs correctly, but we still cannot rely fully on them because of this log-line skipping and improper parsing.

fluentbit.log:

[2024/05/24 12:43:49] [ info] [output:http:http.0] 127.0.0.1:2021, HTTP status=200
[2024/05/24 12:44:24] [ info] [output:http:http.0] 127.0.0.1:2021, HTTP status=200
[2024/05/24 12:44:54] [error] [http_client] broken connection to 127.0.0.1:2021 ?
[2024/05/24 12:44:54] [error] [output:http:http.0] could not flush records to 127.0.0.1:2021 (http_do=-1)
[2024/05/24 12:44:54] [ warn] [engine] failed to flush chunk '1-1716554691.613481186.flb', retry in 11 seconds: task_id=0, input=kube_containers_kube-system > output=http.0 (out_id=0)
Again:
[2024/05/24 12:54:54] [ info] [output:http:http.0] 127.0.0.1:2021, HTTP status=200
[2024/05/24 12:55:24] [error] [src/flb_http_client.c:1172 errno=32] Broken pipe
[2024/05/24 12:55:24] [error] [output:http:http.0] could not flush records to 127.0.0.1:2021 (http_do=-1)
[2024/05/24 12:55:24] [ warn] [engine] failed to flush chunk '1-1716555324.508689671.flb', retry in 8 seconds: task_id=0, input=kube_containers_kube-system > output=http.0 (out_id=0)
[2024/05/24 12:55:32] [ info] [output:http:http.0] 127.0.0.1:2021, HTTP status=200

fluentbit_custom.log:

[2024/05/24 12:43:19] [trace] [filter:multiline:multiline.2 at /src/fluent-bit/plugins/filter_multiline/ml.c:177] emitting from tail.0 to kube.var.log.containers.*.log
[2024/05/24 12:43:19] [trace] [filter:multiline:multiline.2 at /src/fluent-bit/plugins/filter_multiline/ml.c:177] emitting from tail.0 to kube.var.log.containers.*.log
[2024/05/24 12:43:19] [trace] [filter:multiline:multiline.2 at /src/fluent-bit/plugins/filter_multiline/ml.c:177] emitting from tail.0 to kube.var.log.containers.*.log
[2024/05/24 12:43:19] [trace] [filter:multiline:multiline.2 at /src/fluent-bit/plugins/filter_multiline/ml.c:177] emitting from tail.0 to kube.var.log.containers.*.log
[2024/05/24 12:43:19] [trace] [filter:multiline:multiline.2 at /src/fluent-bit/plugins/filter_multiline/ml.c:177] emitting from tail.0 to kube.var.log.containers.*.log
[2024/05/24 12:43:19] [trace] [filter:multiline:multiline.2 at /src/fluent-bit/plugins/filter_multiline/ml.c:177] emitting from tail.0 to kube.var.log.containers.*.log
[2024/05/24 12:43:19] [debug] [input:tail:kube_containers] [static files] processed 32.0K
[2024/05/24 12:43:19] [debug] [output:stackdriver:stackdriver.0] HTTP Status=200
[2024/05/24 12:43:19] [debug] [upstream] KA connection #101 to logging.googleapis.com:443 is now available
[2024/05/24 12:43:19] [debug] [output:stackdriver:stackdriver.0] local_resource_id not found, tag [kube.var.log.containers.*.log] is assigned for local_resource_id
[2024/05/24 12:43:19] [debug] [output:stackdriver:stackdriver.0] [logging.googleapis.com/monitored_resource] not found in the payload
[2024/05/24 12:43:19] [debug] [upstream] KA connection #96 to logging.googleapis.com:443 has been assigned (recycled)
[2024/05/24 12:43:19] [debug] [http_client] not using http_proxy for header
[2024/05/24 12:43:19] [trace] [filter:multiline:multiline.2 at /src/fluent-bit/plugins/filter_multiline/ml.c:434] using stream tail.0_kube.var.log.containers.*.log
[2024/05/24 12:43:19] [trace] [filter:multiline:multiline.2 at /src/fluent-bit/plugins/filter_multiline/ml.c:177] emitting from tail.0 to kube.var.log.containers.*.log
[2024/05/24 12:43:19] [trace] [filter:multiline:multiline.2 at /src/fluent-bit/plugins/filter_multiline/ml.c:177] emitting from tail.0 to kube.var.log.containers.*.log
[2024/05/24 12:43:19] [trace] [filter:multiline:multiline.2 at /src/fluent-bit/plugins/filter_multiline/ml.c:177] emitting from tail.0 to kube.var.log.containers.*.log
[2024/05/24 12:43:19] [trace] [filter:multiline:multiline.2 at /src/fluent-bit/plugins/filter_multiline/ml.c:177] emitting from tail.0 to kube.var.log.containers.*.log

Inside the fluentbit_custom.log file I can see the below line on many occasions:

[2024/05/24 12:43:25] [trace] [filter:multiline:multiline.2 at /src/fluent-bit/plugins/filter_multiline/ml.c:741] not processing records from the emitter
patrick-stephens commented 1 month ago

I got immediately suspicious when I saw a non-standard container log parser being used:

parser           containerd_default 

I don't think this will cope with multiline or partial logs, so are those reconstructed afterwards? I wonder if every now and then you get a different format in your actual container logs on the node. You're just showing the logs without the container format around them, so I'm guessing these are application logs. Can you check the actual log files on disk and verify the formatting there?
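
For comparison, a minimal sketch of what the built-in CRI handling would look like in the tail input (assuming Fluent Bit 1.8 or newer; multiline.parser takes a comma-separated chain, so the application-level parser can follow it):

[INPUT]
    Name             tail
    Tag              kube.*
    Path             /var/log/containers/*.log
    multiline.parser cri, multiline_parser_1

The built-in cri parser also handles the third field of the CRI format (F for a full line, P for a partial line that the runtime split), which the custom containerd_default regex captures as logtag but never acts on.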

Asmokin11 commented 1 month ago

Yes, the logs I shared above are application logs. The actual logs on disk are in the following format:

2024-05-27T10:45:46.381141241Z stdout F 2024-05-27 12:45:46,380 [] [INFO] [org.<text>] [<text>] - RESP_IN
2024-05-27T10:45:46.381171606Z stdout F     Address: <addresss-id>
2024-05-27T10:45:46.381176803Z stdout F     Content-Type: application/json
2024-05-27T10:45:46.381181208Z stdout F     ResponseCode: 200
2024-05-27T10:45:46.381185641Z stdout F     ExchangeId: <exchange-id>
2024-05-27T10:45:46.381190101Z stdout F     Headers: {<text>}
2024-05-27T10:45:46.381194381Z stdout F     Payload: {
2024-05-27T10:45:46.381198647Z stdout F     "transactionStatus": "",
2024-05-27T10:45:46.381202901Z stdout F     "fundsAvailable": true
2024-05-27T10:45:46.381206856Z stdout F }
2024-05-27T10:45:46.381210366Z stdout F

This is again an example of failed multiline log parsing, which happened today: everything from "Headers:" onwards is shown as a separate log line.

The idea behind

parser           containerd_default 

was to separate the actual log entries from the prefix that containerd adds, and then parse the log content captured in the regex group "log".
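
For example, applied to the second on-disk line above, containerd_default splits out roughly:

time   = 2024-05-27T10:45:46.381171606Z
stream = stdout
logtag = F
log    = "    Address: <addresss-id>"

Note that no space is consumed between the logtag and log groups, so log keeps its leading whitespace; the multiline start_state begins with ^\s*, so it still matches on the first line of a record.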

Asmokin11 commented 1 month ago

Hi, would anyone be able to assist here?