fluent / fluent-bit

Fast and Lightweight Logs and Metrics processor for Linux, BSD, OSX and Windows
https://fluentbit.io
Apache License 2.0

Docker_mode to recombine multiline records in json-log from docker #1115

Closed epcim closed 4 years ago

epcim commented 5 years ago

Problem: If an application in Kubernetes logs multiline messages, Docker splits each message into multiple json-log records.

The actual output from the application

[2019-02-15 10:36:31.224][38][debug][http] source/common/http/conn_manager_impl.cc:521] [C463][S12543431219240717937] request headers complete (end_stream=true):
':authority', 'customer1.demo1.acme.us'
':path', '/api/config/namespaces/test/routes'
':method', 'GET'
'user-agent', 'Go-http-client/1.1'
'cookie', 'X-ACME-GW-AUTH=eyJpc3N1ZWxxxxxxxx948b94'
'accept-encoding', 'gzip'
'connection', 'close'

In the Docker json-log, to be parsed by Fluent Bit's in_tail, this becomes the following (this example differs from the one above):

{"log":"[2019-02-15 11:00:08.688][9][debug][router] source/common/router/router.cc:303] [C0][S14319188767040639561] router decoding headers:\n","stream":"stderr","time":"2019-02-15T11:00:08.688733409Z"}
{"log":"':method', 'POST'\n","stream":"stderr","time":"2019-02-15T11:00:08.688736209Z"}
{"log":"':path', '/envoy.api.v2.ClusterDiscoveryService/StreamClusters'\n","stream":"stderr","time":"2019-02-15T11:00:08.688757909Z"}
{"log":"':authority', 'xds_cluster'\n","stream":"stderr","time":"2019-02-15T11:00:08.688760809Z"}
{"log":"':scheme', 'http'\n","stream":"stderr","time":"2019-02-15T11:00:08.688763609Z"}
{"log":"'te', 'trailers'\n","stream":"stderr","time":"2019-02-15T11:00:08.688766209Z"}
{"log":"'content-type', 'application/grpc'\n","stream":"stderr","time":"2019-02-15T11:00:08.688768809Z"}
{"log":"'x-envoy-internal', 'true'\n","stream":"stderr","time":"2019-02-15T11:00:08.688771609Z"}
{"log":"'x-forwarded-for', '192.168.6.6'\n","stream":"stderr","time":"2019-02-15T11:00:08.688774309Z"}
{"log":"\n","stream":"stderr","time":"2019-02-15T11:00:08.688777009Z"}

Docker_Mode On shall recombine split Docker log lines before passing them to any parser, as configured above.

I would expect it to apply to this case as well; however, it does not. My configuration is provided below.

Describe the solution you'd like

in_tail with Docker_Mode should have the option to read Docker's json-log as a stream of the original text. The JSON parser here is just a pre-processor that buffers the "log" key, so multiline regexp patterns can be applied later.
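The proposed pre-processing step can be sketched in a few lines of Python (illustrative only, not the actual plugin): decode each json-log record and emit only its "log" text, reconstructing the original stream for later multiline matching.

```python
import json

def docker_log_text(json_lines):
    """Yield the raw "log" text of each Docker json-log record,
    reconstructing the original application output stream so that
    multiline regexp patterns can be applied downstream."""
    for line in json_lines:
        yield json.loads(line)["log"]
```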

Describe alternatives you've considered

I believe this problem can be avoided if:

  1. docker logs are sent directly to fluentd (docker fluentd driver, https://docs.docker.com/config/containers/logging/fluentd/)
  2. docker logs are sent to journal/syslog etc

However:

Fluent Bit FILTERs are applied after parsing, so they can't transform the stream early enough.

Additional context

Fluentbit config I am using:

  input-kubernetes.conf: |
    [INPUT]
        Name              tail
        Tag               kube.*
        Path              /var/log/containers/*.log
        Parser            docker
        DB                /var/log/flb_kube.db
        Skip_Long_Lines   Off
        Docker_Mode       On
        Refresh_Interval  10
        Chunk_Size        32k
        Buffer_Max_Size   2M
  filter-kubernetes.conf: |
    [FILTER]
        Name                kubernetes
        Match               kube.*
        Kube_URL            https://kubernetes.default.svc.cluster.local:443
        Merge_Log           On
        K8S-Logging.Parser  On

    [PARSER]
        Name        docker
        Format      json
        Time_Key    time
        Time_Format %Y-%m-%dT%H:%M:%S.%L
        Time_Keep   On
        # Command      |  Decoder | Field | Optional Action
        # =============|==================|=================
        Decode_Field_As   escaped_utf8    log    do_next
        Decode_Field_As   escaped         log    do_next
        Decode_Field_As   json            log
epcim commented 5 years ago

Related:

epcim commented 5 years ago

Repository that can be used for testing: https://github.com/epcim/fluentbit-sandbox

etwillbefine commented 5 years ago

Hey, I'm struggling with the same issue right now. Is there any additional feature or bug fix planned for this? Docker_Mode On is exactly what I want; my parsers could then extract fields. I'm struggling to find any solution for Spring Boot stack traces with Fluent Bit at all (using either Multiline or Docker_Mode). Any update or feedback would be appreciated.

sysword commented 5 years ago

I'm struggling with this right now. Do you have any solution for k8s's multiline log?

sreedharbukya commented 5 years ago

I am also stuck on the same issue. The multiline log parser is not working in Kubernetes.

DanielJRutledge commented 5 years ago

I'm stuck with this as well; is there a set of input flags under which the large (16 KB) input from Docker will work?

isurusiri commented 5 years ago

I'm also experiencing the same problem with not being able to parse multiline logs in Kubernetes cluster. I have tried solutions suggested in related threads in this repo but couldn't get it working.

My input config:

input-kubernetes.conf: |
  [INPUT]
      Name              tail
      Tag               kube.*
      Path              /var/log/containers/abc-*.log
      Parser            docker
      Parser_Firstline  multiline_parser_head
      Parser_1          multiline_parser_error
      Multiline         On
      DB                /var/log/flb_kube.db
      Mem_Buf_Limit     10MB
      Skip_Long_Lines   On
      Refresh_Interval  10

Parsers:

parsers.conf: |
  [PARSER]
      Name        json
      Format      json
      Time_Key    time
      Time_Format %d/%b/%Y:%H:%M:%S %z

  [PARSER]
      Name         docker
      Format       json
      Time_Key     time
      Time_Format  %Y-%m-%dT%H:%M:%S.%L
      Time_Keep    On

  [PARSER]
      Name        multiline_parser_head
      Format      regex
      Regex       /\d{4}-\d{1,2}-\d{1,2}/

  [PARSER]
      Name        multiline_parser_error
      Format      regex
      Regex       /(?<timestamp>[^ ]* [^ ]*) (?<level>[^\s]+:)(?<message>[\s\S]*)/

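One likely reason configs like this fail: with Multiline On, Parser_Firstline is applied to the raw line read from the file, and every Docker json-log line contains a date in its time field, so an unanchored \d{4}-\d{1,2}-\d{1,2} matches every record. A quick Python check (Python regex syntax; sample line is invented):

```python
import re

# The unanchored first-line pattern from the config above
firstline = re.compile(r"\d{4}-\d{1,2}-\d{1,2}")

# A continuation record: its "log" text has no date, but the raw
# json-log line still carries one in the "time" field.
raw = '{"log":"\':method\', \'POST\'\\n","stream":"stderr","time":"2019-02-15T11:00:08Z"}'

# The pattern matches the raw line anyway, so every record looks like
# the start of a new message and nothing gets merged.
assert firstline.search(raw) is not None
```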
TomaszKlosinski commented 4 years ago

Same issue here. Any news on this?

manvinderr21 commented 4 years ago

I'm also experiencing the same problem with not being able to parse multiline logs in Kubernetes cluster. I have tried solutions suggested in related threads in this repo but couldn't get it working.

My input config:

input-kubernetes.conf: |
  [INPUT]
      Name              tail
      Tag               kube.*
      Path              /var/log/containers/abc-*.log
      Parser            docker
      Parser_Firstline  multiline_parser_head
      Parser_1          multiline_parser_error
      Multiline         On
      DB                /var/log/flb_kube.db
      Mem_Buf_Limit     10MB
      Skip_Long_Lines   On
      Refresh_Interval  10

Parsers:

parsers.conf: |
  [PARSER]
      Name        json
      Format      json
      Time_Key    time
      Time_Format %d/%b/%Y:%H:%M:%S %z

  [PARSER]
      Name         docker
      Format       json
      Time_Key     time
      Time_Format  %Y-%m-%dT%H:%M:%S.%L
      Time_Keep    On

  [PARSER]
      Name        multiline_parser_head
      Format      regex
      Regex       /\d{4}-\d{1,2}-\d{1,2}/

  [PARSER]
      Name        multiline_parser_error
      Format      regex
      Regex       /(?<timestamp>[^ ]* [^ ]*) (?<level>[^\s]+:)(?<message>[\s\S]*)/

@isurusiri Are you able to figure out any solution for this?

matayto commented 4 years ago

I have the same issue with multi-line JSON output in docker logs.

@isurusiri Based on my understanding of the documentation, the Parser directive is ignored in the tail input when MultiLine is set to On. However, Parser_Firstline and Parser_N are not ignored.

Edit: Link to 1.3 documentation referenced above: https://docs.fluentbit.io/manual/v/1.3/input/tail#multiline

ghost commented 4 years ago

Hi there, any update on that issue? Thank you

jujugrrr commented 4 years ago

Is fluentd the only alternative to fix this issue?

TomaszKlosinski commented 4 years ago

Is fluentd the only alternative to fix this issue?

No, I'm using Elastic Filebeat for this and it works like a charm.

ghost commented 4 years ago

Is fluentd the only alternative to fix this issue?

No, I'm using Elastic Filebeat for this and it works like a charm.

Can you please share your solution to that? Thank you

dharmab commented 4 years ago

I think @TomaszKlosinski is referring to using Elastic Filebeat as a log shipper instead of Fluent Bit.

Unfortunately, that only works if you're using the ELK stack, which is not much help to those of us using other products, e.g. Splunk.

dharmab commented 4 years ago

@edsiper are you able to look into the issue? The original author of the PR that added this feature (https://github.com/fluent/fluent-bit/pull/863) is no longer on GitHub. I took a look at plugins/in_tail/tail_dockermode to see if I could help, but the lack of code comments and the use of opaque abbreviations make it pretty inaccessible.

sumo-drosiek commented 4 years ago

I prepared some changes to the dockermode plugin (#2043). I still need to adapt it to the contributing guide, but I believe it's worth a look; feedback welcome.

Output for issued input:

[0] containers.var.log.containers.test.log: [1585073268.000318200, {"log"=>"{"log":"[2019-02-15 11:00:08.688][9][debug][router] source/common/router/router.cc:303] [C0][S14319188767040639561] router decoding headers:\n':method', 'POST'\n':path', '/envoy.api.v2.ClusterDiscoveryService/StreamClusters'\n':authority', 'xds_cluster'\n':scheme', 'http'\n'te', 'trailers'\n'content-type', 'application/grpc'\n'x-envoy-internal', 'true'\n'x-forwarded-for', '192.168.6.6'\n\n","stream":"stderr","time":"2019-02-15T11:00:08.688777009Z"}"}]
vishiy commented 4 years ago

@sumo-drosiek - any updates on merging this ?

collardmsc commented 4 years ago

@sumo-drosiek any updates?

sumo-drosiek commented 4 years ago

@collardmsc @vishiy Sorry for no updates. The PR has been reviewed. I'm working on the runtime tests and everything should be ready soon :)

collardmsc commented 4 years ago

@sumo-drosiek Thanks a bunch for your time and effort working on this!

sumo-drosiek commented 4 years ago

PR is ready for another review 🤞

Stono commented 4 years ago

:wave: Just wondering if anyone has any updates on this? It would really help me!

Oduig commented 4 years ago

Looks like the issue was solved, brilliant. Which version number will support the new Docker_Mode_Parser field?

sumo-drosiek commented 4 years ago

@Oduig AFAIK the 1.5.0 supports Docker_Mode_Parser

davelosert commented 4 years ago

This is not documented yet, is it? At least I can't find anything about Docker_Mode_Parser in the official docs: https://docs.fluentbit.io/manual/pipeline/inputs/tail#docker_mode

sumo-drosiek commented 4 years ago

@davelosert That's right. I haven't documented it yet.

shake76 commented 3 years ago

I'm also stuck with Fluent Bit and multiline logs in EKS. Has anyone found a solution/workaround for this? If so, I'd appreciate your comments.

ankit1mg commented 3 years ago

I'm also stuck with Fluent Bit and multiline logs in EKS. Has anyone found a solution/workaround for this? If so, I'd appreciate your comments.

Hey @shake76, did you find any solution yet? If yes, please share a sample config that combines the docker parser with a multiline parser.

sumo-drosiek commented 3 years ago

@shake76 @ankit1mg Is something wrong with docker_mode_parser and EKS?

agup006 commented 3 years ago

Hey folks, for the latest reports of logs not working, I wanted to check whether this might be an issue of CRI-format vs. Docker-format log parsing (https://docs.fluentbit.io/manual/installation/kubernetes#container-runtime-interface-cri-parser), or if the ask is fully around multiline + docker mode.

shake76 commented 3 years ago

@sumo-drosiek I haven't had a chance to try it yet. Do you have an example I could take a quick look at? Thanks in advance.

shake76 commented 3 years ago

@sumo-drosiek I was finally able to get it working. I'll leave an example here for future reference.

input-kubernetes.conf: |
      [INPUT]
         Name               tail
         Tag                kube.*
         Path               /var/log/containers/*.log
         Docker_Mode        On
         Docker_Mode_Flush  5
         Docker_Mode_Parser read_firstline
         DB                 /var/log/flb_kube.db
         Parser             docker
         Mem_Buf_Limit      10MB
         Skip_Long_Lines    On
         Refresh_Interval   10

parsers.conf: |
    [PARSER]
        Name                read_firstline
        Format              regex
        Regex               (?<log>(?<="log":")\d{4}[\/-]\d{1,2}[\/-]\d{1,2}[ T]\d{2}:\d{2}:\d{2}(?!\.).*?)(?<!\\)".*(?<stream>(?<="stream":").*?)".*(?<time>\d{4}-\d{1,2}-\d{1,2}T\d{2}:\d{2}:\d{2}\.\w*).*(?=})
        Time_Key            time
        Time_Format         %Y-%m-%dT%H:%M:%S.%LZ

This is the configuration that worked for me; you may need to adapt the regex to your log format.

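For anyone adapting the regex, it can be exercised outside Fluent Bit first. Below is a Python translation (Python needs ?P< for named groups where Fluent Bit's Onigmo engine accepts ?<; the sample line is invented):

```python
import re

# shake76's read_firstline regex, translated to Python named-group syntax
pattern = re.compile(
    r'(?P<log>(?<="log":")\d{4}[/-]\d{1,2}[/-]\d{1,2}[ T]\d{2}:\d{2}:\d{2}(?!\.).*?)'
    r'(?<!\\)".*'
    r'(?P<stream>(?<="stream":").*?)".*'
    r'(?P<time>\d{4}-\d{1,2}-\d{1,2}T\d{2}:\d{2}:\d{2}\.\w*).*(?=})'
)

# An invented raw json-log line whose "log" text starts with a timestamp
raw = ('{"log":"2019-02-15 11:00:08 something happened\\n",'
       '"stream":"stderr","time":"2019-02-15T11:00:08.688733409Z"}')

m = pattern.search(raw)
assert m is not None
assert m.group("stream") == "stderr"
assert m.group("time") == "2019-02-15T11:00:08.688733409Z"
```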
tiagovdot commented 3 years ago

@shake76 I think you have a typo in PARSER.Name red_firstline

shake76 commented 3 years ago

@tiagovdot Yes, you are right. Fixed!

Tafsiralam commented 3 years ago

The following config works great for me. Adding the config:

input-kubernetes.conf: |

    [INPUT]
        Name               tail
        Tag                kube.*
        Path               /var/log/containers/*.log
        Read_from_head     true
        DB                 /var/log/flb_graylog.db
        DB.Sync            Normal
        Docker_Mode        On
        Docker_Mode_Flush  5
        Docker_Mode_Parser multi_line
        Parser             docker
        Buffer_Chunk_Size  512KB
        Buffer_Max_Size    5M
        Rotate_Wait        30
        Mem_Buf_Limit      30MB
        Skip_Long_Lines    On
        Refresh_Interval   10

parsers.conf: |     

    [PARSER]
        Name        docker
        Format      json
        Time_Key    time
        Time_Format %Y-%m-%dT%H:%M:%S.%L
        Time_Keep   On

    [PARSER]
        Name                     multi_line
        Format                   regex
        Regex                    (?<log>^{"log":"\d{4}-\d{2}-\d{2}.*)
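As a sanity check, the multi_line first-line pattern above can be tried in Python (?P< named-group syntax; the sample lines are invented): it should match a record whose log text starts with a date and reject continuation lines.

```python
import re

# Tafsiralam's multi_line pattern, in Python named-group syntax
multi_line = re.compile(r'(?P<log>^{"log":"\d{4}-\d{2}-\d{2}.*)')

# Invented sample records: one first line, one stack-trace continuation
first = '{"log":"2021-01-01 10:00:00 ERROR boom\\n","stream":"stdout","time":"2021-01-01T10:00:00.0Z"}'
cont = '{"log":"    at com.example.Foo.bar(Foo.java:42)\\n","stream":"stdout","time":"2021-01-01T10:00:00.1Z"}'

assert multi_line.match(first) is not None  # starts a new message
assert multi_line.match(cont) is None       # continuation line
```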
unbirabka commented 2 years ago

@sumo-drosiek I was finally able to get it working. I'll leave an example here for future reference.

(config quoted in the comment above)

Thanks, it works for me.