fluent / fluent-bit

Fast and Lightweight Logs and Metrics processor for Linux, BSD, OSX and Windows
https://fluentbit.io
Apache License 2.0

Java multiline parser is invalid #9507

Open xiaopanggege opened 2 days ago

xiaopanggege commented 2 days ago

Bug Report

**Describe the bug**
The built-in Java multiline parser did not match the logs in my k8s environment, so I defined custom Java multiline parsers, but they appear to have no effect.

To Reproduce

- Steps to reproduce the problem:

**Screenshots**
![image](https://github.com/user-attachments/assets/36119d08-3b31-4f6b-afa1-9d695ba3a766)

**Your Environment**
* Version used: fluent-bit 3.0
* Configuration:

vim fluent-bit.yaml


Source: fluent-bit/templates/serviceaccount.yaml

apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluent-bit
  namespace: efk
  labels:
    helm.sh/chart: fluent-bit-0.46.0
    app.kubernetes.io/name: fluent-bit
    app.kubernetes.io/instance: fluent-bit
    app.kubernetes.io/version: "3.0.0"
    app.kubernetes.io/managed-by: Helm

Source: fluent-bit/templates/configmap.yaml

apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit
  namespace: efk
  labels:
    helm.sh/chart: fluent-bit-0.46.0
    app.kubernetes.io/name: fluent-bit
    app.kubernetes.io/instance: fluent-bit
    app.kubernetes.io/version: "3.0.0"
    app.kubernetes.io/managed-by: Helm
data:
  custom_parsers.conf: |
[PARSER]
    Name docker_no_time
    Format json
    Time_Keep Off
    Time_Key time
    Time_Format %Y-%m-%dT%H:%M:%S.%L

[MULTILINE_PARSER]
    name            java_multiline01
    type            regex
    flush_timeout   1000
    # Lines starting with a date such as 2024-10-18 08:50:39
    rule            "start_state" "/^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}.*/" "cont"
    rule            "cont"        "/^(?!\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}).*/" "cont"

[MULTILINE_PARSER]
    name            java_multiline02
    type            regex
    flush_timeout   1000
    # Lines starting with a date that begins with "Dec"
    rule            "start_state" "/^Dec \d+ \d+:\d+:\d+.*/" "cont"
    rule            "cont"        "/^(?!Dec \d+ \d+:\d+).*/" "cont"

[MULTILINE_PARSER]
    name            java_multiline03
    type            regex
    flush_timeout   1000
    # Lines starting with an IP address plus a timestamp, such as 192.168.1.1 2024-10-18 08:50:39
    rule            "start_state" "/^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3} \d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}.*/" "cont"
    rule            "cont"        "/^\s+at\s+.*|Caused by: .*/" "cont"

  fluent-bit.conf: |
[SERVICE]
    Daemon Off
    Flush 1
    Log_Level info
    Parsers_File /fluent-bit/etc/parsers.conf
    Parsers_File /fluent-bit/etc/conf/custom_parsers.conf
    HTTP_Server On
    HTTP_Listen 0.0.0.0
    HTTP_Port 2020
    Health_Check On

[INPUT]
    Name tail
    Path /var/log/containers/*.log
    multiline.parser java_multiline01, java_multiline02, java_multiline03, java, docker, cri
    Tag kube.*
    Mem_Buf_Limit 50MB
    Skip_Long_Lines On
    Ignore_Older 1h

[INPUT]
    Name systemd
    Tag host.*
    Systemd_Filter _SYSTEMD_UNIT=kubelet.service
    Read_From_Tail On

[FILTER]
    Name kubernetes
    Match kube.*
    Merge_Log On
    Keep_Log Off
    K8S-Logging.Parser On
    K8S-Logging.Exclude On

[OUTPUT]
    Name es
    Match kube.*
    Host 10.97.2.200
    HTTP_User elastic
    HTTP_Passwd 123456
    tls On
    tls.verify Off
    index k8s-pod-%Y%m
    ##Logstash_Format On
    Retry_Limit 3
    Suppress_Type_Name On
    Replace_Dots On
[OUTPUT]
    Name es
    Match host.*
    Host 10.97.2.200
    HTTP_User elastic
    HTTP_Passwd 123456
    tls On
    tls.verify Off
    index k8s-kubelet-%Y%m
    ##Logstash_Format On
    ##Logstash_Prefix node
    Retry_Limit 3
    Suppress_Type_Name On
    Replace_Dots On

* Environment name and version (e.g. Kubernetes? What version?): k8s v1.22.6
* Server type and version: centos7.9
* Operating System and version: centos7.9
* Filters and plugins:

**Additional context**
patrick-stephens commented 2 days ago

I think you have a fundamental issue here: you're not handling the actual log format on disk, which is the kubelet format. This is unrelated to your application log format. You must first parse the kubelet format, and only then can you do your application log parsing.

Have a look at the incorrect parsers section here: https://chronosphere.io/learn/fluent-bit-kubernetes-filter/
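
A minimal sketch of that two-step approach, using the classic config format from the report. It assumes the container runtime writes Docker or CRI formatted lines, that the built-in `docker` and `cri` multiline parsers place the application message under the `log` key, and that `java_multiline01` from the reporter's `custom_parsers.conf` is the rule set wanted for the application logs. Treat it as a starting point to verify, not a confirmed fix:

```
[INPUT]
    Name              tail
    Path              /var/log/containers/*.log
    Tag               kube.*
    # Step 1: only decode the on-disk container format here (built-in parsers).
    multiline.parser  docker, cri

[FILTER]
    Name                  multiline
    Match                 kube.*
    # Step 2: apply the application-level rules to the extracted message.
    # "log" is assumed to be the key holding the application line.
    multiline.key_content log
    multiline.parser      java_multiline01
```

The point of the split is that the custom rules are evaluated against the application message rather than the raw line on disk, so their `start_state` patterns are no longer competing with whatever prefix the container runtime adds.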

I'd also caution against debugging via some other stack, which may introduce additional changes or concerns to cope with. Instead, use the stdout output to see the actual data as Fluent Bit is dealing with it. This will likely highlight things better, and once the data is correct there you can send it to the real output. Any further issues are then known to come from that output's processing.

https://chronosphere.io/learn/fluent-bit-tips-tricks/
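
As a rough illustration of the stdout suggestion, the two `es` outputs in the reported config could be swapped temporarily for something like the following. The `kube.*` match is taken from the reporter's config, and the records would then show up in the Fluent Bit container's own log (for example via `kubectl logs` on the pod):

```
[OUTPUT]
    # Temporary debugging output: print each record as Fluent Bit sees it.
    Name   stdout
    Match  kube.*
```

Once the records printed here show one complete Java stack trace per record, the Elasticsearch outputs can be restored.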