fluent / fluent-bit

Fast and Lightweight Logs and Metrics processor for Linux, BSD, OSX and Windows
https://fluentbit.io
Apache License 2.0

Fluent-bit input plugin tail doesn't process all logs: scan_blog add(): dismissed: #4155

Closed rguske closed 5 months ago

rguske commented 3 years ago

Bug Report

Describe the bug: Fluent Bit is not processing all logs located in /var/log/containers/.

To Reproduce: The following messages are displayed:

[2021/10/01 14:40:05] [debug] [input:tail:tail.0] scanning path /var/log/containers/*.log
[2021/10/01 14:40:05] [debug] [input:tail:tail.0] scan_blog add(): dismissed: /var/log/containers/activator-85cd6f6f9-nrncf_knative-serving_activator-3b631f27f6667599ae940f94afe6a65a4d1d488e7979fced513fa910082a5ae1.log, inode 404768
[2021/10/01 14:40:05] [debug] [input:tail:tail.0] scan_blog add(): dismissed: /var/log/containers/activator-85cd6f6f9-nrncf_knative-serving_activator-ca32320178170fe4198ce1b0bd57d8ea031c7c886a7b0e3d66bb1b78b67613b8.log, inode 921337
[2021/10/01 14:40:05] [debug] [input:tail:tail.0] scan_blog add(): dismissed: /var/log/containers/antrea-agent-gql5r_kube-system_antrea-agent-63659cdc8e5ddba3eaf729be280661b45fd198e6d2c7195965be85cdca81f41a.log, inode 536837
[2021/10/01 14:40:05] [debug] [input:tail:tail.0] scan_blog add(): dismissed: /var/log/containers/antrea-agent-gql5r_kube-system_antrea-agent-8726abf73577f597e15716176cfcdce442b159d00ec12f59e439719d824a9585.log, inode 1190181
[2021/10/01 14:40:05] [debug] [input:tail:tail.0] scan_blog add(): dismissed: /var/log/containers/antrea-agent-gql5r_kube-system_antrea-ovs-08045b767f2f8ee421b3b4d8d5b646b49b4e12199ae957cad178dd3d11670ec6.log, inode 663855

ServiceAccount:

rules:
- apiGroups:
  - ""
  resources:
  - namespaces
  - pods
  verbs:
  - get
  - list
  - watch

ConfigMap:

apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
  namespace: vmware-system
  labels:
    k8s-app: fluent-bit
data:
  filter-kubernetes.conf: |
    [FILTER]
        Name                kubernetes
        Match               kube.*
        Kube_URL            https://kubernetes.default.svc:443
        Kube_CA_File        /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        Kube_Token_File     /var/run/secrets/kubernetes.io/serviceaccount/token
        Kube_Tag_Prefix     kube.var.log.containers.
        Merge_Log           On
        Merge_Log_Key       log_processed
        K8S-Logging.Parser  On
        K8S-Logging.Exclude Off
    [FILTER]
        Name                  modify
        Match                 kube.*
        Copy                  kubernetes k8s

    [FILTER]
        Name                  nest
        Match                 kube.*
        Operation             lift
        Nested_Under          kubernetes
  filter-record.conf: |
    [FILTER]
        Name                record_modifier
        Match               *
        Record tkg_cluster veba-demo.jarvis.tanzu
        Record tkg_instance veba-demo.jarvis.tanzu
    [FILTER]
        Name                  nest
        Match                 kube.*
        Operation             nest
        Wildcard              tkg_instance*
        Nest_Under            tkg

    [FILTER]
        Name                  nest
        Match                 kube_systemd.*
        Operation             nest
        Wildcard              SYSTEMD*
        Nest_Under            systemd
  fluent-bit.conf: |
    [SERVICE]
        Flush         1
        Log_Level     debug
        Daemon        off
        Parsers_File  parsers.conf
        HTTP_Server   On
        HTTP_Listen   0.0.0.0
        HTTP_Port     2020

    @INCLUDE input-kubernetes.conf
    @INCLUDE input-systemd.conf
    @INCLUDE input-kube-apiserver.conf
    @INCLUDE input-auditd.conf
    @INCLUDE filter-kubernetes.conf
    @INCLUDE filter-record.conf
    @INCLUDE output-syslog.conf
  input-auditd.conf: |
    [INPUT]
        Name              tail
        Tag               audit.*
        Path              /var/log/audit/audit.log
        Parser            logfmt
        DB                /var/log/flb_system_audit.db
        Mem_Buf_Limit     50MB
        Refresh_Interval  10
        Skip_Long_Lines   On
  input-kube-apiserver.conf: |
    [INPUT]
        Name              tail
        Tag               apiserver_audit.*
        Path              /var/log/kubernetes/audit.log
        Parser            json
        DB                /var/log/flb_kube_audit.db
        Mem_Buf_Limit     50MB
        Refresh_Interval  10
        Skip_Long_Lines   On
  input-kubernetes.conf: |
    [INPUT]
        Name              tail
        Tag               kube.*
        Path              /var/log/containers/*.log
        Parser            docker
        DB                /var/log/flb_kube.db
        Mem_Buf_Limit     50MB
        Skip_Long_Lines   On
        Refresh_Interval  10
  input-systemd.conf: |
    [INPUT]
        Name                systemd
        Tag                 kube_systemd.*
        Path                /var/log/journal
        DB                  /var/log/flb_kube_systemd.db
        Systemd_Filter      _SYSTEMD_UNIT=kubelet.service
        Systemd_Filter      _SYSTEMD_UNIT=containerd.service
        Read_From_Tail      On
        Strip_Underscores   On
  output-syslog.conf: |
    [OUTPUT]
        Name   syslog
        Match  kube.*
        Host   10.197.79.57
        Port   514
        Mode   tcp
        Syslog_Format        rfc5424
        Syslog_Hostname_key  tkg_cluster
        Syslog_Appname_key   pod_name
        Syslog_Procid_key    container_name
        Syslog_Message_key   message
        Syslog_SD_key        k8s
        Syslog_SD_key        labels
        Syslog_SD_key        annotations
        Syslog_SD_key        tkg

    [OUTPUT]
        Name   syslog
        Match  kube_systemd.*
        Host   10.197.79.57
        Port   514
        Mode   tcp
        Syslog_Format        rfc5424
        Syslog_Hostname_key  tkg_cluster
        Syslog_Appname_key   tkg_instance
        Syslog_Message_key   MESSAGE
        Syslog_SD_key        systemd
  parsers.conf: |
    [PARSER]
        Name   apache
        Format regex
        Regex  ^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
        Time_Key time
        Time_Format %d/%b/%Y:%H:%M:%S %z

    [PARSER]
        Name   apache2
        Format regex
        Regex  ^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^ ]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
        Time_Key time
        Time_Format %d/%b/%Y:%H:%M:%S %z

    [PARSER]
        Name   apache_error
        Format regex
        Regex  ^\[[^ ]* (?<time>[^\]]*)\] \[(?<level>[^\]]*)\](?: \[pid (?<pid>[^\]]*)\])?( \[client (?<client>[^\]]*)\])? (?<message>.*)$

    [PARSER]
        Name   nginx
        Format regex
        Regex ^(?<remote>[^ ]*) (?<host>[^ ]*) (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
        Time_Key time
        Time_Format %d/%b/%Y:%H:%M:%S %z

    [PARSER]
        Name   json
        Format json
        Time_Key time
        Time_Format %d/%b/%Y:%H:%M:%S %z

    [PARSER]
        Name        docker
        Format      json
        Time_Key    time
        Time_Format %Y-%m-%dT%H:%M:%S.%L
        Time_Keep   On

    [PARSER]
        Name        docker-daemon
        Format      regex
        Regex       time="(?<time>[^ ]*)" level=(?<level>[^ ]*) msg="(?<msg>[^ ].*)"
        Time_Key    time
        Time_Format %Y-%m-%dT%H:%M:%S.%L
        Time_Keep   On

    [PARSER]
        # http://rubular.com/r/tjUt3Awgg4
        Name cri
        Format regex
        Regex ^(?<time>[^ ]+) (?<stream>stdout|stderr) (?<logtag>[^ ]*) (?<message>.*)$
        Time_Key    time
        Time_Format %Y-%m-%dT%H:%M:%S.%L%z

    [PARSER]
        Name        logfmt
        Format      logfmt

    [PARSER]
        Name        syslog-rfc5424
        Format      regex
        Regex       ^\<(?<pri>[0-9]{1,5})\>1 (?<time>[^ ]+) (?<host>[^ ]+) (?<ident>[^ ]+) (?<pid>[-0-9]+) (?<msgid>[^ ]+) (?<extradata>(\[(.*)\]|-)) (?<message>.+)$
        Time_Key    time
        Time_Format %Y-%m-%dT%H:%M:%S.%L
        Time_Keep   On

    [PARSER]
        Name        syslog-rfc3164-local
        Format      regex
        Regex       ^\<(?<pri>[0-9]+)\>(?<time>[^ ]* {1,2}[^ ]* [^ ]*) (?<ident>[a-zA-Z0-9_\/\.\-]*)(?:\[(?<pid>[0-9]+)\])?(?:[^\:]*\:)? *(?<message>.*)$
        Time_Key    time
        Time_Format %b %d %H:%M:%S
        Time_Keep   On

    [PARSER]
        Name        syslog-rfc3164
        Format      regex
        Regex       /^\<(?<pri>[0-9]+)\>(?<time>[^ ]* {1,2}[^ ]* [^ ]*) (?<host>[^ ]*) (?<ident>[a-zA-Z0-9_\/\.\-]*)(?:\[(?<pid>[0-9]+)\])?(?:[^\:]*\:)? *(?<message>.*)$/
        Time_Key    time
        Time_Format %b %d %H:%M:%S
        Time_Format %Y-%m-%dT%H:%M:%S.%L
        Time_Keep   On

    [PARSER]
        Name    kube-custom
        Format  regex
        Regex   (?<tag>[^.]+)?\.?(?<pod_name>[a-z0-9](?:[-a-z0-9]*[a-z0-9])?(?:\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*)_(?<namespace_name>[^_]+)_(?<container_name>.+)-(?<docker_id>[a-z0-9]{64})\.log$

DaemonSet:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit
  namespace: vmware-system
  labels:
    k8s-app: fluent-bit
spec:
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      k8s-app: fluent-bit
  template:
    metadata:
      labels:
        k8s-app: fluent-bit
    spec:
      containers:
      - image: projects.registry.vmware.com/tkg/fluent-bit:v1.6.9_vmware.1
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /
            port: 2020
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        name: fluent-bit
        ports:
        - containerPort: 2020
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /api/v1/metrics/prometheus
            port: 2020
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        resources:
          limits:
            cpu: 80m
            memory: 200Mi
          requests:
            cpu: 50m
            memory: 100Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /var/log
          name: var-log
        - mountPath: /var/log/pods
          name: var-log-pods
        - mountPath: /var/log/containers
          name: var-log-containers
        - mountPath: /var/lib/docker/containers
          name: var-lib-docker-containers
          readOnly: true
        - mountPath: /fluent-bit/etc/
          name: fluent-bit-config
        - mountPath: /run/log
          name: systemd-log
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: fluent-bit
      serviceAccountName: fluent-bit
      terminationGracePeriodSeconds: 10
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/master
        operator: Exists
      - effect: NoExecute
        operator: Exists
      - effect: NoSchedule
        operator: Exists
      volumes:
      - hostPath:
          path: /var/log
          type: ""
        name: var-log
      - hostPath:
          path: /var/log/pods
          type: ""
        name: var-log-pods
      - hostPath:
          path: /var/log/containers
          type: ""
        name: var-log-containers
      - hostPath:
          path: /var/lib/docker/containers
          type: ""
        name: var-lib-docker-containers
      - hostPath:
          path: /run/log
          type: ""
        name: systemd-log
      - configMap:
          defaultMode: 420
          name: fluent-bit-config
        name: fluent-bit-config
  updateStrategy:
    rollingUpdate:
      maxUnavailable: 1
    type: RollingUpdate

Expected behavior: All logs in /var/log/containers/ should be processed.

Your Environment

Additional context: Running tail -f manually from within the system on a specific pod log, which is writing to stdout, works.

{"log":"10/03/2021 14:47:13 - Handler Processing Completed ...\n","stream":"stdout","time":"2021-10-03T14:47:13.829672574Z"}
{"log":"\n","stream":"stdout","time":"2021-10-03T14:47:13.829772103Z"}

Logs which, for example, aren't processed:

root@veba-kn [ /var/log/containers ]# ls -rtl
total 376
lrwxrwxrwx 1 root root 100 Sep 13 21:31 antrea-agent-gql5r_kube-system_antrea-agent-8726abf73577f597e15716176cfcdce442b159d00ec12f59e439719d824a9585.log -> /var/log/pods/kube-system_antrea-agent-gql5r_31aa406a-286c-495b-9dcf-e4036c2a4756/antrea-agent/3.log
lrwxrwxrwx 1 root root  98 Sep 13 21:31 antrea-agent-gql5r_kube-system_antrea-ovs-3f300f1d7b28c069df1f34cf37ff89be95d69fc3dc4ea0f269b5bd07ce5d56c1.log -> /var/log/pods/kube-system_antrea-agent-gql5r_31aa406a-286c-495b-9dcf-e4036c2a4756/antrea-ovs/3.log
lrwxrwxrwx 1 root root 102 Sep 13 21:31 envoy-89vct_contour-external_shutdown-manager-c8ed97927c25d465f31cce5ab8bd91d02742504f8cf73ad53e493738d0a17f74.log -> /var/log/pods/contour-external_envoy-89vct_1c947a55-2b86-48bd-b442-c6c51ec2dd3a/shutdown-manager/3.log
lrwxrwxrwx 1 root root  91 Sep 13 21:31 envoy-89vct_contour-external_envoy-0ea7a33d12105058f74eae9653dd0266ac99ef2ba7f6cb3a3b04a8ec3bc02525.log -> /var/log/pods/contour-external_envoy-89vct_1c947a55-2b86-48bd-b442-c6c51ec2dd3a/envoy/3.log
lrwxrwxrwx 1 root root 104 Sep 13 21:31 contour-5869594b-7jm89_contour-external_contour-803e6591f657fae9539b64ae4f86fa44cce99b409c5f92979c6045cf4b98b52c.log -> /var/log/pods/contour-external_contour-5869594b-7jm89_cc6cf243-7d3f-4839-91e8-741ab87f6488/contour/3.log
lrwxrwxrwx 1 root root 106 Sep 13 21:31 contour-5d47766fd8-n24mz_contour-internal_contour-ae34a8ae0b8398da294c5061ec5c0ef1e9be8cb2979f07077e5e9df12f2bab67.log -> /var/log/pods/contour-internal_contour-5d47766fd8-n24mz_a87131ad-d73a-4371-a47b-dcc410f3b6e4/contour/3.log
lrwxrwxrwx 1 root root 100 Sep 13 21:31 coredns-74ff55c5b-mjdlr_kube-system_coredns-60bd5f49def85a0ddc929e2c2da5c793a3c6de00cd6a81bdcfdb21f3d4f45129.log -> /var/log/pods/kube-system_coredns-74ff55c5b-mjdlr_7ef260c1-308e-4162-8a84-231d560f8023/coredns/3.log

I've also tried running the DaemonSet with:

securityContext:
  privileged: true

Similar issues I found, but which don't provide a solution for this one:

#3857

#4014

Your help would be much appreciated. Thanks
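
One sanity check worth adding here: since the [SERVICE] section above enables the built-in HTTP server on port 2020, the monitoring API shows whether the tail input is ingesting records at all. A minimal sketch, assuming the endpoint is reachable from where you run it:

curl -s http://127.0.0.1:2020/api/v1/metrics

The returned JSON includes per-plugin counters (records, bytes) for tail.0; if those counters don't grow while the dismissed messages appear, the files really aren't being read.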

neugeeug commented 1 year ago

@neugeeug Can you try to use v 1.3.7?

Well @ashutosh-maheshwari to be honest, I am not sure if we would like to use such an old version in production ...

brat002 commented 1 year ago

It's hard to imagine how such an important bug could exist for so long. The main feature of the project just doesn't work.

leonardo-albertovich commented 1 year ago

Hi @neugeeug, do you think you would be able to reproduce this issue in a reliable way and help me do so myself?

neugeeug commented 1 year ago

Hi @neugeeug, do you think you would be able to reproduce this issue in a reliable way and help me do so myself?

Hi @leonardo-albertovich, we've just finished an extended investigation and it seems that the problem is somewhere else. Our logs end up in Splunk, and the problems we were facing were related to long messages: there seems to be a limit on the Splunk side which was most likely rejecting the long batches. It was difficult to find because the logs didn't contain any information about a Splunk problem. It also seems that this part of the log, scan_blog add(): dismissed: /jcore/logs/config.log, inode 1863096, was not related to the problem. We have decreased the memory buffers and the flush period, adjusting the values to Splunk's limits so that the batches reaching Splunk are smaller. It has started working. Thank you all for your help.
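
For reference, the kind of adjustment described above looks roughly like this; the values are illustrative, not our exact production settings:

[SERVICE]
    # flush more frequently so each batch sent to Splunk stays small
    Flush  1

[INPUT]
    Name           tail
    Path           /var/log/containers/*.log
    # a smaller buffer keeps individual chunks, and therefore batches, small
    Mem_Buf_Limit  5MB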

leonardo-albertovich commented 1 year ago

Oh great, I don't know if out_splunk has a batch size setting, but if it doesn't then it might be a good improvement.

I'd really appreciate it if you were able to validate the status and impact of the change and, if appropriate, submit an improvement proposal.

On a side note, it seems that this issue is stale and generates confusion. On one hand, I'm not in favor of closing issues unless we are 100% sure they have been fixed, but with no user input I'm struggling to see the value of keeping this one open, so please, if anyone has any input, I'd like to hear it.

neugeeug commented 1 year ago

Oh great, I don't know if out_splunk has a batch size setting, but if it doesn't then it might be a good improvement.

It seems it does not have such a setting, and I agree, it would help. Thank you.

Jonathan-w6d commented 1 year ago

If it can be of any help, my setup did not include Splunk; just fluent-bit -> OpenSearch.

leonardo-albertovich commented 1 year ago

@Jonathan-w6d do you think you could put together a reproduction case so I can analyse it?

leonardo-albertovich commented 1 year ago

Actually, considering that the original issue talks about rather old and unsupported versions it might be good to take a step back and re-state everything so we know we are talking about the same thing.

Would you mind describing your setup and sharing your configuration file? I want to be sure that this issue is still present in the latest release of fluent-bit 2.0 and 2.1.

ashutosh-maheshwari commented 1 year ago

Actually, considering that the original issue talks about rather old and unsupported versions it might be good to take a step back and re-state everything so we know we are talking about the same thing.

Would you mind describing your setup and sharing your configuration file? I want to be sure that this issue is still present in the latest release of fluent-bit 2.0 and 2.1.

@leonardo-albertovich On Centos 7.9, install fluent-bit as a systemd service. You can see the above issue. Read my comment - https://github.com/fluent/fluent-bit/issues/4155#issuecomment-1498278814

ashutosh-maheshwari commented 1 year ago

@neugeeug Can you try to use v 1.3.7?

Well @ashutosh-maheshwari to be honest, I am not sure if we would like to use such an old version in production ...

@neugeeug The main drawbacks of using the old version are missing newer features and unpatched vulnerabilities. Most things work, including mTLS, tail, and the other sets of plugins. You need to check that there are no vulnerabilities that affect you, and believe me, fluent-bit 1.3.7 is the most stable version. I have tested every version.

leonardo-albertovich commented 1 year ago

@ashutosh-maheshwari, could you please help put together a Docker-based reproduction case, with documentation about the deployment process and expectations, so it's easier to reproduce the results and ensure we are on the same page?

I know it must be annoying but it would really help improve the pace and ensure the issue is finally resolved.

Jonathan-w6d commented 1 year ago

I will try to do so, but to be honest I dropped fluent-bit from my stack last July, given that no one was helping on the matter.

leonardo-albertovich commented 1 year ago

I'm sorry to hear that, but I understand you. If you find the time to do it I'd really appreciate it; otherwise, if anyone else wants to step up, it'd be great.

benjaminhuo commented 1 year ago

Most versions of Fluent-bit are affected by this bug. I have tested versions 1.2.0, 1.3.0, 1.4.0, 1.5.0, 1.6.0, 1.7.0, 1.8.0, 1.9.0, 2.0.6, 2.0.8, and 2.0.10, and they all exhibited the same issue. The only versions that worked for me were 1.2.x and 1.3.x. @edsiper @patrick-stephens Can you please review the code for the tail plugin in version 1.3.7? You might find the bug.

@ashutosh-maheshwari you mean 1.3.6 works for you while 1.3.7 has the same issue?

https://fluentbit.io/announcements/v1.3.7/


ashutosh-maheshwari commented 1 year ago

@benjaminhuo It runs on 1.3.7. The problem started post that version. @Jonathan-w6d What are you using now? I would love to switch.

benjaminhuo commented 1 year ago

@benjaminhuo It runs on 1.3.7. The problem started post that version.

So you mean the changes in 1.3.8 might have introduced this bug, and 1.3.8 has the same issue, right? @ashutosh-maheshwari @edsiper @patrick-stephens @leonardo-albertovich, I think this gives us some clues to follow.

leonardo-albertovich commented 1 year ago

Thanks for taking the time to go through the release notes @benjaminhuo; sadly, I don't think that will be enough to get this fixed.

What we need is someone to help put together a reliable reproduction case, and I'd like to reiterate that if anyone does, I'll take care of having the issue fixed, either by doing it myself or by having someone from the team work on it.

Jonathan-w6d commented 1 year ago

@leonardo-albertovich I can give you the last values.yaml containing the confs that I used before switching to another solution, if you'd like?

Jonathan-w6d commented 1 year ago

Here you go. It's actually pretty straightforward: just adding k8s metadata and modifying the tag on some logs so I can differentiate them, then outputting to OpenSearch directly.



# replicaCount -- Only applicable if kind=Deployment
replicaCount: 1

image:
  repository: cr.fluentbit.io/fluent/fluent-bit
  tag: "1.9.6-debug"
  pullPolicy: Always

#image:
#  repository: gke.gcr.io/gke-metrics-agent
#  tag: "1.8.3-gke.0"
#  pullPolicy: Always

testFramework:
  image:
    repository: busybox
    pullPolicy: Always
    tag: latest

imagePullSecrets: []
nameOverride: ""
fullnameOverride: ""

serviceAccount:
  create: true
  annotations: {}
  name:

rbac:
  create: true
  nodeAccess: true

podSecurityPolicy:
  create: false
  annotations: {}

openShift:
  enabled: false
  securityContextConstraints:
    create: true
    annotations: {}

podSecurityContext: {}

hostNetwork: false
dnsPolicy: ClusterFirst

dnsConfig: {}

hostAliases: []

securityContext: {}

service:
  type: ClusterIP
  port: 2020
  labels: {}
  annotations: {}

serviceMonitor:
  enabled: false

prometheusRule:
  enabled: false

dashboards:
  enabled: false
  labelKey: grafana_dashboard
  annotations: {}
  namespace: ""

lifecycle: {}

livenessProbe:
  httpGet:
    path: /
    port: http

readinessProbe:
  httpGet:
    path: /api/v1/health
    port: http

resources: {}
#   limits:
#     cpu: 100m
#     memory: 128Mi
#   requests:
#     cpu: 100m
#     memory: 128Mi

## only available if kind is Deployment
ingress:
  enabled: false
  className: ""
  annotations: {}
  hosts: []
  extraHosts: []
  tls: []

## only available if kind is Deployment
autoscaling:
  enabled: false
  minReplicas: 1
  maxReplicas: 3
  targetCPUUtilizationPercentage: 75
  customRules: []
  behavior: {}

## only available if kind is Deployment
podDisruptionBudget:
  enabled: false
  annotations: {}
  maxUnavailable: "30%"

nodeSelector: {}

tolerations:
  - operator: Exists
    effect: NoExecute
  - operator: Exists
    effect: NoSchedule
#  - effect: NoSchedule
#    key: dedicated
#    operator: Equal
#    value: appli-strada
#  - effect: NoSchedule
#    key: dedicated
#    operator: Equal
#    value: infra
#  - effect: NoSchedule
#    key: dedicated
#    operator: Equal
#    value: kafka

affinity: {}

labels: {}

annotations: {}

podAnnotations: {}

podLabels: {}

priorityClassName: ""

env: []
#  - name: FOO
#    value: "bar"

# The envWithTpl array below has the same usage as "env", but is using the tpl function to support templatable string.
# This can be useful when you want to pass dynamic values to the Chart using the helm argument "--set <variable>=<value>"
# https://helm.sh/docs/howto/charts_tips_and_tricks/#using-the-tpl-function
envWithTpl: []
#  - name: FOO_2
#    value: "{{ .Values.foo2 }}"
#
# foo2: bar2

envFrom: []

extraContainers: []

flush: 5

metricsPort: 2020

extraPorts: []

extraVolumes:
  - name: certs
    secret:
      secretName: tls-for-fluentbit-key-pair

extraVolumeMounts:
  - name: certs
    mountPath: /fluent-bit/etc/certs/ca.crt
    subPath: ca.crt
  - name: certs
    mountPath: /fluent-bit/etc/certs/tls.key
    subPath: tls.key
  - name: certs
    mountPath: /fluent-bit/etc/certs/tls.crt
    subPath: tls.crt

updateStrategy: {}

existingConfigMap: ""

networkPolicy:
  enabled: false

luaScripts: {}

config:
  service: |
    [SERVICE]
        Daemon Off
        Flush {{ .Values.flush }}
        Grace 10
        Log_Level {{ .Values.logLevel }}
        Log_File      /dev/stdout
        Parsers_File custom_parsers.conf
        HTTP_Server On
        HTTP_Listen 0.0.0.0
        HTTP_Port {{ .Values.metricsPort }}
        Storage.path  /var/log/fluentbit
        Storage.sync  normal
        Storage.metrics on

  inputs: |
    [INPUT]
        Name tail
        Path /var/log/containers/*.log
        multiline.parser docker, cri
        Tag kube.*
        Skip_Long_Lines Off
        Mem_Buf_Limit 100MB

  filters: |
    [FILTER]
        Name kubernetes
        Match kube.*
        Merge_Log On
        Buffer_size False

    [FILTER]
        Name        grep
        Match       kube.*

        Exclude     $kubernetes['container_name'] fluent-bit
        Exclude     $kubernetes['container_name'] opensearch
        Exclude     $kubernetes['namespace_name'] logs

    [FILTER]
        Name        modify
        Match       kube.*
        Hard_copy log message

    [FILTER]
        Name          rewrite_tag
        Match         kube.*
        Emitter_Mem_Buf_Limit 100MB

        Rule          $kubernetes['labels']['createdfor'] strada  strada.$TAG false
        Rule          $kubernetes['labels']['app'] gitlab-gitlab  strada.$TAG false
        Rule          $kubernetes['container_image'] strada  strada.$TAG false

    [FILTER]
        Name         parser
        Match_Regex  ^(kube|strada)      
        Key_Name     message
        Reserve_Data True
        Parser       pg
        Parser       glog
        Parser       nginx
        Parser       json
        Parser       kfk

  outputs: |
    [OUTPUT]
        Name opensearch
        Match kube.*
        Host opensearch-cluster-master-headless.logs.svc
        Port 9200
        HTTP_User admin
        HTTP_Passwd admin
        Logstash_Format On
        Logstash_Prefix logk8s
        Logstash_DateFormat %Y.%m.%d
        Retry_Limit False
        Suppress_Type_Name On
        tls On
        tls.verify On
        tls.ca_file /fluent-bit/etc/certs/ca.crt
        tls.crt_file /fluent-bit/etc/certs/tls.crt
        tls.key_file /fluent-bit/etc/certs/tls.key
        Buffer_size False
        Trace_Output On
        Trace_Error On
        Replace_Dots On

    [OUTPUT]
        Name opensearch
        Match strada.*
        Host opensearch-cluster-master-headless.logs.svc
        Port 9200
        HTTP_User admin
        HTTP_Passwd admin
        Logstash_Format On
        Logstash_Prefix logk8s-strada
        Logstash_DateFormat %Y.%m.%d
        Retry_Limit False
        Suppress_Type_Name On
        tls On
        tls.verify On
        tls.ca_file /fluent-bit/etc/certs/ca.crt
        tls.crt_file /fluent-bit/etc/certs/tls.crt
        tls.key_file /fluent-bit/etc/certs/tls.key
        Buffer_size False
        Trace_Output On
        Trace_Error On
        Replace_Dots On

  customParsers: |
    [PARSER]
        Name        cri
        Format      regex
        Regex       ^(?<time>[^ ]+) (?<stream>stdout|stderr) (?<logtag>[^ ]*) (?<log>.*)$
        Time_Key    time
        Time_Format %Y-%m-%dT%H:%M:%S.%L%z

    [PARSER]
        Name        nginx
        Format      regex
        Regex       (?<host>[^ ]*) - (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*) "(?<referer>[^\"]*)" "(?<agent>[^\"]*)" (?<request_length>[^ ]*) (?<request_time>[^ ]*) \[(?<proxy_upstream_name>[^ ]*)\] (\[(?<proxy_alternative_upstream_name>[^ ]*)\] )?(?<upstream_addr>[^ ]*) (?<upstream_response_length>[^ ]*) (?<upstream_response_time>[^ ]*) (?<upstream_status>[^ ]*) (?<reg_id>[^ ]*).*$
        Time_Key    time
        Time_Format %d/%b/%Y:%H:%M:%S %z

    [PARSER]
        Name   json
        Format json
        Time_Key time
        Time_Format %d/%b/%Y:%H:%M:%S %z

    [PARSER]
        Name pg
        Format regex
        Regex (?<time>[^ ]+[ ]+[^ ]+[ ]+[^ ]*) \[(?<test>[^\]]*)\] (STATEMENT:\s*(?<statement>[^$].*?)|ERROR:\s*(?<error>[^$].*?)|DETAIL:\s*(?<detail>[^$].*?))$
        Time_Key time
        Time_Format %Y-%m-%d %H:%M:%S.%L %Z

    [PARSER]
        Name        glog
        Format      regex
        Regex       ^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source_file>[^ \]]+)\:(?<source_line>\d+)\]\s(?<message>.*)$
        Time_Key    time
        Time_Format %m%d %H:%M:%S.%L%z

    [PARSER]
        Name        kfk
        Format      regex
        Regex       ^(?<time>[^ ]+[ ]+[^ ]*) \[(?<id>[^\]]*)\] - (?<level>[^ ]*)  \[(?<thread>[^\]]*)\] - (?<message>[^$]*)$
        Time_Key    time
        Time_Format %Y-%m-%d %H:%M:%S,%L

    [PARSER]
        Name        logfmt
        Format      logfmt

  extraFiles: {}

# The config volume is mounted by default, either to the existingConfigMap value, or the default of "fluent-bit.fullname"
volumeMounts:
  - name: config
    mountPath: /fluent-bit/etc/fluent-bit.conf
    subPath: fluent-bit.conf
  - name: config
    mountPath: /fluent-bit/etc/custom_parsers.conf
    subPath: custom_parsers.conf

daemonSetVolumes:
  - name: varlog
    hostPath:
      path: /var/log
  - name: varlibdockercontainers
    hostPath:
      path: /var/lib/docker/containers
  - name: etcmachineid
    hostPath:
      path: /etc/machine-id
      type: File

daemonSetVolumeMounts:
  - name: varlog
    mountPath: /var/log
  - name: varlibdockercontainers
    mountPath: /var/lib/docker/containers
    readOnly: true
  - name: etcmachineid
    mountPath: /etc/machine-id
    readOnly: true

args: []

command: []

initContainers: []

logLevel: trace

leonardo-albertovich commented 1 year ago

Thank you @Jonathan-w6d, we'll try to have someone from the team reproduce the issue. There's no ETA at the moment but I'll send an update as soon as we have some insight.

msolters commented 1 year ago

We are encountering this in v2.1.2 on EKS 1.22. For us it manifests as follows: log files that pre-exist when Fluent Bit starts up on a node are tailed and processed, but new files created on that node from then on yield the scan_blog add() debug log message and can't be processed until Fluent Bit is restarted, at which point those files are "pre-existing" by construction. Not a tenable solution, but hopefully a useful symptom diagnostically?
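
For reference, "restarting Fluent Bit" on Kubernetes amounts to bouncing the DaemonSet pods; a minimal sketch, assuming the fluent-bit DaemonSet and vmware-system namespace from the manifests earlier in this thread (adjust for your cluster):

kubectl -n vmware-system rollout restart daemonset/fluent-bit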

benjaminhuo commented 1 year ago

New files created on that node from then on yield the scan_blog add() debug log message and can't be processed until Fluent Bit is restarted, at which point those files are "pre-existing" by construction.

@msolters That's indeed a good clue to follow; hopefully this can shed some light for debugging this, @leonardo-albertovich.

Does fluent-bit have difficulty detecting new files?

leonardo-albertovich commented 1 year ago

Fluent-bit shouldn't have issues detecting new files; at least I've never had that happen. Could you please share more information about your context, @msolters?

I'm interested in knowing which platform it is, whether you are using inotify, how many files there are in the paths where this is happening (or how many glob matches), what your configuration file looks like, and a copy of your log if possible.

Feel free to share things with me privately on Slack if you don't feel comfortable doing it in public.
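
As a concrete knob for the inotify question: the tail input can be switched to the stat-based file watcher via its Inotify_Watcher option, which is one way to rule inotify in or out. A minimal sketch, with the containers path used above:

[INPUT]
    Name             tail
    Path             /var/log/containers/*.log
    # use periodic stat() polling instead of inotify events
    Inotify_Watcher  false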

wanjunlei commented 1 year ago

environment.zip

This is some information about the cluster, including the YAML of the Fluent Bit daemonset, the configuration file of Fluent Bit, the version information of k8s and docker, the OS version, and kernel parameters. I hope this helps to reproduce the problem. If you need other information, please let me know.

PS: The k8s version is v1.19.9. And I tried 1.6.9, 1.8.3, and 2.0.11; they all have the same problem.

benjaminhuo commented 1 year ago

environment.zip

This is some information about the cluster, including the yaml of the Fluent Bit daemonset, the configuration file of Fluent Bit, the version information of k8s and docker, the OS version, and kernel parameters. I hope this helps to reproduce the problem, If you need other information please let me know.

@leonardo-albertovich @edsiper @agup006 @patrick-stephens we hope this can help to narrow down the investigation scope and locate the root cause. Let us know what else you need.

IbrahimMCode commented 1 year ago

Same issue using the latest version; has this been resolved?

emmacz commented 1 year ago

It looks like 'scan_blog add(): dismissed' is a generic message, while the root cause can be in different places, such as in a parser.

It would be great to add more details in debug mode that could help the user uncover the issue, for example an error message coming from a parser.

Would that be possible?

In general, where can one find the debug information related to parser processing?
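
For context, verbosity is controlled by the Log_Level key in the [SERVICE] section; the scan_blog messages quoted in this thread, like other tail internals, only appear at the debug (or trace) level. A minimal sketch:

[SERVICE]
    Flush      1
    Log_Level  debug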

benjaminhuo commented 1 year ago

It looks like 'scan_blog add(): dismissed' is a generic message, while the root cause can be in different places, such as in a parser.

It would be great to add more details in debug mode that could help the user uncover the issue, for example an error message coming from a parser.

Would that be possible?

In general, where can one find the debug information related to parser processing?

@leonardo-albertovich what do you think? Is it possible to add more debug messages to help narrow the scope?

studyhuang1996 commented 1 year ago

Has this been resolved in fluent-bit version 2.1.6?

benjaminhuo commented 1 year ago

Has this been resolved in fluent-bit version 2.1.6?

@studyhuang1996 Do you mean fluent-bit v2.1.6 no longer skips any logs, i.e. the issue is resolved in the environment that had it before?

dntosas commented 1 year ago

FYI team, issue is still present on v2.1.9 ^

dpallagolla commented 1 year ago

I am facing the same issue, with one additional observation: the dismissed: log seems to appear only in the window when fluent-bit scans for new log files. Once the scan for files is done, logs are forwarded from fluent-bit.
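
That observation matches the tail input's periodic rescan: Refresh_Interval controls how often the list of watched files is refreshed (60 seconds by default), and that scan is when the scan_blog messages are printed. A minimal sketch, with an illustrative value:

[INPUT]
    Name              tail
    Path              /var/log/containers/*.log
    # rescan the path for new/rotated files every 10 seconds
    Refresh_Interval  10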

praveenkumarp893 commented 1 year ago

I tried with the latest version of fluent-bit (2.1.10) and am still facing the same error. Do we have any news on this?

[2023/10/10 09:28:34] [debug] [input:tail:tail.0] scan_blog add(): dismissed:

Please see the input config used below.

inputs: |
  [INPUT]
      Name              tail
      Path              /var/log/containers/default.log
      multiline.parser  docker, cri, java, multiline-regex-java
      Tag               kube.*
      Mem_Buf_Limit     5MB
      Skip_Long_Lines   Off

studyhuang1996 commented 1 year ago

This problem has always existed. It doesn't block usage, but it makes troubleshooting difficult and the message is somewhat misleading. We are now also hitting problems with data loss, missing fields, and some multiline matching scenarios that aren't handled, so we are currently planning to replace fluent-bit with another tool.


omidraha commented 11 months ago

I have the same issue https://github.com/fluent/helm-charts/issues/415:

kubectl logs  -n fluent-bit -f fluent-bit-6341f45a-j8h2x | grep -i volume-test
[2023/10/27 17:04:05] [debug] [input:tail:tail.0] inode=1062251 with offset=2026 appended as /var/log/containers/volume-test_exp_volume-test-2c25a3b1342924cffab9fffb48b2f0a971fcf9f10009e4c382a37bc09075134b.log
[2023/10/27 17:04:05] [debug] [input:tail:tail.0] scan_glob add(): /var/log/containers/volume-test_exp_volume-test-2c25a3b1342924cffab9fffb48b2f0a971fcf9f10009e4c382a37bc09075134b.log, inode 1062251
[2023/10/27 17:04:05] [debug] [input:tail:tail.0] inode=1062251 file=/var/log/containers/volume-test_exp_volume-test-2c25a3b1342924cffab9fffb48b2f0a971fcf9f10009e4c382a37bc09075134b.log promote to TAIL_EVENT
[2023/10/27 17:04:05] [ info] [input:tail:tail.0] inotify_fs_add(): inode=1062251 watch_fd=23 name=/var/log/containers/volume-test_exp_volume-test-2c25a3b1342924cffab9fffb48b2f0a971fcf9f10009e4c382a37bc09075134b.log
[2023/10/27 17:05:04] [debug] [input:tail:tail.0] scan_blog add(): dismissed: /var/log/containers/volume-test_exp_volume-test-2c25a3b1342924cffab9fffb48b2f0a971fcf9f10009e4c382a37bc09075134b.log, inode 1062251
[2023/10/27 17:06:04] [debug] [input:tail:tail.0] scan_blog add(): dismissed: /var/log/containers/volume-test_exp_volume-test-2c25a3b1342924cffab9fffb48b2f0a971fcf9f10009e4c382a37bc09075134b.log, inode 1062251
[2023/10/27 17:07:04] [debug] [input:tail:tail.0] scan_blog add(): dismissed: /var/log/containers/volume-test_exp_volume-test-2c25a3b1342924cffab9fffb48b2f0a971fcf9f10009e4c382a37bc09075134b.log, inode 1062251
[2023/10/27 17:08:04] [debug] [input:tail:tail.0] scan_blog add(): dismissed: /var/log/containers/volume-test_exp_volume-test-2c25a3b1342924cffab9fffb48b2f0a971fcf9f10009e4c382a37bc09075134b.log, inode 1062251
nulldoot2k commented 9 months ago

Still an error: I am using the latest version and still see scan_blog add(): dismissed. OMG.

nagyzekkyandras commented 9 months ago

Same here with the fluent/fluent-bit Helm chart 0.42.0 and the 2.2.1 image version.

github-actions[bot] commented 6 months ago

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the exempt-stale label.

github-actions[bot] commented 5 months ago

This issue was closed because it has been stalled for 5 days with no activity.

benjaminhuo commented 5 months ago

@kc-dot-io reopen

lecaros commented 5 months ago

Hello. If you have this issue, please build your binary using this: https://github.com/lecaros/fluent-bit/tree/master I've added a few debug messages to narrow down the issue. Set the log_level to debug. Run your scenario and share the log file along with your configuration.
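
For anyone who hasn't built Fluent Bit from source before, a minimal sketch of the usual CMake-based build (a C compiler, cmake, flex, and bison are assumed to be installed):

git clone https://github.com/lecaros/fluent-bit.git
cd fluent-bit/build
cmake ..
make
# the resulting binary typically lands at ./bin/fluent-bit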

lecaros commented 5 months ago

I'll gladly reopen the ticket if you can provide the requested info.

lorenzobenvenuti commented 2 months ago

Hello. If you have this issue, please build your binary using this: https://github.com/lecaros/fluent-bit/tree/master I've added a few debug messages to narrow down the issue. Set the log_level to debug. Run your scenario and share the log file along with your configuration.

Hi, we had a few scenarios where some logs weren't delivered (using Fluent Bit 2.2.2, the "tail" input plugin, and the "splunk" output plugin). Looking at the Fluent Bit logs I noticed a scan_blog add(): dismissed, so I thought it was related to the missing events and came here. I've followed your advice and rebuilt the executable from your fork (BTW, it's missing a couple of ;), but as far as I can see, in our case dismissed was a red herring: it's printed even for legitimate use cases, for example

[2024/07/25 11:51:12] [debug] [in_tail] file /var/log/app/access.log already registered
[2024/07/25 11:51:12] [debug] [in_tail] file /var/log/app/access.log already registered
[2024/07/25 11:51:12] [debug] [input:tail:tail.0] scan_blog add(): dismissed: /var/log/app/access.log, inode 165821060

Basically, after Fluent Bit starts monitoring a file it will always print "dismissed" because it doesn't need to register the file again (although TBH the message is misleading: it makes you think that the file is no longer monitored).

TL;DR: if you came here because you've lost some events and you're seeing dismissed in the logs, I'd recommend using the fork to print the root cause (although it would be great if this information were available in the main development branch), because the actual issue could be unrelated.

Thanks!

AstritCepele commented 2 months ago

Can we reopen this issue? We see that this message is misleading and is shown instead of the proper root cause.

srajappa commented 4 weeks ago

Still seeing this issue, even on 3.1.8.

@leonardo-albertovich Logs should paint an appropriate picture of what's wrong.

An abrupt event out of nowhere that "dismissed" your file sends out panic, and finding an issue opened several years ago that concludes fluent-bit will likely just keep putting out that message feels frivolous.

We are contemplating switching to a different tool

OR

using the forked version.