fluent / helm-charts

Helm Charts for Fluentd and Fluent Bit
Apache License 2.0

3 replicas in Deployment collecting same k8s events (duplication) #520

Open dianakutca opened 3 weeks ago


Bug Report

**Describe the bug**
We are trying to set up Fluent Bit in a high-availability configuration with 3 replicas. Each pod processes the same Kubernetes events, which leads to duplicated events when they are sent to S3. There is no available filter for deduplication.

**To Reproduce**

  1. Deploy Fluent Bit as a Deployment with 3 replicas.
  2. Configure the input plugin for Kubernetes events as described in the documentation.
  3. Configure the output plugin to send events to S3.

Configuration files (a Deployment, plus a ConfigMap holding the server, input, filter and output sections):

    
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Values.name }}
  namespace: {{ .Values.namespace }}
  labels:
    app: {{ .Values.appName }}
spec:
  replicas: {{ .Values.replicaCount }}
  selector:
    matchLabels:
      app: {{ .Values.appName }}
  template:
    metadata:
      labels:
        app: {{ .Values.appName }}
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "2020"
        prometheus.io/path: /api/v1/metrics/prometheus
    spec:
      {{- if .Values.fluentbit.tolerations.enable }}
      tolerations:
      {{- toYaml .Values.fluentbit.tolerations.items | nindent 6 }}
      {{- end }}
      serviceAccountName: {{ .Values.serviceAccount.name }}
      initContainers:
      - name: fmeta-list
        image: "{{ .Values.fmeta.image.repository }}:{{ .Values.fmeta.image.tag }}"
        command: ["python", "/src/get_pods.py", "list"]
        imagePullPolicy: Always
        env:
        - name: META_DIR
          value: "{{ .Values.fluentbit.env.metadata_dir }}"
        - name: POD_NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        volumeMounts:
        - mountPath: /mnt/fmeta
          name: meta-events
      containers:
      - name: fluent-bit
        image: "{{- if .Values.fluentbit.enable_debug }}{{ .Values.fluentbit.image.repository }}:{{ .Values.fluentbit.image.tag_debug }}{{- else }}{{ .Values.fluentbit.image.repository }}:{{ .Values.fluentbit.image.tag }}{{- end }}"
        imagePullPolicy: IfNotPresent
        ports:
        - name: http
          containerPort: 2020
          protocol: TCP
        env:
        - name: FLUENT_CODE_CHANGE_DATE
          value: "{{ .Values.fluentbit.env.code_change_date }}"
        - name: FLUENT_OS_LOCATION
          value: "{{ .Values.fluentbit.env.object_storage_location }}"
        - name: FLUENT_OS_BUCKET
          value: "{{ .Values.fluentbit.env.object_storage_bucket }}"
        - name: FLUENT_OS_REGION
          value: "{{ .Values.fluentbit.env.object_storage_region }}"
        - name: AWS_ACCESS_KEY_ID
          value: "{{ .Values.fluentbit.env.object_storage_key }}"
        - name: AWS_SECRET_ACCESS_KEY
          value: "{{ .Values.fluentbit.env.object_storage_secret }}"
        - name: FLUENT_OS_UPLOAD_TIMEOUT
          value: "{{ .Values.fluentbit.env.object_storage_upload_timeout }}"
        - name: FLUENT_META_CACHE_DIR
          value: "{{ .Values.fluentbit.env.metadata_dir }}"
        - name: FLUENT_MEMBUFLIMIT
          value: "{{ .Values.fluentbit.env.memory_buffer_limit }}"
        - name: FLUENT_REFRESH_INTERVAL
          value: "{{ .Values.fluentbit.env.refresh_interval }}"
        - name: EMPTYDIR_PATH
          value: "{{ .Values.kubernetes.env.emptydir_dir_location }}"
        - name: PVC_PATH
          value: "{{ .Values.kubernetes.env.pvc_dir_location }}"
        - name: CLUSTER_NAME
          value: "{{ .Values.kubernetes.env.cluster_name }}"
        - name: FLUENT_TAIL_DB
          value: "{{ .Values.fluentbit.env.tail_db_file }}"
        resources:
          limits:
            cpu: "{{ .Values.fluentbit.limits.cpu }}"
            memory: "{{ .Values.fluentbit.limits.memory }}"
            ephemeral-storage: "{{ .Values.fluentbit.limits.storage }}"
          requests:
            cpu: "{{ .Values.fluentbit.requests.cpu }}"
            memory: "{{ .Values.fluentbit.requests.memory }}"
            ephemeral-storage: "{{ .Values.fluentbit.requests.storage }}"
        volumeMounts:
        - mountPath: /mnt/fmeta
          name: meta-events
        - mountPath: /fluent-bit/etc/
          name: {{ .Values.configMapName }}-config
        - name: varlibpath-events
          mountPropagation: HostToContainer
          mountPath: "{{ .Values.fluentbit.mount_path }}"
          readOnly: true
        - name: flb-db-path-events
          mountPath: /mnt/var/events
      - name: fmeta-watch
        image: "{{ .Values.fmeta.image.repository }}:{{ .Values.fmeta.image.tag }}"
        command: ["python", "/src/get_pods.py", "watch"]
        imagePullPolicy: Always
        env:
        - name: META_DIR
          value: "{{ .Values.fluentbit.env.metadata_dir }}"
        - name: POD_NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        securityContext:
          privileged: true
        resources:
          limits:
            cpu: "{{ .Values.fmeta.limits.cpu }}"
            memory: "{{ .Values.fmeta.limits.memory }}"
            ephemeral-storage: "{{ .Values.fmeta.limits.storage }}"
          requests:
            cpu: "{{ .Values.fmeta.requests.cpu }}"
            memory: "{{ .Values.fmeta.requests.memory }}"
            ephemeral-storage: "{{ .Values.fmeta.requests.storage }}"
        volumeMounts:
        - mountPath: /mnt/fmeta
          name: meta-events
        - name: varlibpath-events
          mountPropagation: HostToContainer
          mountPath: "{{ .Values.fluentbit.mount_path }}"
          readOnly: true
      volumes:
      - name: meta-events
        emptyDir: {}
      - configMap:
          defaultMode: 420
          name: {{ .Values.configMapName }}-config
        name: {{ .Values.configMapName }}-config
      - name: varlibpath-events
        hostPath:
          path: "{{ .Values.kubernetes.env.var_lib_location }}"
      - name: flb-db-path-events
        hostPath:
          path: "{{ .Values.kubernetes.env.flb_db_path }}"
```

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ .Values.configMapName }}-config
  namespace: {{ .Values.namespace }}
  labels:
    app: {{ .Values.appName }}
data:
  fluent-bit.conf: |
    [SERVICE]
        Flush         5
        Grace         20
        {{- if .Values.fluentbit.enable_debug }}
        Log_Level     debug
        {{- else }}
        Log_Level     debug
        {{- end }}
        Daemon        off
        Parsers_File  parsers.conf
        HTTP_Server   Off
        HTTP_Listen   0.0.0.0
        HTTP_Port     2020

    @INCLUDE input-kubernetes.conf
    @INCLUDE filter-kubernetes.conf
    @INCLUDE output-s3.conf

  input-kubernetes.conf: |
    [INPUT]
        name      kubernetes_events
        tag       k8s_events
        kube_url  https://kubernetes.default.svc
        db        /var/log/event.db
        DB.Sync   normal

  filter-kubernetes.conf: |
    # Section Kubernetes specific filter, DO NOT CHANGE
    [FILTER]
        Name                         kubernetes
        Match                        kube.*
        Kube_URL                     https://kubernetes.default.svc:443
        Kube_CA_File                 /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        Kube_Token_File              /var/run/secrets/kubernetes.io/serviceaccount/token
        Annotations                  Off
        Merge_Log                    Off
        Labels                       Off
        K8S-Logging.Parser           On
        K8S-Logging.Exclude          On
        Kube_meta_preload_cache_dir  ${FLUENT_META_CACHE_DIR}
        Regex_Parser                 custom-kube-filter
        Kube_Tag_Prefix              kube.
        tls.verify                   Off

  output-s3.conf: |
    [OUTPUT]
        Name                          s3
        Match                         *
        Bucket                        ${FLUENT_OS_BUCKET}
        Region                        ${FLUENT_OS_REGION}
        Endpoint                      ${FLUENT_OS_LOCATION}
        Total_File_Size               90M
        Store_Dir_Limit_Size          200M
        Compression                   gzip
        S3_Key_Format                 /events/queue/%Y/%m/%d/%H/$UUID.json
        S3_Key_Format_Tag_Delimiters  .-
        Upload_Timeout                ${FLUENT_OS_UPLOAD_TIMEOUT}
        Auto_Retry_Requests           True
        Workers                       1
        Use_Put_Object                True
```
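To measure how often each event lands in the bucket, a minimal Python sketch like the one below could be run over the downloaded S3 objects. It rests on assumptions this report does not confirm: that each object is gzip-compressed with one JSON record per line, and that each record keeps the Kubernetes Event fields `metadata.uid` and `metadata.resourceVersion`. The helper names `event_key`, `read_json_lines` and `count_duplicates` are hypothetical.

```python
import gzip
import json
from collections import Counter
from typing import Iterable


def event_key(record: dict) -> tuple:
    # Assumed identity of one Event revision: (metadata.uid, metadata.resourceVersion).
    meta = record.get("metadata", {})
    return (meta.get("uid"), meta.get("resourceVersion"))


def read_json_lines(gz_bytes: bytes) -> list:
    # Assumed object layout: gzip-compressed, one JSON record per line.
    text = gzip.decompress(gz_bytes).decode("utf-8")
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def count_duplicates(objects: Iterable[bytes]) -> Counter:
    # Count every record across all objects, then keep only keys seen more than once.
    counts = Counter()
    for obj in objects:
        for record in read_json_lines(obj):
            counts[event_key(record)] += 1
    return Counter({key: n for key, n in counts.items() if n > 1})
```

Passing the raw bytes of every object from one hour's prefix to `count_duplicates` would show which event revisions were uploaded more than once. With 3 replicas watching the same events, a count of 3 per revision would be the expected signature of this bug.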


**Expected behavior**
Fluent Bit should ensure that events are processed only once and sent to S3 without duplication.
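Since the report notes there is no deduplication filter, one possible workaround is to drop repeats wherever the replicas' output is merged downstream. The following is a minimal Python sketch of that idea, not a Fluent Bit feature. It assumes each record exposes the Kubernetes Event fields `metadata.uid` and `metadata.resourceVersion`, and `dedupe_events` is a hypothetical name.

```python
from typing import Iterable, Iterator


def dedupe_events(records: Iterable[dict]) -> Iterator[dict]:
    # Yield each Event revision once, keeping the first copy seen.
    # Assumption: records carry metadata.uid and metadata.resourceVersion.
    seen = set()
    for record in records:
        meta = record.get("metadata", {})
        key = (meta.get("uid"), meta.get("resourceVersion"))
        if None in key:
            # Without a usable key we cannot tell copies apart, so pass the record through.
            yield record
            continue
        if key in seen:
            continue  # a copy already emitted by another replica
        seen.add(key)
        yield record
```

This only removes duplicates inside a single process that sees every copy. It does not stop the replicas from uploading them, and the `seen` set grows without bound, so a long-running consumer would need to expire old keys.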

**Screenshots**
<img width="1350" alt="image" src="https://github.com/fluent/fluent-bit/assets/128414402/ded5f554-69df-4f1e-aac9-01a557f41e71">

**Your Environment**
* Version used: 2.2
* Environment name: Kubernetes
* Filters and plugins: grep, lua, etc

Is there anything I am missing?