fluent / fluent-operator

Operate Fluent Bit and Fluentd in the Kubernetes way - Previously known as FluentBit Operator
Apache License 2.0

help request: Fluent-Bit installed by fluent-operator is not working #1124

Open vi4vikas opened 5 months ago

vi4vikas commented 5 months ago

Describe the issue

I installed Fluent Bit (v2.2.0) in my cluster via fluent-operator (v2.7.0). When the Fluent Bit pods come up, they do not ship any container logs. The only records I can see in the pod output are CPU usage metrics:

[0] cpu.local: [[1713177982.122580171, {}], {"cpu_p"=>6.000000, "user_p"=>3.000000, "system_p"=>3.000000, "cpu0.p_cpu"=>6.000000, "cpu0.p_user"=>2.000000, "cpu0.p_system"=>4.000000, "cpu1.p_cpu"=>4.000000, "cpu1.p_user"=>2.000000, "cpu1.p_system"=>2.000000, "cpu2.p_cpu"=>6.000000, "cpu2.p_user"=>3.000000, "cpu2.p_system"=>3.000000, "cpu3.p_cpu"=>7.000000, "cpu3.p_user"=>3.000000, "cpu3.p_system"=>4.000000}]
[0] cpu.local: [[1713177983.122599012, {}], {"cpu_p"=>3.250000, "user_p"=>2.250000, "system_p"=>1.000000, "cpu0.p_cpu"=>5.000000, "cpu0.p_user"=>4.000000, "cpu0.p_system"=>1.000000, "cpu1.p_cpu"=>3.000000, "cpu1.p_user"=>2.000000, "cpu1.p_system"=>1.000000, "cpu2.p_cpu"=>3.000000, "cpu2.p_user"=>2.000000, "cpu2.p_system"=>1.000000, "cpu3.p_cpu"=>3.000000, "cpu3.p_user"=>2.000000, "cpu3.p_system"=>1.000000}]
[0] cpu.local: [[1713177984.122618183, {}], {"cpu_p"=>2.750000, "user_p"=>1.750000, "system_p"=>1.000000, "cpu0.p_cpu"=>2.000000, "cpu0.p_user"=>1.000000, "cpu0.p_system"=>1.000000, "cpu1.p_cpu"=>4.000000, "cpu1.p_user"=>3.000000, "cpu1.p_system"=>1.000000, "cpu2.p_cpu"=>4.000000, "cpu2.p_user"=>3.000000, "cpu2.p_system"=>1.000000, "cpu3.p_cpu"=>3.000000, "cpu3.p_user"=>2.000000, "cpu3.p_system"=>1.000000}]
[0] cpu.local: [[1713177985.126885659, {}], {"cpu_p"=>5.000000, "user_p"=>2.750000, "system_p"=>2.250000, "cpu0.p_cpu"=>4.000000, "cpu0.p_user"=>2.000000, "cpu0.p_system"=>2.000000, "cpu1.p_cpu"=>5.000000, "cpu1.p_user"=>2.000000, "cpu1.p_system"=>3.000000, "cpu2.p_cpu"=>4.000000, "cpu2.p_user"=>2.000000, "cpu2.p_system"=>2.000000, "cpu3.p_cpu"=>5.000000, "cpu3.p_user"=>4.000000, "cpu3.p_system"=>1.000000}]
[0] cpu.local: [[1713177986.122618856, {}], {"cpu_p"=>5.000000, "user_p"=>4.000000, "system_p"=>1.000000, "cpu0.p_cpu"=>2.000000, "cpu0.p_user"=>1.000000, "cpu0.p_system"=>1.000000, "cpu1.p_cpu"=>10.000000, "cpu1.p_user"=>8.000000, "cpu1.p_system"=>2.000000, "cpu2.p_cpu"=>3.000000, "cpu2.p_user"=>2.000000, "cpu2.p_system"=>1.000000, "cpu3.p_cpu"=>5.000000, "cpu3.p_user"=>4.000000, "cpu3.p_system"=>1.000000}]
[0] cpu.local: [[1713177987.122631682, {}], {"cpu_p"=>2.250000, "user_p"=>1.000000, "system_p"=>1.250000, "cpu0.p_cpu"=>2.000000, "cpu0.p_user"=>1.000000, "cpu0.p_system"=>1.000000, "cpu1.p_cpu"=>3.000000, "cpu1.p_user"=>2.000000, "cpu1.p_system"=>1.000000, "cpu2.p_cpu"=>3.000000, "cpu2.p_user"=>1.000000, "cpu2.p_system"=>2.000000, "cpu3.p_cpu"=>2.000000, "cpu3.p_user"=>1.000000, "cpu3.p_system"=>1.000000}]
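
One way to confirm that the pods emit nothing but `cpu.local` records (and no `kube.*` records from the tail input) is to count the tags in the pod output. The following is a minimal sketch, assuming the lines follow the `[N] tag: [[timestamp, {}], {...}]` shape shown above; the helper name `count_tags` is hypothetical and the sample lines are abbreviated copies of the output in this report.

```python
import re
from collections import Counter

# Matches the start of a record as printed in the output above,
# e.g. '[0] cpu.local: [[1713177982.122580171, {}], {...}]'.
RECORD_RE = re.compile(r"^\[\d+\]\s+([^:\s]+):\s+\[")


def count_tags(lines):
    """Return a Counter mapping each record tag to how often it appears."""
    tags = Counter()
    for line in lines:
        match = RECORD_RE.match(line)
        if match:
            tags[match.group(1)] += 1
    return tags


# Abbreviated sample lines taken from the output in this report.
sample = [
    '[0] cpu.local: [[1713177982.122580171, {}], {"cpu_p"=>6.000000}]',
    '[0] cpu.local: [[1713177983.122599012, {}], {"cpu_p"=>3.250000}]',
]

tags = count_tags(sample)
has_kube = any(tag.startswith("kube.") for tag in tags)
print(dict(tags))
print("kube.* records present:", has_kube)
```

In practice the input could be the output of `kubectl -n logging logs <fluent-bit-pod>` read line by line. A `cpu` input tagged `cpu.local` together with a `stdout` output resembles the sample configuration shipped in the upstream Fluent Bit image, so seeing only this tag may mean the pods are not running the operator-generated configuration. That is an assumption and is not confirmed anywhere in this report.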

The logs from the fluent-operator pod are as follows:

setenv + echo 'CONTAINER_ROOT_DIR=/var/log'
fluent-operator I0415 12:28:53.331314       1 request.go:690] Waited for 1.0422717s due to client-side throttling, not priority and fairness, request: GET:https://10.100.0.1:443/apis/coordination.k8s.io/v1?timeout=32s
fluent-operator 2024-04-15T12:28:54Z    INFO    controller-runtime.metrics    Metrics server is starting to listen    {"addr": ":8080"}
fluent-operator 2024-04-15T12:28:54Z    INFO    setup    starting manager
fluent-operator 2024-04-15T12:28:54Z    INFO    Starting server    {"path": "/metrics", "kind": "metrics", "addr": ":8080"}
fluent-operator 2024-04-15T12:28:54Z    INFO    Starting server    {"kind": "health probe", "addr": ":8081"}
Stream closed EOF for logging/fluent-operator-5bf7f5cfcb-www4b (setenv)
Stream closed EOF for logging/fluent-operator-5bf7f5cfcb-www4b (fluent-operator)

How did you install fluent operator?

I installed fluent-operator with this Helm chart. The Fluent Bit configuration that I am passing in is as follows:

    input:
      tail:
        enable: true
        bufferChunkSize: 1MB
        bufferMaxSize: 1MB
        db: /var/log/flb_kube.db
        excludePath: /var/log/containers/fluentd-cloudwatch*.log,/var/log/containers/snapshot-cb*.log,/var/log/containers/snapshotcreatorrunner*.log
        memBufLimit: 100MB
        multilineParser: cri, docker
        path: "/var/log/containers/*.log"
        pauseOnChunksOverlimit: "off"
        readFromHead: false
        refreshIntervalSeconds: 10
        skipLongLines: true
        storageType: filesystem
        tag: "kube.*"

    output:
      s3:
        Bucket: dev-logs  # Cluster Specific
        CannedAcl: bucket-owner-full-control
        Compression: gzip
        Region: eu-west-1
        RetryLimit: 3
        S3KeyFormat: /logs/%Y/%m/%d/%H/$UUID.gz   # Cluster Specific
        S3KeyFormatTagDelimiters: .-
        StoreDirLimitSize: 2G
        TotalFileSize: 5M
        UploadChunkSize: 1m
        UsePutObject: true

    service:
      daemon: false
      flushSeconds: 10
      healthCheck: true
      httpListen: 0.0.0.0
      httpPort: 2020
      httpServer: true
      logLevel: info
      storage:
        backlogMemLimit: 300Mi
        checksum: "on"
        deleteIrrecoverableChunks: "on"
        maxChunksUp: 1000
        metrics: "on"
        path: "/var/log/fluent-bit-buffer/"
        sync: "full"

    filter:
      kubernetes:
        annotations: false
        bufferSize: "0MB"
        cacheUseDockerId: true
        k8sLoggingExclude: true
        k8sLoggingParser: true
        keepLog: true
        kubeCAFile: "/var/run/secrets/kubernetes.io/serviceaccount/ca.crt"
        kubeTagPrefix: "kube.var.log.containers."
        kubeTokenFile: "/var/run/secrets/kubernetes.io/serviceaccount/token"
        kubeURL: "https://kubernetes.default.svc:443"
        labels: true
        match: kube.*
        mergeLog: true
        mergeLogKey: log4j
      rewriteTag:
        emitterName: ns_emitter
        rules:
        - "$kubernetes['namespace_name'] ^.*$"
        - "$kubernetes['namespace_name'] false"
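
Fluent Bit's `rewrite_tag` filter documents each rule as four space-separated components: `KEY REGEX NEW_TAG KEEP`. Each of the two `rewriteTag` rules above contains only two components, so they may be worth double-checking. The sketch below counts the components of a rule; the helper name `check_rewrite_rule` and the new tag `ns.example` in the last example are hypothetical, and whether the chart passes these strings through to Fluent Bit unchanged is an assumption.

```python
# The four components of a rewrite_tag rule, per the Fluent Bit documentation.
FIELDS = ("KEY", "REGEX", "NEW_TAG", "KEEP")


def check_rewrite_rule(rule):
    """Return a list of problems found in a rule string (empty if none).

    Assumes components are separated by whitespace and contain no spaces.
    """
    parts = rule.split()
    if len(parts) < len(FIELDS):
        missing = ", ".join(FIELDS[len(parts):])
        return [f"{len(parts)} of 4 components present; missing: {missing}"]
    if len(parts) > len(FIELDS):
        return [f"{len(parts)} components present; expected 4"]
    return []


rules = [
    "$kubernetes['namespace_name'] ^.*$",     # first rule from the values above
    "$kubernetes['namespace_name'] false",    # second rule from the values above
    "$kubernetes['namespace_name'] ^.*$ ns.example false",  # hypothetical 4-part rule
]

for rule in rules:
    problems = check_rewrite_rule(rule)
    print(rule, "->", problems or "looks complete")
```

The check is purely structural: it cannot tell which component is missing if they are out of order, and it does not validate the regular expression or the `KEEP` value.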

Additional context

The other configuration values are as follows:

fluent-operator:
  # Configurations for Fluent Operator
  containerRuntime: containerd
  operator:
    initcontainer:
      repository: <docker_image>
      tag: 20.10.8
      resources:
        limits:
          memory: 64Mi
        requests:
          cpu: 50m
          memory: 64Mi

    container:
      repository: "kubesphere/fluent-operator"
      tag: v2.7.0

    priorityClassName: "core-pod-low-priority"

    resources:
      limits:
        memory: 500Mi
      requests:
        cpu: 100m
        memory: 500Mi

    annotations:
      dynamo.certificate.tls/active: "false"

    labels:
      k8s-app: fluent-operator
      version: v1
      kubernetes.io/cluster-service: "true"

    disableComponentControllers: "fluentd"

  # Configurations for Fluent-Bit
  fluentbit:
    image:
      repository: "kubesphere/fluent-bit"
      tag: 2.2.0

    resources:
      limits:
        memory: 300Mi
      requests:
        cpu: 400m
        memory: 300Mi

    labels:
      k8s-app: fluent-bit
      version: v1
      kubernetes.io/cluster-service: "true"

    additionalVolumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers

    additionalVolumesMounts:
      - name: varlog
        mountPath: /var/log
      - name: varlibdockercontainers
        mountPath: /var/lib/docker/containers
        readOnly: true

    tolerations:
      - key: node-role.kubernetes.io/master
        operator: "Exists"
        effect: "NoSchedule"
      - operator: "Exists"
        effect: "NoExecute"
      - operator: "Exists"
        effect: "NoSchedule"

    priorityClassName: "core-pod-low-priority"

  fluentd:
    crdsEnable: false
    enable: false

  nameOverride: ""
  fullnameOverride: ""
  namespaceOverride: "logging"

Please let me know if any more detail is required.

SvenThies commented 2 months ago

@vi4vikas is this still an issue, including after updating to the most recent version? If not, can the issue be closed?