fluent / fluent-bit

Fast and Lightweight Logs and Metrics processor for Linux, BSD, OSX and Windows
https://fluentbit.io
Apache License 2.0

not able to upload logs from Fluentbit to OTelcollector #9117

Closed RitikaLaddha closed 1 month ago

RitikaLaddha commented 1 month ago

Hi, I want to send logs from Fluent Bit to an OpenTelemetry Collector on an AKS cluster.

I have installed the OTel Collector as a deployment from here and Fluent Bit from here.

This is my Fluent Bit configuration for output to OTel:

 [OUTPUT]
        Name                 opentelemetry
        Match                *
        Host                 http://my-opentelemetry-collector.namespace.svc.cluster.local
        Port                 4318
        Metrics_uri          /v1/metrics
        Logs_uri             /v1/logs
        Traces_uri           /v1/traces
        Log_response_payload True
        Tls                  On
        Tls.verify           Off

I have also added environment variable in fluentbit Daemonset `env:

But I am getting the error below in my Fluent Bit logs:

[2024/07/21 17:13:15] [ warn] [net] getaddrinfo(host='http://my-opentelemetry-collector.dis.svc.cluster.local', err=15): Out of memory
[2024/07/21 17:13:15] [error] [output:opentelemetry:opentelemetry.1] no upstream connections available to http://my-opentelemetry-collector.dis.svc.cluster.local:4318
[2024/07/21 17:13:15] [ warn] [engine] failed to flush chunk '1-1721581994.699105145.flb', retry in 7 seconds: task_id=4, input=tail.0 > output=opentelemetry.1 (out_id=1)

Do I need to change anything in the OTel ConfigMap or in the Fluent Bit configuration to make this work? Below are the ConfigMaps for Fluent Bit and OTel.

Fluentbit Configmap

apiVersion: v1
items:
- apiVersion: v1
  data:
    custom_parsers.conf: |
      [PARSER]
          Name docker_no_time
          Format json
          Time_Keep Off
          Time_Key time
          Time_Format %Y-%m-%dT%H:%M:%S.%L
    fluent-bit.conf: |
      [SERVICE]
          Daemon Off
          Flush 1
          Log_Level info
          Parsers_File /fluent-bit/etc/parsers.conf
          Parsers_File /fluent-bit/etc/conf/custom_parsers.conf
          HTTP_Server On
          HTTP_Listen 0.0.0.0
          HTTP_Port 2020
          Health_Check On

      [INPUT]
          Name tail
          Path /var/log/containers/*.log
          multiline.parser docker, cri
          Tag kube.*
          Mem_Buf_Limit 5MB
          Skip_Long_Lines On

      [INPUT]
          Name systemd
          Tag host.*
          Systemd_Filter _SYSTEMD_UNIT=kubelet.service
          Read_From_Tail On

      [FILTER]
          Name kubernetes
          Match kube.*
          Merge_Log On
          Keep_Log Off
          K8S-Logging.Parser On
          K8S-Logging.Exclude On

      [OUTPUT]
          Name                 opentelemetry
          Match                *
          Host                 http://my-opentelemetry-collector.dis.svc.cluster.local
          Port                 4318
          Metrics_uri          /v1/metrics
          Logs_uri             /v1/logs
          Traces_uri           /v1/traces
          Log_response_payload True
          Tls                  Off
          Tls.verify           Off
          logs_body_key $message
          logs_span_id_message_key span_id
          logs_trace_id_message_key trace_id
          logs_severity_text_message_key loglevel
          logs_severity_number_message_key lognum
          # add user-defined labels
          add_label            app fluent-bit
          add_label            color blue
  kind: ConfigMap
  metadata:
    annotations:
      meta.helm.sh/release-name: fluent-bit
      meta.helm.sh/release-namespace: dis
    labels:
      app.kubernetes.io/instance: fluent-bit
      app.kubernetes.io/managed-by: Helm
      app.kubernetes.io/name: fluent-bit
      app.kubernetes.io/version: 3.1.3
      helm.sh/chart: fluent-bit-0.47.3
    name: fluent-bit
    namespace: dis

OTEL Configmap

- apiVersion: v1
  data:
    relay: |
      exporters:
        otlp/uptrace:
          endpoint: otlp.uptrace.dev:4317
          headers:
            uptrace-dsn: 'https://mOMzBlbLofrg@api.uptrace.dev?grpc=4317'
      extensions:
        health_check:
          endpoint: ${env:MY_POD_IP}:13133
      processors:
        batch: {}
        memory_limiter:
          check_interval: 5s
          limit_percentage: 80
          spike_limit_percentage: 25
      receivers:
        otlp:
          protocols:
            http:
              endpoint: ${env:MY_POD_IP}:4318
      service:
        extensions:
        - health_check
        pipelines:
          logs:
            exporters:
            - otlp/uptrace
            processors:
            - memory_limiter
            - batch
            receivers:
            - otlp
          metrics:
            exporters:
            - otlp/uptrace
            processors:
            - memory_limiter
            - batch
            receivers:
            - otlp
          traces:
            exporters:
            - otlp/uptrace
            processors:
            - memory_limiter
            - batch
            receivers:
            - otlp
        telemetry:
          metrics:
            address: ${env:MY_POD_IP}:8888
  kind: ConfigMap
  metadata:
    annotations:
      meta.helm.sh/release-name: my-opentelemetry-collector
      meta.helm.sh/release-namespace: dis
    labels:
      app.kubernetes.io/instance: my-opentelemetry-collector
      app.kubernetes.io/managed-by: Helm
      app.kubernetes.io/name: opentelemetry-collector
      app.kubernetes.io/version: 0.104.0
      helm.sh/chart: opentelemetry-collector-0.97.2
    name: my-opentelemetry-collector
    namespace: dis
RitikaLaddha commented 1 month ago

Hi @edsiper, sorry, I didn't know whom to tag. Could you please assign someone to this question?

I am trying to achieve the flow below and would be very grateful for any guidance:

Logs -> Fluentbit -> OTel Collector -> Uptrace
                                    -> Jaeger
                                    -> etc..
edsiper commented 1 month ago

@RitikaLaddha thanks for opening this issue.

This is the first time I have seen getaddrinfo() run out of memory/capacity; it seems to be related to a large number of DNS options for your remote endpoint.

Can you try adding the following option inside the [SERVICE] section?

dns.mode LEGACY

ref: https://docs.fluentbit.io/manual/administration/networking#dns-mode

RitikaLaddha commented 1 month ago

Thanks for replying. Sure, let me try.

RitikaLaddha commented 1 month ago

Still getting the same error.

leonardo-albertovich commented 1 month ago

Could you try removing the http:// part of the host? I just tested it to be sure and it seems that fluent-bit expects the host setting to be a regular hostname and doesn't parse URLs.

leonardo@lima-default:/Users/leonardo/Work/Calyptia/fluent-bit/build$ cat > issue_9117.yaml <<__EOF__
service:
  log_level: info

pipeline:
  inputs:
    - name: dummy
      samples: 1

  outputs:
    - name: opentelemetry
      match: '*'
      host: http://localhost
      port: 9999
      tls: off
__EOF__

leonardo@lima-default:/Users/leonardo/Work/Calyptia/fluent-bit/build$ ./bin/fluent-bit -c issue_9117.yaml
Fluent Bit v3.1.5
* Copyright (C) 2015-2024 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

______ _                  _    ______ _ _           _____  __
|  ___| |                | |   | ___ (_) |         |____ |/  |
| |_  | |_   _  ___ _ __ | |_  | |_/ /_| |_  __   __   / /`| |
|  _| | | | | |/ _ \ '_ \| __| | ___ \ | __| \ \ / /   \ \ | |
| |   | | |_| |  __/ | | | |_  | |_/ / | |_   \ V /.___/ /_| |_
\_|   |_|\__,_|\___|_| |_|\__| \____/|_|\__|   \_/ \____(_)___/

[2024/07/31 13:49:08] [ info] [fluent bit] version=3.1.5, commit=6df1f2bf7c, pid=253081
[2024/07/31 13:49:08] [ info] [storage] ver=1.5.2, type=memory, sync=normal, checksum=off, max_chunks_up=128
[2024/07/31 13:49:08] [ info] [cmetrics] version=0.9.1
[2024/07/31 13:49:08] [ info] [ctraces ] version=0.5.2
[2024/07/31 13:49:08] [ info] [input:dummy:dummy.0] initializing
[2024/07/31 13:49:08] [ info] [input:dummy:dummy.0] storage_strategy='memory' (memory only)
[2024/07/31 13:49:08] [ info] [sp] stream processor started
[2024/07/31 13:49:10] [ warn] [net] getaddrinfo(host='http://localhost', err=15): Out of memory
[2024/07/31 13:49:10] [error] [output:opentelemetry:opentelemetry.0] no upstream connections available to http://localhost:9999
[2024/07/31 13:49:10] [ warn] [engine] failed to flush chunk '253081-1722426549.431338198.flb', retry in 7 seconds: task_id=0, input=dummy.0 > output=opentelemetry.0 (out_id=0)
^C[2024/07/31 13:49:11] [engine] caught signal (SIGINT)
[2024/07/31 13:49:11] [ warn] [engine] service will shutdown in max 5 seconds
[2024/07/31 13:49:11] [ info] [input] pausing dummy.0
[2024/07/31 13:49:11] [ warn] [net] getaddrinfo(host='http://localhost', err=15): Out of memory
[2024/07/31 13:49:11] [error] [output:opentelemetry:opentelemetry.0] no upstream connections available to http://localhost:9999
[2024/07/31 13:49:11] [error] [engine] chunk '253081-1722426549.431338198.flb' cannot be retried: task_id=0, input=dummy.0 > output=opentelemetry.0
[2024/07/31 13:49:12] [ info] [engine] service has stopped (0 pending tasks)
[2024/07/31 13:49:12] [ info] [input] pausing dummy.0

leonardo@lima-default:/Users/leonardo/Work/Calyptia/fluent-bit/build$ cat > issue_9117.yaml <<__EOF__
service:
  log_level: info

pipeline:
  inputs:
    - name: dummy
      samples: 1

  outputs:
    - name: opentelemetry
      match: '*'
      host: localhost
      port: 9999
      tls: off
__EOF__

leonardo@lima-default:/Users/leonardo/Work/Calyptia/fluent-bit/build$ ./bin/fluent-bit -c issue_9117.yaml
Fluent Bit v3.1.5
* Copyright (C) 2015-2024 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

______ _                  _    ______ _ _           _____  __
|  ___| |                | |   | ___ (_) |         |____ |/  |
| |_  | |_   _  ___ _ __ | |_  | |_/ /_| |_  __   __   / /`| |
|  _| | | | | |/ _ \ '_ \| __| | ___ \ | __| \ \ / /   \ \ | |
| |   | | |_| |  __/ | | | |_  | |_/ / | |_   \ V /.___/ /_| |_
\_|   |_|\__,_|\___|_| |_|\__| \____/|_|\__|   \_/ \____(_)___/

[2024/07/31 13:49:23] [ info] [fluent bit] version=3.1.5, commit=6df1f2bf7c, pid=253091
[2024/07/31 13:49:23] [ info] [storage] ver=1.5.2, type=memory, sync=normal, checksum=off, max_chunks_up=128
[2024/07/31 13:49:23] [ info] [cmetrics] version=0.9.1
[2024/07/31 13:49:23] [ info] [ctraces ] version=0.5.2
[2024/07/31 13:49:23] [ info] [input:dummy:dummy.0] initializing
[2024/07/31 13:49:23] [ info] [input:dummy:dummy.0] storage_strategy='memory' (memory only)
[2024/07/31 13:49:23] [ info] [sp] stream processor started
[2024/07/31 13:49:24] [error] [output:opentelemetry:opentelemetry.0] no upstream connections available to localhost:9999
[2024/07/31 13:49:24] [ warn] [engine] failed to flush chunk '253091-1722426563.435245064.flb', retry in 10 seconds: task_id=0, input=dummy.0 > output=opentelemetry.0 (out_id=0)
^C[2024/07/31 13:49:25] [engine] caught signal (SIGINT)
[2024/07/31 13:49:25] [ warn] [engine] service will shutdown in max 5 seconds
[2024/07/31 13:49:25] [ info] [input] pausing dummy.0
[2024/07/31 13:49:25] [error] [output:opentelemetry:opentelemetry.0] no upstream connections available to localhost:9999
[2024/07/31 13:49:25] [error] [engine] chunk '253091-1722426563.435245064.flb' cannot be retried: task_id=0, input=dummy.0 > output=opentelemetry.0
[2024/07/31 13:49:25] [ info] [engine] service has stopped (0 pending tasks)
[2024/07/31 13:49:25] [ info] [input] pausing dummy.0
leonardo@lima-default:/Users/leonardo/Work/Calyptia/fluent-bit/build$
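
Applied to the configuration from the original post, the change would presumably look like the sketch below. Only the Host value differs (the http:// prefix is dropped); the other keys are copied from the ConfigMap above and trimmed for brevity, so treat this as an illustration, not a verified setup for that cluster.

```
[OUTPUT]
    Name      opentelemetry
    Match     *
    # Plain hostname only: no http:// scheme prefix, since the Host
    # setting is not parsed as a URL
    Host      my-opentelemetry-collector.dis.svc.cluster.local
    Port      4318
    Logs_uri  /v1/logs
    Tls       Off
```

Note that in the local test above the second run still reports "no upstream connections available", most likely because nothing was listening on localhost:9999; the point of the comparison is that the getaddrinfo() warning disappears once the scheme is removed.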
RitikaLaddha commented 1 month ago

This worked for me. Thank you so much for the help, @leonardo-albertovich @edsiper.