fluent / fluent-bit

Fast and Lightweight Logs and Metrics processor for Linux, BSD, OSX and Windows
https://fluentbit.io
Apache License 2.0

[error] [tls] error: unexpected EOF in versions after 2.0.6 #8452

Open xogoodnow opened 7 months ago

xogoodnow commented 7 months ago

Bug Report

Describe the bug On Fluent Bit versions after 2.0.6, when using es as the output and enabling TLS, the following error appears:

[2024/02/03 19:34:06] [error] [tls] error: unexpected EOF
[2024/02/03 19:34:06] [ warn] [engine] failed to flush chunk '1-1706988835.417884252.flb', retry in 8 seconds: task_id=17, input=kafka.0 > output=es.0 (out_id=0)

The problem persists on all versions after 2.0.6.

Expected behavior Data is properly sent to Elasticsearch.

Your Environment

Additional context Here is my config:

[SERVICE]
    Flush        1
    Daemon       Off
    Log_Level    info
    Parsers_File parsers.conf
    HTTP_Server  On
    HTTP_Listen  0.0.0.0
    Health_Check On
    HTTP_Port    2021

[FILTER]
    Name   record_modifier
    Match  *
    Record hostname ${HOSTNAME}

[INPUT]
    Name    kafka
    Tag     kafka_log
    brokers kafka-1:9092, kafka-2:9092, kafka-3:9092
    topics  fluent-bit-auth-log
    rdkafka.bootstrap.servers kafka-1:9092, kafka-2:9092, kafka-3:9092
    rdkafka.log_level 7
    rdkafka.security.protocol sasl_ssl
    rdkafka.ssl.keystore.location Kafka/Config/Certs/kafka.server.keystore-first-1.jks
    rdkafka.ssl.keystore.password kafkacertpass
    rdkafka.enable.ssl.certificate.verification false
    rdkafka.sasl.mechanisms PLAIN
    rdkafka.sasl.username kafka-client
    rdkafka.sasl.password kafka-password
    rdkafka.socket.keepalive.enable false
    rdkafka.socket.nagle.disable false
    rdkafka.socket.max.fails 3
    rdkafka.broker.address.ttl 10000
    rdkafka.broker.address.family any
    rdkafka.connections.max.idle.ms 1000
    rdkafka.reconnect.backoff.ms 100
    rdkafka.reconnect.backoff.max.ms 30000
    rdkafka.ssl.endpoint.identification.algorithm https
    rdkafka.fetch.message.max.bytes 5048576
    rdkafka.fetch.max.bytes 102428800
    rdkafka.check.crcs false
    rdkafka.enable.idempotence false
    rdkafka.message.send.max.retries 2147483647
    rdkafka.retry.backoff.ms 100
    rdkafka.retry.backoff.max.ms 3000

[OUTPUT]
    Name               es
    Match              *
    Host               elastic-1
    Port               9200
    tls                On
    tls.verify         Off  # Set to On if you want to verify the server's certificate
    tls.debug          4
    Logstash_Format    On
    Logstash_Prefix    demo-elastic-elastic
    Time_Key           @timestamp
    Time_Key_Format    %Y-%m-%dT%H:%M:%S.%L
    Replace_Dots       On
    Trace_Output       On
    Trace_Error        On
    Retry_Limit        False
    HTTP_User          elastic
    HTTP_Passwd        fancyliqurebottle
    Suppress_Type_Name On
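
For what it's worth, an "unexpected EOF" during the TLS handshake generally means the peer closed the connection, which can happen when the port being dialed only speaks plain HTTP. A quick way to check what elastic-1:9200 actually serves (a sketch; adjust the host and port to your environment):

    # Prints the server certificate chain if the port speaks TLS;
    # an immediate disconnect suggests the endpoint is plain HTTP.
    openssl s_client -connect elastic-1:9200 -servername elastic-1 </dev/null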

github-actions[bot] commented 4 months ago

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the exempt-stale label.

xogoodnow commented 4 months ago

The issue still persists even on the latest image

geru-scotland commented 2 months ago

Hello there,

I am experiencing the same issue: when I enable TLS in the Fluent Bit ConfigMap, I get the following error:

[2024/07/04 16:51:57] [error] [tls] error: unexpected EOF
[2024/07/04 16:51:57] [ warn] [engine] failed to flush chunk '1-1720111186.704158600.flb', retry in 84 seconds: task_id=47, input=tail.0 > output=es.0 (out_id=0)
[2024/07/04 16:51:58] [error] [tls] error: unexpected EOF
[2024/07/04 16:51:58] [error] [tls] error: unexpected EOF
[2024/07/04 16:51:58] [ warn] [engine] failed to flush chunk '1-1720111346.704169721.flb', retry in 612 seconds: task_id=208, input=tail.0 > output=es.0 (out_id=0)
[2024/07/04 16:51:58] [ warn] [engine] failed to flush chunk '1-1720111588.707272575.flb', retry in 68 seconds: task_id=450, input=tail.0 > output=es.0 (out_id=0)

Note that TLS works perfectly between Kibana and Elasticsearch, so their configuration seems to be fine.
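
A direct request against the Elasticsearch HTTPS endpoint can help narrow this down (a sketch, assuming the in-cluster service name, port, and user from the config below; the password is a placeholder, and -k mirrors tls.verify Off):

    curl -k -u "kibana_system:<password>" https://elasticsearch:9200/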

Now, this is the Fluent Bit config I have for the output plugin:

output-elasticsearch.conf: |
    [OUTPUT]
        Name              es
        Match             *
        Host              ${FLUENT_ELASTICSEARCH_HOST}
        Port              ${FLUENT_ELASTICSEARCH_PORT}
        Logstash_Format   Off
        Replace_Dots      On
        Retry_Limit       False
        Suppress_Type_Name On
        tls             On
        tls.verify      Off
        tls.debug       3
        tls.ca_file     /fluent-bit/tls/tls.crt
        tls.crt_file    /fluent-bit/tls/tls.crt
        tls.key_file    /fluent-bit/tls/tls.key    
        HTTP_User       ${FLUENT_ELASTICSEARCH_USER}
        HTTP_Passwd     ${FLUENT_ELASTICSEARCH_PASSWORD}

And this is how I mount the certificates in my daemonSet.yaml:

spec:
  containers:
    - name: fluent-bit
      image: fluent/fluent-bit:3.0.7 # Latest
      imagePullPolicy: Always
      ports:
        - containerPort: 2020
      env:
        - name: FLUENT_ELASTICSEARCH_HOST
          value: "elasticsearch"
        - name: FLUENT_ELASTICSEARCH_PORT
          value: "9200"
        - name: FLUENT_ELASTICSEARCH_USER # Once this works, load it from a Secret
          value: "kibana_system" # I am taking advantage of Kibana's user
        - name: FLUENT_ELASTICSEARCH_PASSWORD
          value: "herethepassword"
      volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
        - name: fluent-bit-config
          mountPath: /fluent-bit/etc/
        - name: tls-certs
          mountPath: /fluent-bit/tls
  terminationGracePeriodSeconds: 10
  volumes:
    - name: varlog
      hostPath:
        path: /var/log
    - name: varlibdockercontainers
      hostPath:
        path: /var/lib/docker/containers
    - name: fluent-bit-config
      configMap:
        name: fluent-bit-config
    - name: tls-certs
      secret:
        secretName: elasticsearch-certs

The elasticsearch-certs secret is the same one I am currently using with Elasticsearch, and it works for the Kibana-Elasticsearch communication.
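
For reference, a tls-type Secret with those key names (tls.crt / tls.key) is typically created along these lines; the file paths here are placeholders, not my actual ones:

    kubectl create secret tls elasticsearch-certs \
      --cert=path/to/tls.crt \
      --key=path/to/tls.key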

Also, as mentioned, Kibana works, but it makes its requests via an external URL; I am not sure whether that matters:

spec:
  containers:
    - name: kibana
      image: docker.elastic.co/kibana/kibana:8.14.1 # Note: must be exactly the same version as Elasticsearch!
      resources:
        limits:
          cpu: 1000m
          memory: 1Gi
        requests:
          cpu: 700m
          memory: 1Gi
      env:
        - name: ELASTICSEARCH_URL
          value: https://elastic-search.basajaun-cluster:30738
          # SERVICE_TOKEN elastic/kibana/kibana-system = AAEAAWVsYXN0aWMva2liYW5hL2tpYmFuYS1zeXN0ZW06VnNGcmF0TlJUY1dGYkRla01OekVvZw
        - name: ELASTICSEARCH_USERNAME
          value: "kibana_system"
        - name: ELASTICSEARCH_PASSWORD
          value: "thepassword"
        - name: ELASTICSEARCH_SSL_VERIFICATIONMODE
          value: "none" # Solo en entornos de desarrollo, para producción usar "full"
      ports:
        - containerPort: 5601

Any help is appreciated.

geru-scotland commented 2 months ago

In case someone experiences this issue in the future, I have found the cause.

In my cluster, I use the Kubernetes Gateway API (with an NGINX controller that implements it), which performs TLS termination: incoming traffic is decrypted and routed at the HTTP layer. Kibana gets users and roles from Elasticsearch, which uses xpack for authentication and other security features, so the transport layer needs to be secured with TLS. However, since everything is within the cluster, HTTP-layer security can be disabled, which aligns with my cluster's internal requirements.

The problem was that if Fluent Bit sends logs to Elasticsearch via TLS, Elasticsearch must have TLS enabled on its HTTP layer as well. I didn't have this part configured, so Fluent Bit was giving the error I mentioned. Note that Kibana may then start giving errors if it's not set up for such connections, so it's important to make requests at the in-cluster service level rather than through the external URL, as the reverse proxy transforms HTTPS into HTTP.
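
For anyone checking their own setup, the relevant Elasticsearch settings look roughly like this (a sketch; the keystore path is a placeholder for your own certificate material):

    # elasticsearch.yml
    xpack.security.enabled: true
    # TLS on the HTTP layer, so clients such as Fluent Bit can connect over HTTPS
    xpack.security.http.ssl.enabled: true
    xpack.security.http.ssl.keystore.path: certs/http.p12   # placeholder
    # TLS on the transport layer, required for the security features mentioned above
    xpack.security.transport.ssl.enabled: true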

Therefore, at least in my case, since everything is TLS-terminated on entering the cluster through the Gateway API, I can safely route all internal traffic over plain HTTP.
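
A plain-HTTP variant of my output block then simply drops the TLS settings, roughly like this (same placeholders as above):

    [OUTPUT]
        Name               es
        Match              *
        Host               ${FLUENT_ELASTICSEARCH_HOST}
        Port               ${FLUENT_ELASTICSEARCH_PORT}
        Logstash_Format    Off
        Replace_Dots       On
        Retry_Limit        False
        Suppress_Type_Name On
        tls                Off
        HTTP_User          ${FLUENT_ELASTICSEARCH_USER}
        HTTP_Passwd        ${FLUENT_ELASTICSEARCH_PASSWORD}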