fluent / fluent-bit

Fast and Lightweight Logs and Metrics processor for Linux, BSD, OSX and Windows
https://fluentbit.io
Apache License 2.0

Envoy dropping connections to in_opentelemetry #8742

Open edsiper opened 2 months ago

edsiper commented 2 months ago

Bug Report

While testing locally with Envoy exporting traces to Fluent Bit as its OpenTelemetry upstream, Envoy cannot establish the gRPC session and fails with gRPC status 14 (UNAVAILABLE).

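The following is the Envoy configuration used (envoy-traces.yaml). Its OpenTelemetry tracer exports over gRPC to the otel-collector cluster, which points at Fluent Bit on 127.0.0.1:4317 with TLS enabled: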

admin:
  access_log_path: /tmp/admin_access.log
  address:
    socket_address: { address: 0.0.0.0, port_value: 9901 }

static_resources:
  listeners:
  - name: listener_0
    address:
      socket_address: { address: 0.0.0.0, port_value: 10000 }
    filter_chains:
    - filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          codec_type: AUTO
          stat_prefix: ingress_http
          route_config:
            name: local_route
            virtual_hosts:
            - name: local_service
              domains: ["*"]
              routes:
              - match: { prefix: "/" }
                direct_response:
                  status: 200
                  body:
                    inline_string: "Hello, World!"
          http_filters:
          - name: envoy.filters.http.router
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
          tracing:
            provider:
              name: envoy.tracers.opentelemetry
              typed_config:
                "@type": type.googleapis.com/envoy.config.trace.v3.OpenTelemetryConfig
                grpc_service:
                  envoy_grpc:
                    cluster_name: otel-collector
          access_log:
          - name: envoy.access_loggers.stdout
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.access_loggers.stream.v3.StdoutAccessLog

  clusters:
  - name: otel-collector
    connect_timeout: 1s
    type: STRICT_DNS
    lb_policy: ROUND_ROBIN
    transport_socket:
      name: envoy.transport_sockets.tls
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
        common_tls_context:
          validation_context:
            # This configures the context to not verify the peer certificate.
            trust_chain_verification: ACCEPT_UNTRUSTED
    http2_protocol_options: {}
    load_assignment:
      cluster_name: otel-collector
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: 127.0.0.1
                port_value: 4317

Envoy usage

Run it with the following command:

envoy -c envoy-traces.yaml --log-level debug

Fluent Bit

Run it locally (from inside fluent-bit/build) with:

bin/fluent-bit -i opentelemetry \
               -p port=4317 \
               -p tls=on \
               -p tls.verify=off \
               -p tls.crt_file=../tests/runtime_shell/tls/certificate.pem \
               -p tls.key_file=../tests/runtime_shell/tls/private_key.pem \
               -p tls.debug=5 \
               -o stdout -vv
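To check the Fluent Bit listener independently of Envoy, the following is a minimal Python sketch using only the standard library. It performs a TLS handshake against 127.0.0.1:4317 while offering ALPN "h2", as an HTTP/2 (gRPC) client would, and reports which protocol the server selected. The function name probe_tls_alpn is hypothetical, and the 1-second timeout is an assumption chosen to mirror connect_timeout: 1s in the Envoy cluster.

```python
import socket
import ssl


def probe_tls_alpn(host="127.0.0.1", port=4317, timeout=1.0):
    """Hypothetical helper: attempt a TLS handshake offering ALPN "h2".

    Returns (True, negotiated_alpn_or_None) if the handshake completes,
    or (False, error_description) if the TCP connect or handshake fails.
    """
    ctx = ssl.create_default_context()
    # Accept the self-signed test certificate, similar in spirit to
    # tls.verify=off in Fluent Bit and ACCEPT_UNTRUSTED in Envoy.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    # Offer HTTP/2 via ALPN, which gRPC over TLS relies on.
    ctx.set_alpn_protocols(["h2"])
    try:
        # The socket timeout applies to each blocking step: the TCP
        # connect and the reads/writes of the TLS handshake.
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with ctx.wrap_socket(raw, server_hostname=host) as tls:
                return True, tls.selected_alpn_protocol()
    except OSError as exc:  # ssl.SSLError and timeouts are OSError subclasses
        return False, f"{type(exc).__name__}: {exc}"


if __name__ == "__main__":
    ok, detail = probe_tls_alpn()
    print("handshake ok" if ok else "handshake failed", detail)
```

A failure or timeout here would suggest the problem sits in the listener's TLS handling rather than in Envoy's configuration. A successful handshake that selects "h2" does not by itself prove that the gRPC export path works end to end.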

Envoy fails with the following debug output:

[2024-04-19 15:19:55.367][25505973][debug][router] [source/common/router/router.cc:1332] [Tags: "ConnectionId":"0","StreamId":"16569461505730323531"] upstream reset: reset reason: connection timeout, transport failure reason:
[2024-04-19 15:19:55.367][25505973][debug][http] [source/common/http/async_client_impl.cc:106] async http request response headers (end_stream=true):
':status', '200'
'content-type', 'application/grpc'
'grpc-status', '14'
'grpc-message', 'upstream connect error or disconnect/reset before headers. reset reason: connection timeout'
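In this log the upstream reset reason is "connection timeout", and Envoy reports the failure to its tracer as a trailers-only gRPC response: HTTP :status 200 with grpc-status 14, which is the canonical gRPC code UNAVAILABLE. The following is a minimal Python sketch, assuming the 'name', 'value' header format printed in this debug log, that extracts the status and maps it to its name. The helpers parse_envoy_headers and describe_grpc_status are hypothetical.

```python
import re

# Canonical gRPC status codes and their names.
GRPC_STATUS_NAMES = {
    0: "OK", 1: "CANCELLED", 2: "UNKNOWN", 3: "INVALID_ARGUMENT",
    4: "DEADLINE_EXCEEDED", 5: "NOT_FOUND", 6: "ALREADY_EXISTS",
    7: "PERMISSION_DENIED", 8: "RESOURCE_EXHAUSTED", 9: "FAILED_PRECONDITION",
    10: "ABORTED", 11: "OUT_OF_RANGE", 12: "UNIMPLEMENTED", 13: "INTERNAL",
    14: "UNAVAILABLE", 15: "DATA_LOSS", 16: "UNAUTHENTICATED",
}

# Matches one header line as Envoy prints it in the debug log: 'name', 'value'
HEADER_LINE = re.compile(r"^'([^']+)', '(.*)'$")


def parse_envoy_headers(log_lines):
    """Collect header name/value pairs from Envoy's debug header dump."""
    headers = {}
    for line in log_lines:
        m = HEADER_LINE.match(line.strip())
        if m:
            headers[m.group(1)] = m.group(2)
    return headers


def describe_grpc_status(headers):
    """Return (code, name, message) for grpc-status, or None if it is absent."""
    if "grpc-status" not in headers:
        return None
    code = int(headers["grpc-status"])
    name = GRPC_STATUS_NAMES.get(code, "UNKNOWN_CODE")
    return code, name, headers.get("grpc-message", "")


if __name__ == "__main__":
    log = [
        "':status', '200'",
        "'content-type', 'application/grpc'",
        "'grpc-status', '14'",
        "'grpc-message', 'upstream connect error or disconnect/reset before headers. reset reason: connection timeout'",
    ]
    print(describe_grpc_status(parse_envoy_headers(log)))
```

Run against the header lines from this log, the script prints code 14, the name UNAVAILABLE, and Envoy's grpc-message text.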
edsiper commented 2 months ago

The bug is being fixed now.

edsiper commented 2 months ago

CTraces fix: https://github.com/fluent/ctraces/pull/53

edsiper commented 2 months ago

Merging the fix through https://github.com/fluent/fluent-bit/pull/8768