fluent / fluentd-kubernetes-daemonset

Fluentd daemonset for Kubernetes and its Docker image
Apache License 2.0

TLS not working as expected #1292

Closed sfhl closed 2 years ago

sfhl commented 3 years ago

Hi there,

Sorry for bothering. We use the following setup to ship log messages from one Kubernetes cluster to a Graylog instance on another:

fluent-bit -> TLS -> fluentd -> graylog

Sadly, there are a lot of errors causing the fluent-bit instances to fail to connect to fluentd, but only when TLS is enabled. The TLS-related errors are as follows:

...
2021-08-18 04:10:06 +0000 [error]: #0 unexpected error error_class=Errno::ENOTCONN error="Transport endpoint is not connected - getpeername(2)"
  2021-08-18 04:10:06 +0000 [error]: #0 suppressed same stacktrace
/fluentd/vendor/bundle/ruby/2.6.0/gems/gelf-3.0.0/lib/gelf/notifier.rb:65: warning: constant ::Fixnum is deprecated
#<Thread:0x00007f643221eec0@event_loop@/fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.13.3/lib/fluent/plugin_helper/thread.rb:70 run> terminated with exception (report_on_exception is true):
/usr/local/lib/ruby/2.6.0/openssl/ssl.rb:239:in `peeraddr': Transport endpoint is not connected - getpeername(2) (Errno::ENOTCONN)
        from /usr/local/lib/ruby/2.6.0/openssl/ssl.rb:239:in `peeraddr'
        from /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.13.3/lib/fluent/plugin_helper/server.rb:713:in `rescue in try_tls_accept'
        from /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.13.3/lib/fluent/plugin_helper/server.rb:696:in `try_tls_accept'
        from /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.13.3/lib/fluent/plugin_helper/server.rb:726:in `on_connect'
        from /fluentd/vendor/bundle/ruby/2.6.0/gems/cool.io-1.7.1/lib/cool.io/server.rb:41:in `on_connection'
        from /fluentd/vendor/bundle/ruby/2.6.0/gems/cool.io-1.7.1/lib/cool.io/listener.rb:46:in `on_readable'
        from /fluentd/vendor/bundle/ruby/2.6.0/gems/cool.io-1.7.1/lib/cool.io/loop.rb:88:in `run_once'
        from /fluentd/vendor/bundle/ruby/2.6.0/gems/cool.io-1.7.1/lib/cool.io/loop.rb:88:in `run'
        from /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.13.3/lib/fluent/plugin_helper/event_loop.rb:93:in `block in start'
        from /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.13.3/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
/fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.13.3/lib/fluent/plugin_helper/server.rb:697:in `accept_nonblock': Connection reset by peer - SSL_accept (Errno::ECONNRESET)
        from /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.13.3/lib/fluent/plugin_helper/server.rb:697:in `try_tls_accept'
        from /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.13.3/lib/fluent/plugin_helper/server.rb:726:in `on_connect'
        from /fluentd/vendor/bundle/ruby/2.6.0/gems/cool.io-1.7.1/lib/cool.io/server.rb:41:in `on_connection'
        from /fluentd/vendor/bundle/ruby/2.6.0/gems/cool.io-1.7.1/lib/cool.io/listener.rb:46:in `on_readable'
        from /fluentd/vendor/bundle/ruby/2.6.0/gems/cool.io-1.7.1/lib/cool.io/loop.rb:88:in `run_once'
        from /fluentd/vendor/bundle/ruby/2.6.0/gems/cool.io-1.7.1/lib/cool.io/loop.rb:88:in `run'
        from /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.13.3/lib/fluent/plugin_helper/event_loop.rb:93:in `block in start'
        from /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.13.3/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
...

I just can't figure out what is causing the problems. Maybe somebody is able to help?

The following config is used:

fluentd (fluentd-kubernetes-daemonset:v1.13.3-debian-graylog-1.1)

<system>
  log_level error
</system>

<source>
  @type forward
  bind 0.0.0.0
  port 24224

  <security>
    self_hostname "#{ENV['FLUENTD_SELF_HOSTNAME']}"
    shared_key "#{ENV['FLUENTD_SHARED_KEY']}"
  </security>

  <transport tls>
    cert_path /fluentd/certs/fluentd.crt
    private_key_path /fluentd/certs/fluentd.key
    private_key_passphrase "#{ENV['FLUENTD_PRIVATE_KEY_PASSPHRASE']}"
  </transport>
</source>

<filter kube.**>
  @type record_modifier
  char_encoding utf-8
</filter>

<filter **>
  @type grep

  <exclude>
    key log
    pattern ^(?:(?:\r)?\n)?$
  </exclude>
</filter>

<match **>
  @type copy
  <store>
    @type gelf
    host "#{ENV['FLUENT_GRAYLOG_HOST']}"
    port "#{ENV['FLUENT_GRAYLOG_PORT']}"
    protocol "#{ENV['FLUENT_GRAYLOG_PROTOCOL'] || 'udp'}"
    <buffer>
      @type file
      path /fluentd/log/elastic-buffer
      flush_thread_count 16
      flush_at_shutdown true
      flush_mode interval
      flush_interval 1s
      flush_thread_interval 1
      flush_thread_burst_interval 1
      retry_forever true
      retry_type exponential_backoff
      retry_max_interval 30
      chunk_limit_size 10M
      queue_limit_length 16
    </buffer>
  </store>
</match>

fluent-bit (fluent-bit:1.8-debug)

[OUTPUT]
  Name            forward
  Match           *
  Host            ${FLUENTD_HOST}
  Port            ${FLUENTD_PORT}
  Time_as_Integer True
  tls             On
  tls.verify      Off
  Shared_Key      ${FLUENTD_SHARED_KEY}
  Retry_Limit     False

Thanks in advance!

kenhys commented 3 years ago

<security> is used for password authentication https://docs.fluentd.org/input/forward#how-to-enable-password-authentication

In contrast, you want to use <transport tls>, so I guess the shared key may not be the same on both sides. How about setting tls.debug on the fluent-bit side to get more debugging information?

https://docs.fluentbit.io/manual/pipeline/outputs/forward#secure-forward-mode-configuration-parameters
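
For illustration, here is a minimal sketch of the reporter's [OUTPUT] section with tls.debug added. It assumes the fluent-bit TLS option tls.debug accepts verbosity levels from 0 (no debug) to 4 (verbose); the value 4 is only an example, and every other line is copied unchanged from the config posted above.

```
# Assumption: tls.debug 4 is the most verbose level; lower it or remove it
# once the failing handshake has been captured in the fluent-bit log.
[OUTPUT]
  Name            forward
  Match           *
  Host            ${FLUENTD_HOST}
  Port            ${FLUENTD_PORT}
  Time_as_Integer True
  tls             On
  tls.verify      Off
  tls.debug       4
  Shared_Key      ${FLUENTD_SHARED_KEY}
  Retry_Limit     False
```

With this in place, the fluent-bit log should show at which step the TLS handshake is aborted, which may help explain the ECONNRESET in SSL_accept reported on the fluentd side.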

sfhl commented 3 years ago

Hi there, thanks for the reply. I want/need to use both TLS and password authentication, and password authentication is working when TLS is not enabled. I just checked once more that the shared keys match.

github-actions[bot] commented 2 years ago

This issue has been automatically marked as stale because it has been open 90 days with no activity. Remove stale label or comment or this issue will be closed in 30 days

github-actions[bot] commented 2 years ago

This issue was automatically closed because of stale in 30 days