Closed julienlefur closed 4 years ago
@jcantrill I still have the same behaviour with version 2.5.2. The connection to the api-server is closed regularly. The stat counter "namespace_watch_failures" is incremented until it reaches 10 and fluentd crashes. I have a cronjob that applies a change to a namespace in order to make fluentd reset this counter. This is only a workaround to prevent fluentd from crashing. I'll keep you posted if I can dig deeper and find anything.
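For reference, the workaround looks roughly like the CronJob below. This is a sketch, not a tested manifest: the names (fluentd-watch-keepalive, the service account, the annotation key) and the schedule are placeholders, the target namespace is arbitrary, and the service account needs RBAC permission to annotate namespaces. batch/v1beta1 is assumed for the CronJob API version current at the time.

```yaml
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: fluentd-watch-keepalive   # placeholder name
spec:
  schedule: "*/30 * * * *"        # run before 10 watch failures can accumulate
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: fluentd-keepalive  # needs RBAC to annotate namespaces
          restartPolicy: OnFailure
          containers:
            - name: annotate
              image: bitnami/kubectl
              command:
                - /bin/sh
                - -c
                # Touching any namespace produces a watch event, which makes
                # the plugin call reset_namespace_watch_retry_stats.
                - kubectl annotate namespace default fluentd-keepalive="$(date +%s)" --overwrite
```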
@jcantrill we upgraded to 2.5.2 and we still get the same issue: fluentd crashes every hour.
2020-08-01 15:53:44 +0300 [error]: Exception encountered parsing namespace watch event. The connection might have been closed. Retried 10 times yet still failing. Restarting.error reading from socket: Could not parse data
#<Thread:0x000055ea2b3c0d78@/opt/bitnami/fluentd/gems/fluent-plugin-kubernetes_metadata_filter-2.5.2/lib/fluent/plugin/filter_kubernetes_metadata.rb:279 run> terminated with exception (report_on_exception is true):
/opt/bitnami/fluentd/gems/fluent-plugin-kubernetes_metadata_filter-2.5.2/lib/fluent/plugin/kubernetes_metadata_watch_namespaces.rb:70:in `rescue in set_up_namespace_thread': Exception encountered parsing namespace watch event. The connection might have been closed. Retried 10 times yet still failing. Restarting. (Fluent::UnrecoverableError)
from /opt/bitnami/fluentd/gems/fluent-plugin-kubernetes_metadata_filter-2.5.2/lib/fluent/plugin/kubernetes_metadata_watch_namespaces.rb:39:in `set_up_namespace_thread'
from /opt/bitnami/fluentd/gems/fluent-plugin-kubernetes_metadata_filter-2.5.2/lib/fluent/plugin/filter_kubernetes_metadata.rb:279:in `block in configure'
/opt/bitnami/fluentd/gems/http-4.4.1/lib/http/response/parser.rb:31:in `add': error reading from socket: Could not parse data (HTTP::ConnectionError)
from /opt/bitnami/fluentd/gems/http-4.4.1/lib/http/connection.rb:214:in `read_more'
from /opt/bitnami/fluentd/gems/http-4.4.1/lib/http/connection.rb:92:in `readpartial'
from /opt/bitnami/fluentd/gems/http-4.4.1/lib/http/response/body.rb:30:in `readpartial'
from /opt/bitnami/fluentd/gems/http-4.4.1/lib/http/response/body.rb:36:in `each'
from /opt/bitnami/fluentd/gems/kubeclient-4.8.0/lib/kubeclient/watch_stream.rb:25:in `each'
from /opt/bitnami/fluentd/gems/fluent-plugin-kubernetes_metadata_filter-2.5.2/lib/fluent/plugin/kubernetes_metadata_watch_namespaces.rb:114:in `process_namespace_watcher_notices'
from /opt/bitnami/fluentd/gems/fluent-plugin-kubernetes_metadata_filter-2.5.2/lib/fluent/plugin/kubernetes_metadata_watch_namespaces.rb:41:in `set_up_namespace_thread'
from /opt/bitnami/fluentd/gems/fluent-plugin-kubernetes_metadata_filter-2.5.2/lib/fluent/plugin/filter_kubernetes_metadata.rb:279:in `block in configure'
/opt/bitnami/fluentd/gems/http-4.4.1/lib/http/response/parser.rb:31:in `add': Could not parse data (IOError)
from /opt/bitnami/fluentd/gems/http-4.4.1/lib/http/connection.rb:214:in `read_more'
from /opt/bitnami/fluentd/gems/http-4.4.1/lib/http/connection.rb:92:in `readpartial'
from /opt/bitnami/fluentd/gems/http-4.4.1/lib/http/response/body.rb:30:in `readpartial'
from /opt/bitnami/fluentd/gems/http-4.4.1/lib/http/response/body.rb:36:in `each'
from /opt/bitnami/fluentd/gems/kubeclient-4.8.0/lib/kubeclient/watch_stream.rb:25:in `each'
from /opt/bitnami/fluentd/gems/fluent-plugin-kubernetes_metadata_filter-2.5.2/lib/fluent/plugin/kubernetes_metadata_watch_namespaces.rb:114:in `process_namespace_watcher_notices'
from /opt/bitnami/fluentd/gems/fluent-plugin-kubernetes_metadata_filter-2.5.2/lib/fluent/plugin/kubernetes_metadata_watch_namespaces.rb:41:in `set_up_namespace_thread'
from /opt/bitnami/fluentd/gems/fluent-plugin-kubernetes_metadata_filter-2.5.2/lib/fluent/plugin/filter_kubernetes_metadata.rb:279:in `block in configure'
Unexpected error Exception encountered parsing namespace watch event. The connection might have been closed. Retried 10 times yet still failing. Restarting.
/opt/bitnami/fluentd/gems/fluent-plugin-kubernetes_metadata_filter-2.5.2/lib/fluent/plugin/kubernetes_metadata_watch_namespaces.rb:70:in `rescue in set_up_namespace_thread'
/opt/bitnami/fluentd/gems/fluent-plugin-kubernetes_metadata_filter-2.5.2/lib/fluent/plugin/kubernetes_metadata_watch_namespaces.rb:39:in `set_up_namespace_thread'
/opt/bitnami/fluentd/gems/fluent-plugin-kubernetes_metadata_filter-2.5.2/lib/fluent/plugin/filter_kubernetes_metadata.rb:279:in `block in configure'
I enabled trace logging for the plugin to try to figure out the issue, but I had no luck. I don't want to increase the retry limit, because fluentd will still eventually crash; I want to diagnose the root cause and fix it. Can you advise?
The watch connection to the API server seems to be closed regularly by one of the following: kubeclient, the http gem, or the apiserver itself.
When no modifications are made to the namespaces, the namespace_watch_retry_count keeps increasing due to this error:
2020-06-26 11:59:04 +0000 [info]: #0 [filter_kube_metadata] Exception encountered parsing namespace watch event. The connection might have been closed. Sleeping for 128 seconds and resetting the namespace watcher.error reading from socket: Could not parse data
When the max is reached, Fluentd crashes and restarts.
https://github.com/fabric8io/fluent-plugin-kubernetes_metadata_filter/blob/940296bac92ac05b97654e156dcf16c1eacd21b5/lib/fluent/plugin/kubernetes_metadata_watch_namespaces.rb#L43-L55
The only way to reset namespace_watch_retry_count is to make a change to a namespace so that reset_namespace_watch_retry_stats is called. But when no modifications are made to any namespace, fluentd crashes after 10 'connection closed' errors. Would it be possible to catch the 'normal' connection-close errors to avoid this behaviour?
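To make the failure mode concrete, here is a minimal, hypothetical model of the retry accounting described above. This is not the plugin's actual code; the class and method names are invented for illustration. Connection-close errors increment a counter that is only reset when a watch event is processed, so a quiet cluster inevitably hits the limit.

```ruby
# Simplified model of the namespace watch retry accounting (illustrative only).
class WatchRetrySimulator
  MAX_RETRIES = 10

  attr_reader :retry_count

  def initialize
    @retry_count = 0
  end

  # Called whenever the watch connection drops ("Could not parse data").
  def record_failure
    @retry_count += 1
    if @retry_count >= MAX_RETRIES
      raise "Retried #{MAX_RETRIES} times yet still failing. Restarting."
    end
  end

  # Called only when a namespace event is actually received
  # (the equivalent of reset_namespace_watch_retry_stats).
  def record_event
    @retry_count = 0
  end
end

sim = WatchRetrySimulator.new
9.times { sim.record_failure } # idle cluster: close errors accumulate
sim.record_event               # any namespace change resets the counter
puts sim.retry_count           # prints 0
```

In this model, treating a routine connection close as benign (resetting the counter on successful reconnection, or not counting these errors at all) would prevent the crash on idle clusters, which is essentially what the suggestion above asks for.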
It seems to be linked to this issue on Kubeclient: https://github.com/abonas/kubeclient/issues/273