fabric8io / fluent-plugin-kubernetes_metadata_filter

Enrich your fluentd events with Kubernetes metadata
Apache License 2.0

Fails to parse watch call response and crashes pod #249

Closed: rockb1017 closed this issue 3 years ago

rockb1017 commented 4 years ago
```
2020-08-28 15:36:27 +0000 [error]: Exception encountered parsing pod watch event. The connection might have been closed. Retried 10 times yet still failing. Restarting. error reading from socket: Could not parse data
#<Thread:0x0000562808f18f70@/usr/share/gems/gems/fluent-plugin-kubernetes_metadata_filter-2.5.2/lib/fluent/plugin/filter_kubernetes_metadata.rb:276 run> terminated with exception (report_on_exception is true):
/usr/share/gems/gems/fluent-plugin-kubernetes_metadata_filter-2.5.2/lib/fluent/plugin/kubernetes_metadata_watch_pods.rb:71:in `rescue in set_up_pod_thread': Exception encountered parsing pod watch event. The connection might have been closed. Retried 10 times yet still failing. Restarting. (Fluent::UnrecoverableError)
    from /usr/share/gems/gems/fluent-plugin-kubernetes_metadata_filter-2.5.2/lib/fluent/plugin/kubernetes_metadata_watch_pods.rb:40:in `set_up_pod_thread'
    from /usr/share/gems/gems/fluent-plugin-kubernetes_metadata_filter-2.5.2/lib/fluent/plugin/filter_kubernetes_metadata.rb:276:in `block in configure'
Unexpected error Exception encountered parsing pod watch event. The connection might have been closed. Retried 10 times yet still failing. Restarting.
  /usr/share/gems/gems/fluent-plugin-kubernetes_metadata_filter-2.5.2/lib/fluent/plugin/kubernetes_metadata_watch_pods.rb:71:in `rescue in set_up_pod_thread'
  /usr/share/gems/gems/fluent-plugin-kubernetes_metadata_filter-2.5.2/lib/fluent/plugin/kubernetes_metadata_watch_pods.rb:40:in `set_up_pod_thread'
  /usr/share/gems/gems/fluent-plugin-kubernetes_metadata_filter-2.5.2/lib/fluent/plugin/filter_kubernetes_metadata.rb:276:in `block in configure'
```

It is deployed on Oracle Cloud Kubernetes, with plugin version 2.5.2.

rockb1017 commented 4 years ago

I looked into this further, and I think it happens when the connection is terminated by an external component (e.g. a load balancer; see https://docs.aws.amazon.com/elasticloadbalancing/latest/classic/config-idle-timeout.html). Right now we don't have a rescue clause for the ConnectionError raised by the HTTP library during parsing.
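
For illustration, a minimal sketch of what such a rescue could look like. This is hypothetical, not the plugin's actual code; it assumes the watch stream comes from kubeclient on top of the `http` gem, which raises `HTTP::ConnectionError` when the peer closes the socket mid-stream, and `pod_watcher` / `log` stand in for the plugin's own objects:

```ruby
begin
  pod_watcher.each do |notice|
    # handle ADDED / MODIFIED / DELETED notices here
  end
rescue HTTP::ConnectionError => e
  # The peer (e.g. a load balancer idle timeout) closed the socket.
  # Treat this as retryable and re-establish the watch instead of
  # counting it as a parse failure toward watch_retry_max_times.
  log.info("watch connection closed by the peer, re-establishing: #{e.message}")
end
```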

jcantrill commented 4 years ago

Pull requests are welcome

djj0809 commented 3 years ago

I'm also seeing this in Azure Kubernetes clusters:

```
Unexpected error Exception encountered parsing namespace watch event. The connection might have been closed. Retried 10 times yet still failing. Restarting.
  /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluent-plugin-kubernetes_metadata_filter-2.5.2/lib/fluent/plugin/kubernetes_metadata_watch_namespaces.rb:70:in `rescue in set_up_namespace_thread'
  /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluent-plugin-kubernetes_metadata_filter-2.5.2/lib/fluent/plugin/kubernetes_metadata_watch_namespaces.rb:39:in `set_up_namespace_thread'
  /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluent-plugin-kubernetes_metadata_filter-2.5.2/lib/fluent/plugin/filter_kubernetes_metadata.rb:279:in `block in configure'
```

andrzej-stencel commented 3 years ago

Hey folks, would you accept a PR that would disable triggering the Fluentd restart, for example if watch_retry_max_times is set to -1?

EDIT: Haha, sorry, I removed the suggested code change as it didn't really make much sense 😅 Probably an additional if statement would be needed around this line:

https://github.com/fabric8io/fluent-plugin-kubernetes_metadata_filter/blob/84f66a8f9e06ab5b5211053fcce4cd8ab4bd74ba/lib/fluent/plugin/kubernetes_metadata_watch_namespaces.rb#L51
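
Something roughly like this (a rough sketch only, not an actual patch; the instance variable names and the error message are assumptions based on the stack traces above):

```ruby
# With watch_retry_max_times set to -1, never raise the
# UnrecoverableError that triggers the Fluentd restart.
if @watch_retry_max_times != -1 && @watch_retry_count >= @watch_retry_max_times
  raise Fluent::UnrecoverableError,
        'Exception encountered parsing namespace watch event. ' \
        'The connection might have been closed.'
end
```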

andrzej-stencel commented 3 years ago

Created a draft PR to show my idea: https://github.com/fabric8io/fluent-plugin-kubernetes_metadata_filter/pull/266

The PR is missing the same change for pods, and also some tests. But I just wanted to validate the idea first: what do you think?

jcantrill commented 3 years ago

> Hey folks, would you accept a PR that would disable triggering the Fluentd restart, for example if watch_retry_max_times is set to -1?
>
> EDIT: Haha, sorry, I removed the suggested code change as it didn't really make much sense 😅 Probably an additional if statement would be needed around this line:
>
> https://github.com/fabric8io/fluent-plugin-kubernetes_metadata_filter/blob/84f66a8f9e06ab5b5211053fcce4cd8ab4bd74ba/lib/fluent/plugin/kubernetes_metadata_watch_namespaces.rb#L51

I would need to look at the code more extensively, but is there any reason to disallow nil (implying it's unset) and move on, in lieu of -1?

andrzej-stencel commented 3 years ago

Oh sure, I guess nil instead of -1 is just as good or better. I'll try to prepare the PR then; hopefully this will be enough to fix this issue.

andrzej-stencel commented 3 years ago

I had a chat with folks on my team, and the outcome is that maybe it's not such a good idea to disable the Fluentd restarts altogether: what if there is an actual problem connecting to the Kubernetes API?

Instead, maybe it's a better idea to tune the retry count resetting logic? Currently, the retry count is reset whenever an update comes in from the watch, as added in this PR: https://github.com/fabric8io/fluent-plugin-kubernetes_metadata_filter/pull/219. I believe the problem occurs when there are no updates from the API server for a long time: new watches are created and terminated, and if this goes on long enough, the Fluentd restart is triggered.

What if we reset the retry count not only on updates from the watch, but also on each successful connection to the API server?
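
Sketched out, the idea would look something like this (a rough illustration with assumed helper and variable names, not the actual diff):

```ruby
loop do
  begin
    # (Re)open the watch connection to the API server.
    watcher = start_namespace_watch # assumed helper that opens the watch
    # The connection succeeded, so reset the failure counter here as
    # well, not only when an update notice arrives on the stream.
    @watch_retry_count = 0
    process_namespace_watcher_notices(watcher)
  rescue StandardError => e
    @watch_retry_count += 1
    if @watch_retry_count >= @watch_retry_max_times
      raise Fluent::UnrecoverableError,
            "namespace watch failed #{@watch_retry_count} times: #{e.message}"
    end
    sleep(@watch_retry_interval)
  end
end
```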

Please let me know what you think.

andrzej-stencel commented 3 years ago

See this draft PR: #267. It shows the concept of resetting the retry count on a successful connection to the API server. Again, this is only a stub; if accepted, I would replicate this change for pods and add tests and docs.

jcantrill commented 3 years ago

> See this draft PR: #267. It shows the concept of resetting the retry count on a successful connection to the API server. Again, this is only a stub; if accepted, I would replicate this change for pods and add tests and docs.

I don't see any reason to restart Fluentd when this situation occurs. The only caveat is that we may not have metadata, or the metadata may be stale, which does not seem like a pressing issue. It's reasonable to reset and try again.

andrzej-stencel commented 3 years ago

Great, thanks @jcantrill. I'll prepare the pull request #267.

Bessonov commented 3 years ago

We are affected too. Internal link: infra#57

andrzej-stencel commented 3 years ago

@Bessonov did you have a look at PR #267? It has already been approved; hopefully it will be merged soon and a new version will be released.

djj0809 commented 3 years ago

It looks like the plugin has a `watch false` config option to disable this feature, if you only need the pod name, container name, node name, and namespace name. I tested that, and it works fine without restarts.
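
For reference, a minimal sketch of that configuration (the `kubernetes.**` match pattern is just an example; adjust it to your tag scheme):

```
<filter kubernetes.**>
  @type kubernetes_metadata
  # Disable the pod/namespace watch threads entirely; metadata is then
  # looked up (and cached) per record instead of via the watch API.
  watch false
</filter>
```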