Open · jessectl opened 3 years ago
This seems to be an issue with the default install; I can reproduce it on kind as well:
kind create cluster
helm install fluentd fluent/fluentd
The issue seems to be that without a valid Elasticsearch deployment to talk to, the pod fails its probes. If you deploy with the default values, it expects an elasticsearch-master service in the current namespace. I can get it to function by just deploying the Helm chart for Elasticsearch as per: https://github.com/calyptia/fluent-bit-devtools/blob/main/deploy-elastic.sh
kind create cluster
git clone https://github.com/calyptia/fluent-bit-devtools.git ./github/calyptia/fluent-bit-devtools/
ES_NAMESPACE=default ./github/calyptia/fluent-bit-devtools/deploy-elastic.sh
helm install fluentd fluent/fluentd
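If you'd rather not clone the devtools repo, installing the upstream Elasticsearch chart directly should also satisfy the default values, since (as far as I know) it creates the elasticsearch-master service they expect:

# Add the Elastic Helm repo and install Elasticsearch with defaults,
# which should create a service named elasticsearch-master.
helm repo add elastic https://helm.elastic.co
helm install elasticsearch elastic/elasticsearch
helm install fluentd fluent/fluentd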
The logs without Elasticsearch deployed show errors trying to connect:
2022-02-04 12:10:18 +0000 [info]: adding match in @FLUENT_LOG pattern="**" type="null"
2022-02-04 12:10:18 +0000 [info]: adding match in @KUBERNETES pattern="kubernetes.var.log.containers.fluentd**" type="relabel"
2022-02-04 12:10:18 +0000 [info]: adding filter in @KUBERNETES pattern="kubernetes.**" type="kubernetes_metadata"
2022-02-04 12:10:18 +0000 [info]: adding match in @KUBERNETES pattern="**" type="relabel"
2022-02-04 12:10:18 +0000 [info]: adding filter in @DISPATCH pattern="**" type="prometheus"
2022-02-04 12:10:18 +0000 [info]: adding match in @DISPATCH pattern="**" type="relabel"
2022-02-04 12:10:18 +0000 [info]: adding match in @OUTPUT pattern="**" type="elasticsearch"
2022-02-04 12:10:20 +0000 [warn]: #0 Could not communicate to Elasticsearch, resetting connection and trying again. no address for elasticsearch-master (Resolv::ResolvError)
2022-02-04 12:10:20 +0000 [warn]: #0 Remaining retry: 14. Retry to communicate after 2 second(s).
2022-02-04 12:10:24 +0000 [warn]: #0 Could not communicate to Elasticsearch, resetting connection and trying again. no address for elasticsearch-master (Resolv::ResolvError)
2022-02-04 12:10:24 +0000 [warn]: #0 Remaining retry: 13. Retry to communicate after 4 second(s).
2022-02-04 12:10:32 +0000 [warn]: #0 Could not communicate to Elasticsearch, resetting connection and trying again. no address for elasticsearch-master (Resolv::ResolvError)
2022-02-04 12:10:32 +0000 [warn]: #0 Remaining retry: 12. Retry to communicate after 8 second(s).
2022-02-04 12:10:47 +0000 [info]: Received graceful stop
2022-02-04 12:10:48 +0000 [warn]: #0 Could not communicate to Elasticsearch, resetting connection and trying again. no address for elasticsearch-master (Resolv::ResolvError)
2022-02-04 12:10:48 +0000 [warn]: #0 Remaining retry: 11. Retry to communicate after 16 second(s).
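You can also confirm it is the probes killing the pod, rather than fluentd itself crashing. The label selector and metrics port below are assumptions based on what I believe the chart's defaults to be:

# Probe failures show up as Unhealthy events on the pod.
kubectl describe pods -l app.kubernetes.io/name=fluentd | grep -i -A 2 unhealthy
# The probes hit the Prometheus /metrics endpoint, which is not served
# while the Elasticsearch output is still failing its startup checks.
kubectl port-forward ds/fluentd 24231:24231 &
curl -s http://localhost:24231/metrics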
With Elasticsearch deployed, the logs show normal startup activity and the pod responds to its probes:
2022-02-04 13:25:01 +0000 [info]: starting fluentd-1.12.4 pid=7 ruby="2.6.7"
2022-02-04 13:25:01 +0000 [info]: spawn command to main: cmdline=["/usr/local/bin/ruby", "-Eascii-8bit:ascii-8bit", "/fluentd/vendor/bundle/ruby/2.6.0/bin/fluentd", "-c", "/fluentd/etc/../../../etc/fluent/fluent.conf", "-p", "/fluentd/plugins", "--gemfile", "/fluentd/Gemfile", "-r", "/fluentd/vendor/bundle/ruby/2.6.0/gems/fluent-plugin-elasticsearch-5.0.3/lib/fluent/plugin/elasticsearch_simple_sniffer.rb", "--under-supervisor"]
2022-02-04 13:25:02 +0000 [info]: adding match in @FLUENT_LOG pattern="**" type="null"
2022-02-04 13:25:02 +0000 [info]: adding match in @KUBERNETES pattern="kubernetes.var.log.containers.fluentd**" type="relabel"
2022-02-04 13:25:02 +0000 [info]: adding filter in @KUBERNETES pattern="kubernetes.**" type="kubernetes_metadata"
2022-02-04 13:25:02 +0000 [info]: adding match in @KUBERNETES pattern="**" type="relabel"
2022-02-04 13:25:02 +0000 [info]: adding filter in @DISPATCH pattern="**" type="prometheus"
2022-02-04 13:25:02 +0000 [info]: adding match in @DISPATCH pattern="**" type="relabel"
2022-02-04 13:25:02 +0000 [info]: adding match in @OUTPUT pattern="**" type="elasticsearch"
2022-02-04 13:25:02 +0000 [warn]: #0 Detected ES 7.x: `_doc` will be used as the document `_type`.
warning: 299 Elasticsearch-7.16.3-4e6e4eab2297e949ec994e688dad46290d018022 "Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See https://www.elastic.co/guide/en/elasticsearch/reference/7.16/security-minimal-setup.html to enable security."
2022-02-04 13:25:02 +0000 [info]: adding source type="tail"
2022-02-04 13:25:02 +0000 [info]: adding source type="prometheus"
2022-02-04 13:25:02 +0000 [info]: adding source type="prometheus_monitor"
2022-02-04 13:25:02 +0000 [info]: adding source type="prometheus_output_monitor"
2022-02-04 13:25:02 +0000 [info]: #0 starting fluentd worker pid=18 ppid=7 worker=0
2022-02-04 13:25:02 +0000 [info]: #0 [in_tail_container_logs] following tail of /var/log/containers/kube-proxy-xcqpb_kube-system_kube-proxy-b99becf3fc759b549d60c05ee0af95b9f0da208b080bdf2617f2ef85a97b514b.log
2022-02-04 13:25:02 +0000 [info]: #0 [in_tail_container_logs] following tail of /var/log/containers/kube-apiserver-kind-control-plane_kube-system_kube-apiserver-4bf5c88cc9439552051a58b5392f7a5693a2b4b4abf0670d910aca01934ae7af.log
2022-02-04 13:25:02 +0000 [info]: #0 [in_tail_container_logs] following tail of /var/log/containers/etcd-kind-control-plane_kube-system_etcd-cec408087c25acfbf2b11f95cd577df56dbb1ef348939a40341113524889082f.log
2022-02-04 13:25:02 +0000 [info]: #0 [in_tail_container_logs] following tail of /var/log/containers/coredns-558bd4d5db-44ph8_kube-system_coredns-87926b7ea1f0f58635be68d3c86680c5d35dd1289d871a46742b90f4deecc53e.log
2022-02-04 13:25:02 +0000 [info]: #0 [in_tail_container_logs] following tail of /var/log/containers/coredns-558bd4d5db-g6cl5_kube-system_coredns-44e2f07ffb61a34b05f91d3a1ccd8ccc8526c511cb64c7c8a2b693babac99b75.log
2022-02-04 13:25:02 +0000 [info]: #0 [in_tail_container_logs] following tail of /var/log/containers/elasticsearch-master-0_default_configure-sysctl-beccfb1c4781a40d7a08afb835995dfcc7eb56a7883d34bf4548802ae06f8cc2.log
2022-02-04 13:25:02 +0000 [info]: #0 [in_tail_container_logs] following tail of /var/log/containers/kindnet-722m8_kube-system_kindnet-cni-647d41b1cd2890532ed93a7c5a4a2dab3fddadfce0e2d0fd69e0f29982793c9f.log
2022-02-04 13:25:02 +0000 [info]: #0 [in_tail_container_logs] following tail of /var/log/containers/fluentd-lff9b_default_fluentd-a0a83822759937595f118c9dd8bde77ecc825dbe2a4320a465d8751001fc41c7.log
2022-02-04 13:25:02 +0000 [info]: #0 [in_tail_container_logs] following tail of /var/log/containers/elasticsearch-master-0_default_elasticsearch-4b897045b535ce0301f1565e39837ea24c12e6586373ed1378476f1a4c77cb35.log
2022-02-04 13:25:02 +0000 [info]: #0 [in_tail_container_logs] following tail of /var/log/containers/kube-controller-manager-kind-control-plane_kube-system_kube-controller-manager-571ba02726d6857a7f4426ef241d54dfa312df202f07023fa186dd23ae52095e.log
2022-02-04 13:25:02 +0000 [info]: #0 [in_tail_container_logs] following tail of /var/log/containers/kube-scheduler-kind-control-plane_kube-system_kube-scheduler-ef36c44dfc76041dfa7e992153d37b651003dcc0ddb709b2348553f709463406.log
2022-02-04 13:25:02 +0000 [info]: #0 [in_tail_container_logs] following tail of /var/log/containers/local-path-provisioner-547f784dff-sjzzp_local-path-storage_local-path-provisioner-0636e525ac4ecab7c53d99c9f049fb66e900b9f4a1ccbf1fb56e0aa141c821e4.log
2022-02-04 13:25:02 +0000 [info]: #0 fluentd worker is now running worker=0
2022-02-04 13:25:33 +0000 [info]: #0 [filter_kube_metadata] stats - namespace_cache_size: 5, pod_cache_size: 12, namespace_cache_miss: 8, pod_cache_watch_updates: 1, pod_cache_api_updates: 11, id_cache_miss: 11, pod_cache_host_updates: 12, namespace_cache_host_updates: 5
warning: 299 Elasticsearch-7.16.3-4e6e4eab2297e949ec994e688dad46290d018022 "Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See https://www.elastic.co/guide/en/elasticsearch/reference/7.16/security-minimal-setup.html to enable security.", 299 Elasticsearch-7.16.3-4e6e4eab2297e949ec994e688dad46290d018022 "[types removal] Specifying types in bulk requests is deprecated."
I can confirm that @patrick-stephens's code worked for me 👍
It seems to be a more general Fluentd problem: when it cannot connect to its output, it also stops serving the metrics port. It just shows up here because that port is then used for the probes.
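If the goal is just to stop the output from blocking startup (and with it the metrics port), fluent-plugin-elasticsearch can skip the version check that is retrying in the logs above. A rough sketch of overriding the chart's output config; the 04_outputs.conf key and the @OUTPUT label mirror the chart's defaults only approximately:

fileConfigs:
  04_outputs.conf: |-
    <label @OUTPUT>
      <match **>
        @type elasticsearch
        host "elasticsearch-master"
        port 9200
        # Skip the startup version probe that produces the
        # "Could not communicate to Elasticsearch" retries,
        # and assume ES 7.x instead.
        verify_es_version_at_startup false
        default_elasticsearch_version 7
        logstash_format true
      </match>
    </label>

Note this only lets fluentd start and serve /metrics; events still back up in the buffer until the output is actually reachable.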
Hello, I found the same problem, but with IPv6-enabled clusters.
Can anyone reproduce? I'm thinking of creating an MR to use https://docs.fluentd.org/monitoring-fluentd instead of /metrics
Yes, see above. You need the Elasticsearch output to be reachable, otherwise it will fail the metrics probe.
Edit: Ignore this comment, it's not valid. I was using a custom image with additional plugins that was based not on the fluentd image but on Bitnami's fluentd image; I just didn't notice it at first. I should be using Bitnami's Helm chart. Sorry for the noise.
Since I don't want to deploy ES just to get fluentd to work in my cluster, I looked into the monitoring agent solution suggested by @dioguerra. I managed to get the Helm chart deployment to work by using the monitor_agent for the liveness/readiness probes with these values:
livenessProbe:
  httpGet:
    path: /api/plugins.json
    port: 24220
readinessProbe:
  httpGet:
    path: /api/plugins.json
    port: 24220
fileConfigs:
  monitor_agent.conf: |-
    <source>
      @type monitor_agent
      bind 0.0.0.0
      port 24220
    </source>
I have no clue if this endpoint is as representative as /metrics, but at least the pods are considered ready with this config.
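To sanity-check the endpoint the probes now hit, port-forwarding to the workload works; the ds/fluentd name assumes the chart's default daemonset:

kubectl port-forward ds/fluentd 24220:24220 &
curl -s http://localhost:24220/api/plugins.json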
Is this still happening? It should be fixed by default... Or was it dropped? Otherwise a note should be added.
@dioguerra
> Is this still happening? It should be fixed by default... Or was it dropped? Otherwise a note should be added.
See my edited comment. Sorry for the noise.
I've just tried fluent/fluentd-kubernetes-daemonset:v1.16-debian-opensearch-amd64-2, which seems to still have the problem and puts the pod into a crash loop. After switching both the liveness and readiness probes to the /api/plugins.json endpoint, it stabilized.
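For the daemonset images the fix has the same shape, just in the DaemonSet manifest instead of Helm values. A sketch of the container probes, assuming a monitor_agent <source> on port 24220 has also been added to the fluentd config the image loads:

livenessProbe:
  httpGet:
    path: /api/plugins.json
    port: 24220
readinessProbe:
  httpGet:
    path: /api/plugins.json
    port: 24220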
EKS Version: