tetramin opened 5 years ago
We have the same issue on k8s clusters running on bare-metal instances: 7 of 54 containers with an identical config are not listening on the port, without any log messages (debug level used).
I did some experiments by editing the fluentd configuration file directly in running containers and applying the changes by sending a HUP signal to the fluentd process to reload the config.
Two observations:
- Removing the in_tail input (targeting k8s container logs) resolves the issue with binding after a config reload.
- With in_tail kept and the elasticsearch output replaced with

<match kube.**>
  @type null
</match>

the plugin was able to bind after a short time. After this I reverted the output back to the elasticsearch plugin, reloaded the config, and the plugin was still able to bind.
It looks like a large input queue/backpressure prevents the plugin from binding at startup time.
I've found the issue and a solution for my case. We have two inputs configured for our fluentd instances.
One of them is in_tail for k8s logs:
<source>
@type tail
path /var/log/containers/*.log
pos_file /var/log/es-containers.log.pos
tag kubernetes.*
format none
refresh_interval 60
</source>
From the startup logs in debug mode it can be seen that in_tail found around 80 tailing paths, separated by commas:
2019-05-25 16:24:34 +0000 [debug]: #0 tailing paths: target = /var/log/containers/XXX.log,/var/log/containers/YYY.log,...,....
The next message is:
2019-05-25 16:24:34 +0000 [info]: #0 following tail of /var/log/containers/XXX.log
And no more "following tail of ..." messages, so it is obviously stuck on the file XXX.log, which is around 7 GB in my case.
I've added the skip_refresh_on_startup true setting to the in_tail plugin, and now the monitoring HTTP source is able to bind at startup.
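For reference, a sketch of the adjusted source block, assuming the same in_tail configuration as above (only the commented line is new):

<source>
  @type tail
  path /var/log/containers/*.log
  pos_file /var/log/es-containers.log.pos
  tag kubernetes.*
  format none
  refresh_interval 60
  # defer the initial scan of matched files so other plugins can bind first
  skip_refresh_on_startup true
</source>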
Yes. in_tail launches and starts file watchers during the startup phase by default.
This is no problem in most cases, but if you have large files, it takes a long time to process them.
skip_refresh_on_startup avoids this problem by skipping the watcher launch at startup; the watch list is then built at the next refresh_interval tick.
I'm starting fluentd with the prometheus plugin as a DaemonSet in GKE. Often, when the container is started, port 24231 does not open, although on other nodes the port is exposed normally. When I run the ss -lnt command, I do not see the open port in the container. There are no suspicious messages in the logs even in trace mode.
How can I debug this?
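For context, a minimal sketch of the kind of prometheus input this likely refers to; the bind address and metrics_path are assumptions, not taken from the reporter's config:

<source>
  @type prometheus
  # assumed values; adjust to your deployment
  bind 0.0.0.0
  port 24231
  metrics_path /metrics
</source>

If the symptom matches the rest of this thread, a large file being scanned by in_tail at startup can delay this listener from binding, and skip_refresh_on_startup true on the in_tail source (as shown above) may help.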