gauravaroraoyo opened 5 years ago
Would appreciate some guidance here on how we can go about debugging this further.
Even with the updated fluentd image, I get the same error. Any pointers on resolving this would be appreciated :)
I have the same issue; surprisingly, restarting fluentd works for a while.
I would appreciate guidance as well. It's not clear which buffer is full or how to set the buffer/chunk/queue limits properly. In my case, fluentbit forwards to a fluentd instance that forwards to another fluentd (I see the buffer overflow errors mostly on the last fluentd in the chain).
[328] kube.var.log.containers.fluentd-79cc4cffbd-d9cdg_sre_fluentd-dccc4f286753b75a53c464446af44ffcbeba5ad3a21c9a947a11e94f4c6892b2.log: [1560431258.193260514, {"log"=>"2019-06-13 13:07:38 +0000 [warn]: #0 emit transaction failed: error_class=Fluent::Plugin::Buffer::BufferOverflowError error="buffer space has too many data" location="/usr/lib/ruby/gems/2.5.0/gems/fluentd-1.2.6/lib/fluent/plugin/buffer.rb:269:in `write'" tag="raw.kube.app.obelix"
[330] kube.var.log.containers.fluentd-79cc4cffbd-d9cdg_sre_fluentd-dccc4f286753b75a53c464446af44ffcbeba5ad3a21c9a947a11e94f4c6892b2.log: [1560431258.193283014, {"log"=>"2019-06-13 13:07:38 +0000 [warn]: #0 emit transaction failed: error_class=Fluent::Plugin::Buffer::BufferOverflowError error="buffer space has too many data" location="/usr/lib/ruby/gems/2.5.0/gems/fluentd-1.2.6/lib/fluent/plugin/buffer.rb:269:in `write'" tag="kube.var.log.containers.obelix-j6h2n_ves-system_obelix-74bc7f7ecbcb9981c5f39eab9d85b855c5145f299d71d68ad4bef8f223653327.log"
I also got this error:
2019-07-02 09:58:09 +0000 [warn]: #0 [out_es] failed to write data into buffer by buffer overflow action=:throw_exception
2019-07-02 09:58:09 +0000 [warn]: #0 emit transaction failed: error_class=Fluent::Plugin::Buffer::BufferOverflowError error="buffer space has too many data" location="/fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.4.2/lib/fluent/plugin/buffer.rb:298:in `write'" tag="kubernetes.var.log.containers.weave-net-6bltm_kube-system_weave-c86976ea8158588ae5d1f421f2c64de83facefaeb9bbd3a5667eda64b2ae1bd4.log"
2019-07-02 09:58:09 +0000 [warn]: #0 suppressed same stacktrace
The same
2019-07-23 16:51:46 +0000 [warn]: #0 failed to write data into buffer by buffer overflow action=:throw_exception
2019-07-23 16:51:46 +0000 [warn]: #0 send an error event stream to @ERROR: error_class=Fluent::Plugin::Buffer::BufferOverflowError error="buffer space has too many data" location="/usr/lib/ruby/gems/2.5.0/gems/fluentd-1.2.6/lib/fluent/plugin/buffer.rb:269:in `write'" tag="k.worker-7f5b967d75-7gfgd"
BufferOverflowError happens when the output speed is slower than the incoming traffic. There are several ways to address it: speed up or scale out the destination, flush with more parallelism, or give the buffer more headroom.
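Those mitigations can be sketched as a buffer section (values are illustrative and the path is a placeholder, not a recommendation):

```
<buffer>
  @type file
  path /var/log/fluent/buffer   # placeholder path
  chunk_limit_size 16MB
  total_limit_size 8GB          # BufferOverflowError fires when total buffered data exceeds this
  flush_thread_count 8          # flush chunks in parallel to keep up with input
  flush_interval 5s
  overflow_action block         # back-pressure the input instead of raising (default: throw_exception)
</buffer>
```

`overflow_action drop_oldest_chunk` is another option when availability matters more than completeness.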
@repeatedly We also see the same BufferOverflowError messages. We got the plugin metrics from the monitor agent, which are interesting:
{"plugins": [
  {"plugin_id": "object:dda9b4", "plugin_category": "input", "type": "monitor_agent",
   "config": {"@type": "monitor_agent", "bind": "0.0.0.0", "port": "25220"},
   "output_plugin": false, "retry_count": null},
  {"plugin_id": "object:114f888", "plugin_category": "input", "type": "forward",
   "config": {"@type": "forward", "port": "25224"},
   "output_plugin": false, "retry_count": null},
  {"plugin_id": "object:e94a6c", "plugin_category": "output", "type": "null",
   "config": {"@type": "null"}, "output_plugin": true, "retry_count": 0, "retry": {}},
  {"plugin_id": "object:e538a0", "plugin_category": "output", "type": "file",
   "config": {"@type": "file", "path": "/xx/xx/xxx/fluentd/${tag[1]}/${tag[0]}/%Y/%m/%d/%H",
              "append": "false", "compress": "gzip"},
   "output_plugin": true, "buffer_queue_length": 0,
   "buffer_total_queued_size": 68725542300, "retry_count": 58672, "retry": {}}
]}
The above shows that buffer_total_queued_size is over 64GB, and we are using a file buffer. But the disk utilization of the entire fluentd buffer directory is much lower. Are we missing something, or is this a bug in fluentd?
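For what it's worth, 64 GiB is 68,719,476,736 bytes, and (if I recall the file-buffer defaults correctly) that is Fluentd's default `total_limit_size` for file buffers, so a reading of 68,725,542,300 suggests the plugin is pinned at its cap. The monitor_agent figures can be checked programmatically; here is a small sketch that parses a `/api/plugins.json` payload (the endpoint exposed on port 25220 in the config above) and reports queued bytes per output plugin:

```python
import json

def buffer_stats(payload):
    """Return queued bytes and retry counts for each output plugin
    in a monitor_agent /api/plugins.json payload."""
    stats = {}
    for p in json.loads(payload).get("plugins", []):
        if p.get("output_plugin"):
            stats[p["plugin_id"]] = {
                "queued_bytes": p.get("buffer_total_queued_size") or 0,
                "retry_count": p.get("retry_count") or 0,
            }
    return stats

# In a live setup the payload would come from the monitor_agent endpoint,
# e.g. urllib.request.urlopen("http://localhost:25220/api/plugins.json").
# Here we use an abridged version of the output quoted above:
sample = json.dumps({"plugins": [
    {"plugin_id": "object:e538a0", "plugin_category": "output", "type": "file",
     "output_plugin": True, "buffer_queue_length": 0,
     "buffer_total_queued_size": 68725542300, "retry_count": 58672},
]})
print(buffer_stats(sample))
# {'object:e538a0': {'queued_bytes': 68725542300, 'retry_count': 58672}}
```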
Hoping to get some guidance on our setup. We use Elasticsearch for the logs. Initially the fluentd pod was throwing the following error: Worker 0 finished unexpectedly with signal SIGKILL. That was resolved by increasing the memory limit to 2Gi. Then we started getting a different fluentd error: [_cluster-elasticsearch_cluster-elasticsearch_elasticsearch] failed to write data into buffer by buffer overflow action=:throw_exception. We attempted to resolve it by tweaking the buffer settings; we now have the following:
buffer:
timekey: 1m
timekey_wait: 30s
timekey_use_utc: true
chunk_limit_size: 16MB
flush_mode: interval
flush_interval: 5s
flush_thread_count: 8
But I can still see that the buffer size on fluentd is 5.3G (not increasing for the last two days), and every so often I see the following error: [_cluster-elasticsearch_cluster-elasticsearch_elasticsearch] failed to write data into buffer by buffer overflow action=:throw_exception. The buffer size suggests there are still logs waiting to be pushed to Elasticsearch, and that fluentd is struggling to keep up with the volume coming from fluentbit. Note that I do see some recent logs in Elasticsearch, but not all of them. Appreciate any suggestions.
I did a load test on my service, which produced a lot of logs and caused this error in my fluentd plugin. How do you fix it? By restarting fluentd?
Any updates?
Is there any solution for this? If you continuously lose data, this can't be used in production.
same issue
...
<buffer>
flush_thread_count 8
flush_interval 1s
chunk_limit_size 10M
queue_limit_length 16
retry_max_interval 30
retry_forever true
</buffer>
...
This solution worked for me.
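A side note on that snippet: queue_limit_length is a legacy (v0.12-style) parameter; in the v1 buffer API the preferred cap is total_limit_size, which as I read the docs is roughly equivalent to chunk_limit_size × queue_limit_length. A v1-style version of the same buffer (values carried over) would look like:

```
<buffer>
  flush_thread_count 8
  flush_interval 1s
  chunk_limit_size 10M
  total_limit_size 160M   # ~ chunk_limit_size x queue_limit_length (10M x 16)
  retry_max_interval 30
  retry_forever true
</buffer>
```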
Just wanted to say that I've struggled all night on this issue, and the only way to resolve it was to scale up the receiving end (Elasticsearch, in my case).
I was using 2 Elasticsearch data nodes, and scaling up by one node immediately solved the issue.
Just for the sake of completeness, that's what I'm using as buffer:
<buffer>
@type file
path /fluentd/log/elastic-buffer
flush_thread_count 16
flush_interval 1s
chunk_limit_size 10M
queue_limit_length 16
flush_mode interval
retry_max_interval 30
retry_forever true
</buffer>
This issue has been automatically marked as stale because it has been open 90 days with no activity. Remove stale label or comment or this issue will be closed in 30 days
This issue was automatically closed because of stale in 30 days
Has anybody resolved this issue?
This could be the same phenomenon as
any body? somebody?? heeeelp😥
It appears to be a problem with the buffer settings, but given that there are so many reports, there may be something we can improve. It should be investigated.
Good afternoon,
Same issue. I am collecting logs within Harvester as a cluster output for audit and logging data; the logs are then sent to a jump server running fluentd, which forwards them to OpenSearch.
It keeps working for several hours until fluentd stops with:
2024-09-04 06:03:14 +0000 [warn]: #0 /opt/fluent/lib/ruby/gems/3.2.0/gems/cool.io-1.8.0/lib/cool.io/io.rb:186:in `on_readable'
2024-09-04 06:03:14.254450433 +0000 fluent.warn: {"error":"#<Fluent::Plugin::Buffer::BufferOverflowError: buffer space has too many data>","location":"/opt/fluent/lib/ruby/gems/3.2.0/gems/fluentd-1.16.5/lib/fluent/plugin/buffer.rb:330:in `write'","tag":"kubernetes.var.log.containers.checkmk-cluster-collector-5d756b6fc-qnmdd_checkmk-monitoring_cluster-collector-26c4ec7eda148132d5c1d974fae19ef8d67cadb66918d53e6ac5a0db3a6fb245.log","message":"emit transaction failed: error_class=Fluent::Plugin::Buffer::BufferOverflowError error=\"buffer space has too many data\" location=\"/opt/fluent/lib/ruby/gems/3.2.0/gems/fluentd-1.16.5/lib/fluent/plugin/buffer.rb:330:in `write'\" tag=\"kubernetes.var.log.containers.checkmk-cluster-collector-5d756b6fc-qnmdd_checkmk-monitoring_cluster-collector-26c4ec7eda148132d5c1d974fae19ef8d67cadb66918d53e6ac5a0db3a6fb245.log\""}
2024-09-04 06:03:14 +0000 [warn]: #0 /opt/fluent/lib/ruby/gems/3.2.0/gems/cool.io-1.8.0/lib/cool.io/loop.rb:88:in `run_once'
2024-09-04 06:03:14 +0000 [warn]: #0 /opt/fluent/lib/ruby/gems/3.2.0/gems/cool.io-1.8.0/lib/cool.io/loop.rb:88:in `run'
2024-09-04 06:03:14 +0000 [warn]: #0 /opt/fluent/lib/ruby/gems/3.2.0/gems/fluentd-1.16.5/lib/fluent/plugin_helper/event_loop.rb:93:in `block in start'
2024-09-04 06:03:14 +0000 [warn]: #0 /opt/fluent/lib/ruby/gems/3.2.0/gems/fluentd-1.16.5/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
2024-09-04 06:03:14 +0000 [error]: #0 unexpected error on reading data host="10.16.19.98" port=64831 error_class=Fluent::Plugin::Buffer::BufferOverflowError error="buffer space has too many data"
2024-09-04 06:03:14.257656445 +0000 fluent.error: {"host":"10.16.19.98","port":64831,"error":"#<Fluent::Plugin::Buffer::BufferOverflowError: buffer space has too many data>","message":"unexpected error on reading data host=\"10.16.19.98\" port=64831 error_class=Fluent::Plugin::Buffer::BufferOverflowError error=\"buffer space has too many data\""}
2024-09-04 06:03:14 +0000 [error]: #0 /opt/fluent/lib/ruby/gems/3.2.0/gems/fluentd-1.16.5/lib/fluent/plugin/buffer.rb:330:in `write'
2024-09-04 06:03:14 +0000 [error]: #0 /opt/fluent/lib/ruby/gems/3.2.0/gems/fluentd-1.16.5/lib/fluent/plugin/output.rb:1095:in `block in handle_stream_simple'
2024-09-04 06:03:14 +0000 [error]: #0 /opt/fluent/lib/ruby/gems/3.2.0/gems/fluentd-1.16.5/lib/fluent/plugin/output.rb:977:in `write_guard'
2024-09-04 06:03:14 +0000 [error]: #0 /opt/fluent/lib/ruby/gems/3.2.0/gems/fluentd-1.16.5/lib/fluent/plugin/output.rb:1094:in `handle_stream_simple'
2024-09-04 06:03:14 +0000 [error]: #0 /opt/fluent/lib/ruby/gems/3.2.0/gems/fluentd-1.16.5/lib/fluent/plugin/output.rb:967:in `execute_chunking'
2024-09-04 06:03:14 +0000 [error]: #0 /opt/fluent/lib/ruby/gems/3.2.0/gems/fluentd-1.16.5/lib/fluent/plugin/output.rb:897:in `emit_buffered'
2024-09-04 06:03:14 +0000 [error]: #0 /opt/fluent/lib/ruby/gems/3.2.0/gems/fluentd-1.16.5/lib/fluent/event_router.rb:115:in `emit_stream'
2024-09-04 06:03:14 +0000 [error]: #0 /opt/fluent/lib/ruby/gems/3.2.0/gems/fluentd-1.16.5/lib/fluent/plugin/in_forward.rb:318:in `on_message'
2024-09-04 06:03:14 +0000 [error]: #0 /opt/fluent/lib/ruby/gems/3.2.0/gems/fluentd-1.16.5/lib/fluent/plugin/in_forward.rb:226:in `block in handle_connection'
2024-09-04 06:03:14 +0000 [error]: #0 /opt/fluent/lib/ruby/gems/3.2.0/gems/fluentd-1.16.5/lib/fluent/plugin/in_forward.rb:263:in `block (3 levels) in read_messages'
2024-09-04 06:03:14 +0000 [error]: #0 /opt/fluent/lib/ruby/gems/3.2.0/gems/fluentd-1.16.5/lib/fluent/plugin/in_forward.rb:262:in `feed_each'
2024-09-04 06:03:14 +0000 [error]: #0 /opt/fluent/lib/ruby/gems/3.2.0/gems/fluentd-1.16.5/lib/fluent/plugin/in_forward.rb:262:in `block (2 levels) in read_messages'
2024-09-04 06:03:14 +0000 [error]: #0 /opt/fluent/lib/ruby/gems/3.2.0/gems/fluentd-1.16.5/lib/fluent/plugin/in_forward.rb:271:in `block in read_messages'
2024-09-04 06:03:14 +0000 [error]: #0 /opt/fluent/lib/ruby/gems/3.2.0/gems/fluentd-1.16.5/lib/fluent/plugin_helper/server.rb:640:in `on_read_without_connection'
2024-09-04 06:03:14 +0000 [error]: #0 /opt/fluent/lib/ruby/gems/3.2.0/gems/cool.io-1.8.0/lib/cool.io/io.rb:123:in `on_readable'
2024-09-04 06:03:14 +0000 [error]: #0 /opt/fluent/lib/ruby/gems/3.2.0/gems/cool.io-1.8.0/lib/cool.io/io.rb:186:in `on_readable'
2024-09-04 06:03:14 +0000 [error]: #0 /opt/fluent/lib/ruby/gems/3.2.0/gems/cool.io-1.8.0/lib/cool.io/loop.rb:88:in `run_once'
2024-09-04 06:03:14 +0000 [error]: #0 /opt/fluent/lib/ruby/gems/3.2.0/gems/cool.io-1.8.0/lib/cool.io/loop.rb:88:in `run'
2024-09-04 06:03:14 +0000 [error]: #0 /opt/fluent/lib/ruby/gems/3.2.0/gems/fluentd-1.16.5/lib/fluent/plugin_helper/event_loop.rb:93:in `block in start'
2024-09-04 06:03:14 +0000 [error]: #0 /opt/fluent/lib/ruby/gems/3.2.0/gems/fluentd-1.16.5/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
I tried several options in the buffer section, both in the Harvester output buffer configuration and on the jump-server fluentd side, but the buffer errors persist.
Also, when forwarding starts again after restarting the fluentd service, there are 400 errors filling fluentd.log. (I don't think this is the core problem, since I can see data in OpenSearch; it is probably about mapping or something else.)
2024-09-04 09:27:55.389466767 +0000 fluent.warn: {"error":"#<Fluent::Plugin::OpenSearchErrorHandler::OpenSearchError: 400 - Rejected by OpenSearch>","location":null,"tag":"kubernetes.var.log.containers.checkmk-node-collector-container-metrics-wd6wm_checkmk-monitoring_cadvisor-3597333cf11b7dd09210b34a3d97b7dac294ae2a017f4cc130b1a86982cb2f60.log","time":1725441524,"record":{"stream":"stderr","logtag":"F","message":"W0904 09:18:44.221599 1 machine_libipmctl.go:64] There are no NVM devices!","kubernetes":{"pod_name":"checkmk-node-collector-container-metrics-wd6wm","namespace_name":"checkmk-monitoring","pod_id":"f681dd37-9fc3-445d-b56a-b9294d3a3dd9","labels":{"app":"checkmk-node-collector-container-metrics","app.kubernetes.io/instance":"checkmk","app.kubernetes.io/name":"checkmk","component":"checkmk-node-collector","controller-revision-hash":"5b48d75884","pod-template-generation":"2"},"annotations":{"cni.projectcalico.org/containerID":"3a74fe8ce9ae0ae712212203d46738eac804258562f82ec04607b405d20e23cd","cni.projectcalico.org/podIP":"10.52.2.192/32","cni.projectcalico.org/podIPs":"10.52.2.192/32","k8s.v1.cni.cncf.io/network-status":"[{\n \"name\": \"k8s-pod-network\",\n \"ips\": [\n \"10.52.2.192\"\n ],\n \"default\": true,\n \"dns\": {}\n}]","k8s.v1.cni.cncf.io/networks-status":"[{\n \"name\": \"k8s-pod-network\",\n \"ips\": [\n \"10.52.2.192\"\n ],\n \"default\": true,\n \"dns\": {}\n}]"},"host":"sissach-harv3","container_name":"cadvisor","docker_id":"3597333cf11b7dd09210b34a3d97b7dac294ae2a017f4cc130b1a86982cb2f60","container_hash":"docker.io/checkmk/cadvisor-patched@sha256:b0fe7daf1ab6beeb28abef175bcce623244be6bf59237fcf72b6af3d62e437f1","container_image":"docker.io/checkmk/cadvisor-patched:1.5.1"}},"message":"dump an error event: error_class=Fluent::Plugin::OpenSearchErrorHandler::OpenSearchError error=\"400 - Rejected by OpenSearch\" location=nil 
tag=\"kubernetes.var.log.containers.checkmk-node-collector-container-metrics-wd6wm_checkmk-monitoring_cadvisor-3597333cf11b7dd09210b34a3d97b7dac294ae2a017f4cc130b1a86982cb2f60.log\" time=2024-09-04 09:18:44.221675829 +0000 record={\"stream\"=>\"stderr\", \"logtag\"=>\"F\", \"message\"=>\"W0904 09:18:44.221599 1 machine_libipmctl.go:64] There are no NVM devices!\", \"kubernetes\"=>{\"pod_name\"=>\"checkmk-node-collector-container-metrics-wd6wm\", \"namespace_name\"=>\"checkmk-monitoring\", \"pod_id\"=>\"f681dd37-9fc3-445d-b56a-b9294d3a3dd9\", \"labels\"=>{\"app\"=>\"checkmk-node-collector-container-metrics\", \"app.kubernetes.io/instance\"=>\"checkmk\", \"app.kubernetes.io/name\"=>\"checkmk\", \"component\"=>\"checkmk-node-collector\", \"controller-revision-hash\"=>\"5b48d75884\", \"pod-template-generation\"=>\"2\"}, \"annotations\"=>{\"cni.projectcalico.org/containerID\"=>\"3a74fe8ce9ae0ae712212203d46738eac804258562f82ec04607b405d20e23cd\", \"cni.projectcalico.org/podIP\"=>\"10.52.2.192/32\", \"cni.projectcalico.org/podIPs\"=>\"10.52.2.192/32\", \"k8s.v1.cni.cncf.io/network-status\"=>\"[{\\n \\\"name\\\": \\\"k8s-pod-network\\\",\\n \\\"ips\\\": [\\n \\\"10.52.2.192\\\"\\n ],\\n \\\"default\\\": true,\\n \\\"dns\\\": {}\\n}]\", \"k8s.v1.cni.cncf.io/networks-status\"=>\"[{\\n \\\"name\\\": \\\"k8s-pod-network\\\",\\n \\\"ips\\\": [\\n \\\"10.52.2.192\\\"\\n ],\\n \\\"default\\\": true,\\n \\\"dns\\\": {}\\n}]\"}, \"host\"=>\"sissach-harv3\", \"container_name\"=>\"cadvisor\", \"docker_id\"=>\"3597333cf11b7dd09210b34a3d97b7dac294ae2a017f4cc130b1a86982cb2f60\", \"container_hash\"=>\"docker.io/checkmk/cadvisor-patched@sha256:b0fe7daf1ab6beeb28abef175bcce623244be6bf59237fcf72b6af3d62e437f1\", \"container_image\"=>\"docker.io/checkmk/cadvisor-patched:1.5.1\"}}"}
2024-09-04 09:27:55.389995243 +0000 fluent.warn: {"error":"#<Fluent::Plugin::OpenSearchErrorHandler::OpenSearchError: 400 - Rejected by OpenSearch>","location":null,"tag":"kubernetes.var.log.containers.checkmk-node-collector-container-metrics-wd6wm_checkmk-monitoring_cadvisor-3597333cf11b7dd09210b34a3d97b7dac294ae2a017f4cc130b1a86982cb2f60.log","time":1725441525,"record":{"stream":"stderr","logtag":"F","message":"W0904 09:18:45.198464 1 info.go:53] Couldn't collect info from any of the files in \"/etc/machine-id,/var/lib/dbus/machine-id\"","kubernetes":{"pod_name":"checkmk-node-collector-container-metrics-wd6wm","namespace_name":"checkmk-monitoring","pod_id":"f681dd37-9fc3-445d-b56a-b9294d3a3dd9","labels":{"app":"checkmk-node-collector-container-metrics","app.kubernetes.io/instance":"checkmk","app.kubernetes.io/name":"checkmk","component":"checkmk-node-collector","controller-revision-hash":"5b48d75884","pod-template-generation":"2"},"annotations":{"cni.projectcalico.org/containerID":"3a74fe8ce9ae0ae712212203d46738eac804258562f82ec04607b405d20e23cd","cni.projectcalico.org/podIP":"10.52.2.192/32","cni.projectcalico.org/podIPs":"10.52.2.192/32","k8s.v1.cni.cncf.io/network-status":"[{\n \"name\": \"k8s-pod-network\",\n \"ips\": [\n \"10.52.2.192\"\n ],\n \"default\": true,\n \"dns\": {}\n}]","k8s.v1.cni.cncf.io/networks-status":"[{\n \"name\": \"k8s-pod-network\",\n \"ips\": [\n \"10.52.2.192\"\n ],\n \"default\": true,\n \"dns\": {}\n}]"},"host":"sissach-harv3","container_name":"cadvisor","docker_id":"3597333cf11b7dd09210b34a3d97b7dac294ae2a017f4cc130b1a86982cb2f60","container_hash":"docker.io/checkmk/cadvisor-patched@sha256:b0fe7daf1ab6beeb28abef175bcce623244be6bf59237fcf72b6af3d62e437f1","container_image":"docker.io/checkmk/cadvisor-patched:1.5.1"}},"message":"dump an error event: error_class=Fluent::Plugin::OpenSearchErrorHandler::OpenSearchError error=\"400 - Rejected by OpenSearch\" location=nil 
tag=\"kubernetes.var.log.containers.checkmk-node-collector-container-metrics-wd6wm_checkmk-monitoring_cadvisor-3597333cf11b7dd09210b34a3d97b7dac294ae2a017f4cc130b1a86982cb2f60.log\" time=2024-09-04 09:18:45.198569000 +0000 record={\"stream\"=>\"stderr\", \"logtag\"=>\"F\", \"message\"=>\"W0904 09:18:45.198464 1 info.go:53] Couldn't collect info from any of the files in \\\"/etc/machine-id,/var/lib/dbus/machine-id\\\"\", \"kubernetes\"=>{\"pod_name\"=>\"checkmk-node-collector-container-metrics-wd6wm\", \"namespace_name\"=>\"checkmk-monitoring\", \"pod_id\"=>\"f681dd37-9fc3-445d-b56a-b9294d3a3dd9\", \"labels\"=>{\"app\"=>\"checkmk-node-collector-container-metrics\", \"app.kubernetes.io/instance\"=>\"checkmk\", \"app.kubernetes.io/name\"=>\"checkmk\", \"component\"=>\"checkmk-node-collector\", \"controller-revision-hash\"=>\"5b48d75884\", \"pod-template-generation\"=>\"2\"}, \"annotations\"=>{\"cni.projectcalico.org/containerID\"=>\"3a74fe8ce9ae0ae712212203d46738eac804258562f82ec04607b405d20e23cd\", \"cni.projectcalico.org/podIP\"=>\"10.52.2.192/32\", \"cni.projectcalico.org/podIPs\"=>\"10.52.2.192/32\", \"k8s.v1.cni.cncf.io/network-status\"=>\"[{\\n \\\"name\\\": \\\"k8s-pod-network\\\",\\n \\\"ips\\\": [\\n \\\"10.52.2.192\\\"\\n ],\\n \\\"default\\\": true,\\n \\\"dns\\\": {}\\n}]\", \"k8s.v1.cni.cncf.io/networks-status\"=>\"[{\\n \\\"name\\\": \\\"k8s-pod-network\\\",\\n \\\"ips\\\": [\\n \\\"10.52.2.192\\\"\\n ],\\n \\\"default\\\": true,\\n \\\"dns\\\": {}\\n}]\"}, \"host\"=>\"sissach-harv3\", \"container_name\"=>\"cadvisor\", \"docker_id\"=>\"3597333cf11b7dd09210b34a3d97b7dac294ae2a017f4cc130b1a86982cb2f60\", \"container_hash\"=>\"docker.io/checkmk/cadvisor-patched@sha256:b0fe7daf1ab6beeb28abef175bcce623244be6bf59237fcf72b6af3d62e437f1\", \"container_image\"=>\"docker.io/checkmk/cadvisor-patched:1.5.1\"}}"}
Thanks for any advice.
Update:
As a workaround, restarting fluentd on the jump server every 12 hours has helped so far.
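If the restart workaround is needed, it can be automated. Assuming fluentd runs as a systemd unit named "fluentd" (an assumption; adjust to your setup), a cron entry like this restarts it every 12 hours, though tuning total_limit_size/overflow_action is the proper fix:

```
# /etc/cron.d/fluentd-restart (hypothetical file name)
# Assumes a systemd unit called "fluentd"; this only papers over the overflow.
0 */12 * * * root systemctl restart fluentd
```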
This issue is still open; is there no official solution or fix yet? Like cforce said, getting an error like this is incredibly concerning in production...
The fluentd server itself runs on a dedicated system outside of the Kubernetes cluster. We do see a few warnings on it from time to time.
We've tried every chunk_limit and flush setting to get rid of this error, but it doesn't seem to go away. Is there an obvious error in our configuration that we're missing?
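One thing worth checking, as I understand the buffer parameters: chunk_limit_size only bounds a single chunk, while BufferOverflowError is raised when the whole buffer passes total_limit_size (512MB by default for memory buffers, 64GB for file buffers, if I read the docs correctly). A sketch with illustrative values:

```
<buffer>
  chunk_limit_size 8MB                # upper bound for one chunk only
  total_limit_size 2GB                # the limit that actually triggers BufferOverflowError
  flush_thread_count 4
  overflow_action drop_oldest_chunk   # or block; the default throw_exception raises the error seen above
</buffer>
```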