Closed · dohoangkhiem closed this issue 2 years ago
I've experienced a similar problem (slow-flush-threshold warnings and buffer overflows) with td-agent 3 when forwarding to a Splunk instance outside a Kubernetes cluster. My issue is that the forwarder can't keep up with the volume of messages being aggregated within the cluster. Are there any recommendations or best practices for handling large volumes of messages through a single fluentd aggregator?
This issue has been automatically marked as stale because it has been open 90 days with no activity. Remove the stale label or comment, or this issue will be closed in 30 days.
The issue might be stale, but it's still problematic.
This issue has been automatically marked as stale because it has been open 90 days with no activity. Remove the stale label or comment, or this issue will be closed in 30 days.
Still an issue.
This issue has been automatically marked as stale because it has been open 90 days with no activity. Remove the stale label or comment, or this issue will be closed in 30 days.
Still an issue.
This issue has been automatically marked as stale because it has been open 90 days with no activity. Remove the stale label or comment, or this issue will be closed in 30 days.
Still an issue.
@vguaglione Is this still reproducible with a recent version of Fluentd?
@kenhys I can't tell you, because we are pinned to the version of fluentd integrated with OpenShift 3.11. From what I understand, individual logging components cannot be upgraded; they only get upgraded when the version of OpenShift is upgraded. Under OpenShift 4 the logging system has been redesigned, so to answer your question we'd need to upgrade to a higher minor version of OpenShift, which in our case is not a possibility. We will move directly to version 4 at some point.
This issue has been automatically marked as stale because it has been open 90 days with no activity. Remove the stale label or comment, or this issue will be closed in 30 days.
Still an issue.
This issue has been automatically marked as stale because it has been open 90 days with no activity. Remove the stale label or comment, or this issue will be closed in 30 days.
This issue was automatically closed because it has been stale for 30 days.
Still an issue.
We're running td-agent 3 on Ubuntu 14.04, on an EC2 m4.2xlarge instance.
Configurations

Logs from server nodes are forwarded to a log-aggregator before being sent to ES.
Here's the relevant piece of the config on a server node:
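Roughly the following; the paths, tags, and aggregator host here are placeholders rather than our exact values:

    <source>
      @type tail
      path /var/log/app/*.log                # high-volume log directory (placeholder)
      pos_file /var/log/td-agent/app.log.pos
      tag app.logs
      <parse>
        @type none                           # the real parse format depends on the logs
      </parse>
    </source>

    <match app.**>
      @type forward
      <server>
        host log-aggregator.internal         # placeholder aggregator host
        port 24224
      </server>
      <buffer>
        @type file
        path /var/log/td-agent/buffer/forward
        flush_interval 5s
      </buffer>
    </match>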
The logs directory has quite a high volume of logs; it started out like this and can grow drastically, with logs generated continuously.
td-agent started out working properly, following the tails of the *.log files and sending logs in near real time. After a while (several hours), though, it started sending logs late, by half an hour or more, and then almost stopped sending, or sent far too slowly. We see lots of warnings about slow buffer flushes, which leave it almost not working as expected.
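The warnings are of this form (the timestamp, numbers, and plugin id here are illustrative):

    2019-06-10 08:15:02 +0000 [warn]: #0 buffer flush took longer time than slow_flush_log_threshold: elapsed_time=32.5 slow_flush_log_threshold=20.0 plugin_id="forward_to_aggregator"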
This seems to happen only on this server; on some other servers (which might have a lower volume of logs) it seems to work normally.
Is this a performance problem? The server is quite strong, and we've tried tuning td-agent's flushing with flush_thread_count (buffer settings sketched at the end of the update below), but the issue still happens; it seems to be just a matter of time.

Update: We also saw lots of
chunk bytes limit exceeds for an emitted event stream
warnings on the log-aggregator server; we're not sure if it's related.
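For context, the buffer section we've been tuning on the forwarder looks roughly like this. flush_thread_count is the setting we actually raised; the other values are illustrative candidates rather than settings we've verified:

    <buffer>
      @type file
      path /var/log/td-agent/buffer/forward
      flush_thread_count 8        # raised from the default of 1
      flush_interval 5s
      chunk_limit_size 8m         # candidate: keep forwarded chunks small
      total_limit_size 4g         # candidate: cap on-disk buffer growth
      overflow_action block       # candidate: backpressure in_tail instead of raising errors
    </buffer>

As I understand it, the "chunk bytes limit exceeds for an emitted event stream" warning on the aggregator is emitted when an incoming event stream is larger than the chunk_limit_size of the aggregator's own output buffer, so aligning the two sides (aggregator chunk_limit_size >= forwarder chunk_limit_size) might be part of the fix, e.g. in the aggregator's match block that sends to ES:

    <buffer>
      chunk_limit_size 16m        # candidate: at least as large as the forwarder's chunks
    </buffer>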