td-agent sends logs with hours delayed and keeps warning about slow flush

dohoangkhiem commented 6 years ago

We're running td-agent 3 on Ubuntu 14.04, EC2 instance m4.2xlarge

$ td-agent --version
td-agent 1.2.2

$ dpkg -l | grep td-agent
ii  td-agent                                 3.2.0-0                                    amd64        Treasure Agent: A data collector for Treasure Data

Configurations Logs from server nodes are forwarded to a log-aggregator before sending to ES.

Here's the piece of conf on server node

<source>
  @type tail
  path /home/ubuntu/local/logs/mta/*.log
  pos_file /var/log/td-agent/mta.pos
  <parse>
    @type multiline
    format_firstline /\d{4}-\d{2}-\d{2}/
    format1 /^(?<datetime>\d{4}-\d{2}-\d{2} \d{2}\:\d{2}\:\d{2},\d+?):(?<process>\d+):(?<thread>\d+):(?<user>.+):(?<levelname>.+):(?<filename>.+):(?<function>.+\(\)):(?<lineno>\d+):(?<message>.*)$/
  </parse>
  tag "mta.#{Socket.gethostname}"
</source>

<match mta.**>
  @type forward
  <server>
    host 10.10.1.20
    port 24224
  </server>

  <buffer>
    @type file
    path /var/log/td-agent/buffer/mta.buffer
    flush_interval 10s
    retry_max_times 10
    queue_limit_length 256
    flush_thread_count 8
  </buffer>

</match>

the logs directory has quite high volume of logs, started out like this, and could grow drastically, logs generated continuously.

$ ll -h
total 2.9G
drwxr-xr-x  2 ubuntu ubuntu  12K Jul 26 06:20 ./
drwxrwxr-x 20 ubuntu ubuntu 4.0K Jul 26 07:09 ../
-rw-r--r--  1 ubuntu ubuntu 2.4K Jul 24 13:40 bootstrap_prepare.log
-rw-r--r--  1 ubuntu ubuntu  11M Jul 26 06:47 lmdb_server.log
-rw-r--r--  1 ubuntu ubuntu  20M Jul 26 03:38 lmdb_server.log.1
-rw-r--r--  1 ubuntu ubuntu  20M Jul 24 12:30 lmdb_server.log.10
-rw-r--r--  1 ubuntu ubuntu  20M Jul 25 22:22 lmdb_server.log.2
-rw-r--r--  1 ubuntu ubuntu  20M Jul 25 18:41 lmdb_server.log.3
-rw-r--r--  1 ubuntu ubuntu  20M Jul 25 15:22 lmdb_server.log.4
-rw-r--r--  1 ubuntu ubuntu  20M Jul 25 11:01 lmdb_server.log.5
-rw-r--r--  1 ubuntu ubuntu  20M Jul 25 04:57 lmdb_server.log.6
-rw-r--r--  1 ubuntu ubuntu  20M Jul 24 22:46 lmdb_server.log.7
-rw-r--r--  1 ubuntu ubuntu  20M Jul 24 18:44 lmdb_server.log.8
-rw-r--r--  1 ubuntu ubuntu  20M Jul 24 15:38 lmdb_server.log.9
-rw-r--r--  1 ubuntu ubuntu  85K Jul 26 06:41 mail_file_purger.log
-rw-r--r--  1 ubuntu ubuntu 7.1K Jul 26 06:43 mta_connector.log
-rw-r--r--  1 ubuntu ubuntu  20M Jul 26 06:20 mta_connector.log.1
-rw-r--r--  1 ubuntu ubuntu 122K Jul 23 17:22 mta_connector.log.10
-rw-r--r--  1 ubuntu ubuntu 294K Jul 26 00:07 mta_connector.log.2
-rw-r--r--  1 ubuntu ubuntu 7.2K Jul 25 21:16 mta_connector.log.3
-rw-r--r--  1 ubuntu ubuntu  20M Jul 25 16:36 mta_connector.log.4
-rw-r--r--  1 ubuntu ubuntu  20M Jul 24 18:33 mta_connector.log.5
-rw-r--r--  1 ubuntu ubuntu 5.9K Jul 24 13:01 mta_connector.log.6
-rw-r--r--  1 ubuntu ubuntu  15M Jul 24 13:32 mta_connector.log.7
-rw-r--r--  1 ubuntu ubuntu  20M Jul 24 02:36 mta_connector.log.8
-rw-r--r--  1 ubuntu ubuntu 7.4K Jul 23 17:09 mta_connector.log.9
-rw-r--r--  1 ubuntu ubuntu  15M Jul 26 06:47 process_messages.log
-rw-r--r--  1 ubuntu ubuntu  20M Jul 24 05:44 process_messages.log.1
-rw-r--r--  1 ubuntu ubuntu  20M Jul 21 16:12 process_messages.log.2
-rw-r--r--  1 ubuntu ubuntu  11M Jul 26 06:47 process_messages_workers_00.log
-rw-r--r--  1 ubuntu ubuntu  20M Jul 25 23:40 process_messages_workers_00.log.1
-rw-r--r--  1 ubuntu ubuntu  20M Jul 24 09:43 process_messages_workers_00.log.10
-rw-r--r--  1 ubuntu ubuntu  20M Jul 25 19:35 process_messages_workers_00.log.2
-rw-r--r--  1 ubuntu ubuntu  20M Jul 25 16:41 process_messages_workers_00.log.3
-rw-r--r--  1 ubuntu ubuntu  20M Jul 25 14:28 process_messages_workers_00.log.4
-rw-r--r--  1 ubuntu ubuntu  20M Jul 25 10:03 process_messages_workers_00.log.5
-rw-r--r--  1 ubuntu ubuntu  20M Jul 24 20:50 process_messages_workers_00.log.6
-rw-r--r--  1 ubuntu ubuntu  20M Jul 24 18:04 process_messages_workers_00.log.7
-rw-r--r--  1 ubuntu ubuntu  20M Jul 24 16:02 process_messages_workers_00.log.8
-rw-r--r--  1 ubuntu ubuntu  20M Jul 24 13:47 process_messages_workers_00.log.9
-rw-r--r--  1 ubuntu ubuntu 2.8M Jul 26 06:44 process_messages_workers_01.log
-rw-r--r--  1 ubuntu ubuntu  20M Jul 26 01:21 process_messages_workers_01.log.1
-rw-r--r--  1 ubuntu ubuntu  20M Jul 23 22:02 process_messages_workers_01.log.10
-rw-r--r--  1 ubuntu ubuntu  20M Jul 25 19:39 process_messages_workers_01.log.2
-rw-r--r--  1 ubuntu ubuntu  20M Jul 25 17:00 process_messages_workers_01.log.3
-rw-r--r--  1 ubuntu ubuntu  20M Jul 25 14:40 process_messages_workers_01.log.4
-rw-r--r--  1 ubuntu ubuntu  20M Jul 24 20:45 process_messages_workers_01.log.5
-rw-r--r--  1 ubuntu ubuntu  20M Jul 24 17:24 process_messages_workers_01.log.6
-rw-r--r--  1 ubuntu ubuntu  20M Jul 24 15:15 process_messages_workers_01.log.7
-rw-r--r--  1 ubuntu ubuntu  20M Jul 24 12:54 process_messages_workers_01.log.8
-rw-r--r--  1 ubuntu ubuntu  20M Jul 24 07:22 process_messages_workers_01.log.9
-rw-r--r--  1 ubuntu ubuntu  12M Jul 26 05:08 process_messages_workers_02.log
-rw-r--r--  1 ubuntu ubuntu  20M Jul 25 21:04 process_messages_workers_02.log.1
-rw-r--r--  1 ubuntu ubuntu  20M Jul 23 18:05 process_messages_workers_02.log.10
-rw-r--r--  1 ubuntu ubuntu  20M Jul 25 18:11 process_messages_workers_02.log.2
-rw-r--r--  1 ubuntu ubuntu  20M Jul 25 15:32 process_messages_workers_02.log.3
-rw-r--r--  1 ubuntu ubuntu  20M Jul 24 21:27 process_messages_workers_02.log.4
-rw-r--r--  1 ubuntu ubuntu  20M Jul 24 18:18 process_messages_workers_02.log.5
-rw-r--r--  1 ubuntu ubuntu  20M Jul 24 15:55 process_messages_workers_02.log.6
-rw-r--r--  1 ubuntu ubuntu  20M Jul 24 14:10 process_messages_workers_02.log.7
-rw-r--r--  1 ubuntu ubuntu  20M Jul 24 08:15 process_messages_workers_02.log.8
-rw-r--r--  1 ubuntu ubuntu  20M Jul 23 21:12 process_messages_workers_02.log.9
-rw-r--r--  1 ubuntu ubuntu  15M Jul 26 04:05 process_messages_workers_03.log
-rw-r--r--  1 ubuntu ubuntu  20M Jul 25 20:32 process_messages_workers_03.log.1
-rw-r--r--  1 ubuntu ubuntu  20M Jul 23 16:11 process_messages_workers_03.log.10
-rw-r--r--  1 ubuntu ubuntu  20M Jul 25 17:51 process_messages_workers_03.log.2
-rw-r--r--  1 ubuntu ubuntu  20M Jul 25 15:12 process_messages_workers_03.log.3
-rw-r--r--  1 ubuntu ubuntu  20M Jul 24 20:20 process_messages_workers_03.log.4
-rw-r--r--  1 ubuntu ubuntu  20M Jul 24 17:26 process_messages_workers_03.log.5
-rw-r--r--  1 ubuntu ubuntu  20M Jul 24 15:22 process_messages_workers_03.log.6
-rw-r--r--  1 ubuntu ubuntu  20M Jul 24 12:35 process_messages_workers_03.log.7
-rw-r--r--  1 ubuntu ubuntu  20M Jul 23 23:48 process_messages_workers_03.log.8
-rw-r--r--  1 ubuntu ubuntu  20M Jul 23 19:41 process_messages_workers_03.log.9
-rw-r--r--  1 ubuntu ubuntu  13M Jul 26 02:24 process_messages_workers_04.log
-rw-r--r--  1 ubuntu ubuntu  20M Jul 25 20:59 process_messages_workers_04.log.1
-rw-r--r--  1 ubuntu ubuntu  20M Jul 23 15:24 process_messages_workers_04.log.10
-rw-r--r--  1 ubuntu ubuntu  20M Jul 25 18:00 process_messages_workers_04.log.2
-rw-r--r--  1 ubuntu ubuntu  20M Jul 25 15:36 process_messages_workers_04.log.3
-rw-r--r--  1 ubuntu ubuntu  20M Jul 24 20:25 process_messages_workers_04.log.4
-rw-r--r--  1 ubuntu ubuntu  20M Jul 24 17:26 process_messages_workers_04.log.5
-rw-r--r--  1 ubuntu ubuntu  20M Jul 24 15:16 process_messages_workers_04.log.6
-rw-r--r--  1 ubuntu ubuntu  20M Jul 24 12:15 process_messages_workers_04.log.7
-rw-r--r--  1 ubuntu ubuntu  20M Jul 23 22:11 process_messages_workers_04.log.8
-rw-r--r--  1 ubuntu ubuntu  20M Jul 23 18:46 process_messages_workers_04.log.9
-rw-r--r--  1 ubuntu ubuntu 4.0M Jul 26 02:04 process_messages_workers_05.log
-rw-r--r--  1 ubuntu ubuntu  20M Jul 25 22:48 process_messages_workers_05.log.1
-rw-r--r--  1 ubuntu ubuntu  20M Jul 23 16:24 process_messages_workers_05.log.10
-rw-r--r--  1 ubuntu ubuntu  20M Jul 25 19:23 process_messages_workers_05.log.2
-rw-r--r--  1 ubuntu ubuntu  20M Jul 25 16:39 process_messages_workers_05.log.3
-rw-r--r--  1 ubuntu ubuntu  20M Jul 25 14:08 process_messages_workers_05.log.4
-rw-r--r--  1 ubuntu ubuntu  20M Jul 24 18:02 process_messages_workers_05.log.5
-rw-r--r--  1 ubuntu ubuntu  20M Jul 24 15:51 process_messages_workers_05.log.6
-rw-r--r--  1 ubuntu ubuntu  20M Jul 24 13:48 process_messages_workers_05.log.7
-rw-r--r--  1 ubuntu ubuntu  20M Jul 24 01:28 process_messages_workers_05.log.8
-rw-r--r--  1 ubuntu ubuntu  20M Jul 23 20:11 process_messages_workers_05.log.9
-rw-r--r--  1 ubuntu ubuntu 4.4M Jul 26 02:04 process_messages_workers_06.log
-rw-r--r--  1 ubuntu ubuntu  20M Jul 25 22:32 process_messages_workers_06.log.1
-rw-r--r--  1 ubuntu ubuntu  20M Jul 23 16:33 process_messages_workers_06.log.10
-rw-r--r--  1 ubuntu ubuntu  20M Jul 25 19:00 process_messages_workers_06.log.2
-rw-r--r--  1 ubuntu ubuntu  20M Jul 25 16:08 process_messages_workers_06.log.3
-rw-r--r--  1 ubuntu ubuntu  20M Jul 24 20:39 process_messages_workers_06.log.4
-rw-r--r--  1 ubuntu ubuntu  20M Jul 24 17:54 process_messages_workers_06.log.5
-rw-r--r--  1 ubuntu ubuntu  20M Jul 24 15:29 process_messages_workers_06.log.6
-rw-r--r--  1 ubuntu ubuntu  20M Jul 24 13:01 process_messages_workers_06.log.7
-rw-r--r--  1 ubuntu ubuntu  20M Jul 24 01:15 process_messages_workers_06.log.8
-rw-r--r--  1 ubuntu ubuntu  20M Jul 23 20:18 process_messages_workers_06.log.9
-rw-r--r--  1 ubuntu ubuntu  20M Jul 26 02:04 process_messages_workers_07.log
-rw-r--r--  1 ubuntu ubuntu  20M Jul 25 20:00 process_messages_workers_07.log.1
-rw-r--r--  1 ubuntu ubuntu  20M Jul 23 15:32 process_messages_workers_07.log.10
-rw-r--r--  1 ubuntu ubuntu  20M Jul 25 17:09 process_messages_workers_07.log.2
-rw-r--r--  1 ubuntu ubuntu  20M Jul 25 14:44 process_messages_workers_07.log.3
-rw-r--r--  1 ubuntu ubuntu  20M Jul 24 19:19 process_messages_workers_07.log.4
-rw-r--r--  1 ubuntu ubuntu  20M Jul 24 16:43 process_messages_workers_07.log.5
-rw-r--r--  1 ubuntu ubuntu  20M Jul 24 14:50 process_messages_workers_07.log.6
-rw-r--r--  1 ubuntu ubuntu  20M Jul 24 11:45 process_messages_workers_07.log.7
-rw-r--r--  1 ubuntu ubuntu  20M Jul 23 21:37 process_messages_workers_07.log.8
-rw-r--r--  1 ubuntu ubuntu  20M Jul 23 18:56 process_messages_workers_07.log.9
-rw-r--r--  1 ubuntu ubuntu 2.3M Jul 26 06:46 realtime_attachment_change.log
-rw-r--r--  1 ubuntu ubuntu  20M Jul 26 03:04 realtime_attachment_change.log.1
-rw-r--r--  1 ubuntu ubuntu  20M Jul 24 18:06 realtime_attachment_change.log.2
-rw-r--r--  1 ubuntu ubuntu  20M Jul 23 09:33 realtime_attachment_change.log.3
-rw-r--r--  1 ubuntu ubuntu  20M Jul 22 00:47 realtime_attachment_change.log.4
-rw-r--r--  1 ubuntu ubuntu  20M Jul 20 16:03 realtime_attachment_change.log.5
-rw-r--r--  1 ubuntu ubuntu 8.3M Jul 26 06:47 realtime_encrypt_outgoing.log
-rw-r--r--  1 ubuntu ubuntu  20M Jul 25 17:54 realtime_encrypt_outgoing.log.1
-rw-r--r--  1 ubuntu ubuntu  20M Jul 24 10:56 realtime_encrypt_outgoing.log.2
-rw-r--r--  1 ubuntu ubuntu  20M Jul 23 04:07 realtime_encrypt_outgoing.log.3
-rw-r--r--  1 ubuntu ubuntu  20M Jul 21 21:14 realtime_encrypt_outgoing.log.4
-rw-r--r--  1 ubuntu ubuntu  20M Jul 20 14:12 realtime_encrypt_outgoing.log.5
-rw-r--r--  1 ubuntu ubuntu 3.5M Jul 26 06:47 realtime_quarantine.log
-rw-r--r--  1 ubuntu ubuntu  20M Jul 26 01:10 realtime_quarantine.log.1
-rw-r--r--  1 ubuntu ubuntu  20M Jul 24 16:42 realtime_quarantine.log.2
-rw-r--r--  1 ubuntu ubuntu  20M Jul 23 08:42 realtime_quarantine.log.3
-rw-r--r--  1 ubuntu ubuntu  20M Jul 22 00:10 realtime_quarantine.log.4
-rw-r--r--  1 ubuntu ubuntu  20M Jul 20 15:43 realtime_quarantine.log.5
-rw-r--r--  1 ubuntu ubuntu 9.9M Jul 26 06:47 realtime_release.log
-rw-r--r--  1 ubuntu ubuntu  20M Jul 25 22:07 realtime_release.log.1
-rw-r--r--  1 ubuntu ubuntu  20M Jul 22 12:25 realtime_release.log.10
-rw-r--r--  1 ubuntu ubuntu  20M Jul 25 15:55 realtime_release.log.2
-rw-r--r--  1 ubuntu ubuntu  20M Jul 24 21:55 realtime_release.log.3
-rw-r--r--  1 ubuntu ubuntu  20M Jul 24 16:17 realtime_release.log.4
-rw-r--r--  1 ubuntu ubuntu  20M Jul 24 12:38 realtime_release.log.5
-rw-r--r--  1 ubuntu ubuntu  20M Jul 24 01:13 realtime_release.log.6
-rw-r--r--  1 ubuntu ubuntu  20M Jul 23 18:15 realtime_release.log.7
-rw-r--r--  1 ubuntu ubuntu  20M Jul 23 14:05 realtime_release.log.8
-rw-r--r--  1 ubuntu ubuntu  20M Jul 23 01:06 realtime_release.log.9
-rw-r--r--  1 ubuntu ubuntu 8.3M Jul 26 06:47 realtime_release_outgoing.log
-rw-r--r--  1 ubuntu ubuntu  20M Jul 25 17:55 realtime_release_outgoing.log.1
-rw-r--r--  1 ubuntu ubuntu  20M Jul 24 10:57 realtime_release_outgoing.log.2
-rw-r--r--  1 ubuntu ubuntu  20M Jul 23 04:07 realtime_release_outgoing.log.3
-rw-r--r--  1 ubuntu ubuntu  20M Jul 21 21:15 realtime_release_outgoing.log.4
-rw-r--r--  1 ubuntu ubuntu  20M Jul 20 14:12 realtime_release_outgoing.log.5
-rw-r--r--  1 ubuntu ubuntu 6.1M Jul 26 06:47 realtime_subject_change.log
-rw-r--r--  1 ubuntu ubuntu  20M Jul 25 21:12 realtime_subject_change.log.1
-rw-r--r--  1 ubuntu ubuntu  20M Jul 24 13:31 realtime_subject_change.log.2
-rw-r--r--  1 ubuntu ubuntu  20M Jul 23 06:02 realtime_subject_change.log.3
-rw-r--r--  1 ubuntu ubuntu  20M Jul 21 22:28 realtime_subject_change.log.4
-rw-r--r--  1 ubuntu ubuntu  20M Jul 20 14:54 realtime_subject_change.log.5
-rw-r--r--  1 ubuntu ubuntu  17M Jul 26 06:47 realtime_subject_change_with_body.log
-rw-r--r--  1 ubuntu ubuntu  20M Jul 25 06:17 realtime_subject_change_with_body.log.1
-rw-r--r--  1 ubuntu ubuntu  20M Jul 24 01:38 realtime_subject_change_with_body.log.2
-rw-r--r--  1 ubuntu ubuntu  20M Jul 22 21:22 realtime_subject_change_with_body.log.3
-rw-r--r--  1 ubuntu ubuntu  20M Jul 21 16:33 realtime_subject_change_with_body.log.4
-rw-r--r--  1 ubuntu ubuntu  20M Jul 20 11:42 realtime_subject_change_with_body.log.5
-rw-r--r--  1 ubuntu ubuntu 1.5M Jul 26 06:46 sqlite_server.log
-rw-r--r--  1 ubuntu ubuntu 785K Jul 25 23:08 transformers_body_transformer.log
-rw-r--r--  1 ubuntu ubuntu    0 Jul 19 11:54 transformers_subject_transformer.log

td-agent started working properly, following tails of *.log files, and sending logs in almost realtime, just after a while (several hours), it started sending logs late by half an hour and more, then almost not sending anymore, or too slow, we see lots of warning about slow buffer flush

2018-07-25 11:43:45 +0000 [info]: #0 flushing all buffer forcedly
2018-07-25 14:20:01 +0000 [warn]: #0 buffer flush took longer time than slow_flush_log_threshold: elapsed_time=9376.285166719987 slow_flush_log_threshold=20.0 plugin_id="object:3fd43e14d938"

...

2018-07-26 01:56:56 +0000 [warn]: #0 buffer flush took longer time than slow_flush_log_threshold: elapsed_time=25.79419337300351 slow_flush_log_threshold=20.0 plugin_id="object:3fd43e14d938"
2018-07-26 02:20:35 +0000 [warn]: #0 buffer flush took longer time than slow_flush_log_threshold: elapsed_time=1418.593912440003 slow_flush_log_threshold=20.0 plugin_id="object:3fd43e170a50"
2018-07-26 02:36:31 +0000 [warn]: #0 buffer flush took longer time than slow_flush_log_threshold: elapsed_time=956.9829386789934 slow_flush_log_threshold=20.0 plugin_id="object:3fd43e14d938"
2018-07-26 02:36:57 +0000 [warn]: #0 buffer flush took longer time than slow_flush_log_threshold: elapsed_time=982.7485669820453 slow_flush_log_threshold=20.0 plugin_id="object:3fd43e170a50"

...

2018-07-26 05:36:54 +0000 [warn]: #0 buffer flush took longer time than slow_flush_log_threshold: elapsed_time=1023.6309650209732 slow_flush_log_threshold=20.0 plugin_id="object:3fd43e14d938"
2018-07-26 05:51:10 +0000 [warn]: #0 buffer flush took longer time than slow_flush_log_threshold: elapsed_time=855.3040326869814 slow_flush_log_threshold=20.0 plugin_id="object:3fd43e170a50"

which made it almost not working as expected.

This seems happen only on this server, on some other servers (which might have lower volume of logs) it seems working normally.

Is this a problem of performance, the server is quite strong and we've tried tune td-agent flush with flush_thread_count but the issue still happens, like just a matter of time?

Update: We also saw lots of chunk bytes limit exceeds for an emitted event stream warnings on the log-aggregator server, not sure if it's related

2018-07-26 13:15:04 +0000 [warn]: #0 chunk bytes limit exceeds for an emitted event stream: 101120808bytes
2018-07-26 13:15:25 +0000 [warn]: #0 chunk bytes limit exceeds for an emitted event stream: 92743877bytes
2018-07-26 13:27:15 +0000 [warn]: #0 chunk bytes limit exceeds for an emitted event stream: 24427037bytes
2018-07-26 13:33:35 +0000 [warn]: #0 chunk bytes limit exceeds for an emitted event stream: 17420972bytes
2018-07-26 13:33:47 +0000 [warn]: #0 chunk bytes limit exceeds for an emitted event stream: 100753381bytes
2018-07-26 13:34:11 +0000 [warn]: #0 chunk bytes limit exceeds for an emitted event stream: 8608639bytes
2018-07-26 13:36:02 +0000 [warn]: #0 chunk bytes limit exceeds for an emitted event stream: 19952196bytes
2018-07-26 13:36:23 +0000 [warn]: #0 chunk bytes limit exceeds for an emitted event stream: 53628250bytes
2018-07-26 13:40:07 +0000 [warn]: #0 chunk bytes limit exceeds for an emitted event stream: 31507879bytes
2018-07-26 13:59:58 +0000 [warn]: #0 chunk bytes limit exceeds for an emitted event stream: 21383659bytes
2018-07-26 13:59:59 +0000 [warn]: #0 chunk bytes limit exceeds for an emitted event stream: 64420617bytes
2018-07-26 14:00:37 +0000 [warn]: #0 chunk bytes limit exceeds for an emitted event stream: 27081322bytes
2018-07-26 14:19:30 +0000 [warn]: #0 chunk bytes limit exceeds for an emitted event stream: 33967557bytes
2018-07-26 14:21:35 +0000 [warn]: #0 chunk bytes limit exceeds for an emitted event stream: 19904675bytes
2018-07-26 14:22:01 +0000 [warn]: #0 chunk bytes limit exceeds for an emitted event stream: 61010980bytes
2018-07-26 14:32:24 +0000 [warn]: #0 chunk bytes limit exceeds for an emitted event stream: 62279112bytes
2018-07-26 14:43:20 +0000 [warn]: #0 chunk bytes limit exceeds for an emitted event stream: 19904256bytes
2018-07-26 14:44:52 +0000 [warn]: #0 chunk bytes limit exceeds for an emitted event stream: 32346010bytes
2018-07-26 14:50:51 +0000 [warn]: #0 chunk bytes limit exceeds for an emitted event stream: 8548460bytes
2018-07-26 14:51:18 +0000 [warn]: #0 chunk bytes limit exceeds for an emitted event stream: 30544118bytes
2018-07-26 15:00:47 +0000 [warn]: #0 chunk bytes limit exceeds for an emitted event stream: 145738686bytes
2018-07-26 15:05:15 +0000 [warn]: #0 chunk bytes limit exceeds for an emitted event stream: 20962156bytes
2018-07-26 15:09:29 +0000 [warn]: #0 chunk bytes limit exceeds for an emitted event stream: 8548460bytes
2018-07-26 15:11:01 +0000 [warn]: #0 chunk bytes limit exceeds for an emitted event stream: 59620700bytes
2018-07-26 15:21:16 +0000 [warn]: #0 chunk bytes limit exceeds for an emitted event stream: 29961627bytes
2018-07-26 15:21:27 +0000 [warn]: #0 chunk bytes limit exceeds for an emitted event stream: 25887211bytes
2018-07-26 15:21:45 +0000 [warn]: #0 chunk bytes limit exceeds for an emitted event stream: 11731460bytes
2018-07-26 15:22:17 +0000 [warn]: #0 chunk bytes limit exceeds for an emitted event stream: 14047727bytes
2018-07-26 15:23:47 +0000 [warn]: #0 chunk bytes limit exceeds for an emitted event stream: 15182250bytes
2018-07-26 15:43:39 +0000 [warn]: #0 chunk bytes limit exceeds for an emitted event stream: 22709217bytes
2018-07-26 15:43:40 +0000 [warn]: #0 chunk bytes limit exceeds for an emitted event stream: 32282103bytes
2018-07-26 15:44:27 +0000 [warn]: #0 chunk bytes limit exceeds for an emitted event stream: 16517434bytes
2018-07-26 16:11:49 +0000 [warn]: #0 chunk bytes limit exceeds for an emitted event stream: 16283152bytes
2018-07-26 16:12:16 +0000 [warn]: #0 chunk bytes limit exceeds for an emitted event stream: 8779811bytes
2018-07-26 16:12:27 +0000 [warn]: #0 chunk bytes limit exceeds for an emitted event stream: 13907269bytes
2018-07-26 16:47:05 +0000 [warn]: #0 chunk bytes limit exceeds for an emitted event stream: 14492001bytes
2018-07-27 00:20:25 +0000 [warn]: #0 chunk bytes limit exceeds for an emitted event stream: 45712580bytes
2018-07-27 00:20:39 +0000 [warn]: #0 chunk bytes limit exceeds for an emitted event stream: 44567926bytes
2018-07-27 00:21:41 +0000 [warn]: #0 chunk bytes limit exceeds for an emitted event stream: 11134918bytes
2018-07-27 00:25:38 +0000 [warn]: #0 chunk bytes limit exceeds for an emitted event stream: 29632880bytes
2018-07-27 01:41:27 +0000 [warn]: #0 chunk bytes limit exceeds for an emitted event stream: 12974860bytes
2018-07-27 02:18:07 +0000 [warn]: #0 chunk bytes limit exceeds for an emitted event stream: 20644272bytes
2018-07-27 03:24:41 +0000 [warn]: #0 chunk bytes limit exceeds for an emitted event stream: 17136656bytes
2018-07-27 03:24:52 +0000 [warn]: #0 chunk bytes limit exceeds for an emitted event stream: 57912600bytes
2018-07-27 03:25:42 +0000 [warn]: #0 chunk bytes limit exceeds for an emitted event stream: 12625113bytes
2018-07-27 03:26:02 +0000 [warn]: #0 chunk bytes limit exceeds for an emitted event stream: 14103731bytes
2018-07-27 03:29:50 +0000 [warn]: #0 chunk bytes limit exceeds for an emitted event stream: 9940252bytes
2018-07-27 03:30:43 +0000 [warn]: #0 chunk bytes limit exceeds for an emitted event stream: 9897256bytes
2018-07-27 03:30:49 +0000 [warn]: #0 chunk bytes limit exceeds for an emitted event stream: 17060529bytes
2018-07-27 03:41:15 +0000 [warn]: #0 chunk bytes limit exceeds for an emitted event stream: 16731706bytes

vguaglione commented 6 years ago

I've experienced a similar problem (slow-flush-threshold and buffer overflows) with td-agent3 when forwarding to a Splunk instance outside a kubernetes cluster. My issue is that the forwarder can't keep up with the volume of messages being aggregated within the cluster. I'm wondering if there are any recommendations or best practices for handling large volumes of messages through one fluentd aggregator?

github-actions[bot] commented 3 years ago

This issue has been automatically marked as stale because it has been open 90 days with no activity. Remove stale label or comment or this issue will be closed in 30 days

vguaglione commented 3 years ago

The issue might be stale but it's still problematic.

github-actions[bot] commented 3 years ago

This issue has been automatically marked as stale because it has been open 90 days with no activity. Remove stale label or comment or this issue will be closed in 30 days

vguaglione commented 3 years ago

Still an issue.

github-actions[bot] commented 3 years ago

This issue has been automatically marked as stale because it has been open 90 days with no activity. Remove stale label or comment or this issue will be closed in 30 days

vguaglione commented 3 years ago

Still an issue.

github-actions[bot] commented 2 years ago

This issue has been automatically marked as stale because it has been open 90 days with no activity. Remove stale label or comment or this issue will be closed in 30 days

vguaglione commented 2 years ago

Still an issue.

kenhys commented 2 years ago

@vguaglione Does it still reproducible with the recent version of Fluentd?

vguaglione commented 2 years ago

@kenhys That I cannot tell you because we are pinned to the version of fluentd integrated with OpenShift 3.11. From what I understand, individual logging components cannot be upgraded, and only get upgraded when the version of OpenShift is upgraded. Under OpenShift 4, the logging system has been redesigned so in order to find the answer to your question, we'd need to upgrade to a higher minor version of OpenShift, which in our case is not a possibility. We will move directly to version 4 at some point.

github-actions[bot] commented 2 years ago

This issue has been automatically marked as stale because it has been open 90 days with no activity. Remove stale label or comment or this issue will be closed in 30 days

vguaglione commented 2 years ago

Still an issue.

github-actions[bot] commented 2 years ago

This issue has been automatically marked as stale because it has been open 90 days with no activity. Remove stale label or comment or this issue will be closed in 30 days

github-actions[bot] commented 2 years ago

This issue was automatically closed because of stale in 30 days

vguaglione commented 2 years ago

Still an issue

fluent / fluentd

td-agent sends logs with hours delayed and keeps warning about slow flush #2088