awslabs / aws-fluent-plugin-kinesis

Amazon Kinesis output plugin for Fluentd
Apache License 2.0

Detect and throw throttling and other Firehose exceptions. #198

Closed rverma-jm closed 4 years ago

rverma-jm commented 4 years ago

I am putting JSON-format records to Kinesis. It runs for some time and then starts producing retry warnings. I have no idea what is going wrong behind the scenes.

2020-04-05 05:01:24 +0000 [info]: #0 fluentd worker is now running worker=0
2020-04-05 05:01:33 +0000 [debug]: #0 [firehose_ok] Write chunk 5a2840b4f6b1ff93f5691ad484bde6e5 / 500 records /  221 KB
2020-04-05 05:01:33 +0000 [debug]: #0 [firehose_ok] Finish writing chunk
2020-04-05 05:01:33 +0000 [debug]: #0 [firehose_ok] Write chunk 5a2840b4f6b1ff93f5691ad484bde6e5 / 500 records /  221 KB
2020-04-05 05:01:33 +0000 [debug]: #0 [firehose_ok] Finish writing chunk
2020-04-05 05:01:33 +0000 [debug]: #0 [firehose_ok] Write chunk 5a2840b4f6b1ff93f5691ad484bde6e5 / 500 records /  221 KB
2020-04-05 05:01:33 +0000 [warn]: #0 [firehose_ok] Retrying to request batch. Retry count:   1, Retry records: 250, Wait seconds 0.26
2020-04-05 05:01:33 +0000 [debug]: #0 [firehose_ok] 70233003238780 sleep start
2020-04-05 05:01:34 +0000 [debug]: #0 [firehose_ok] 70233003238780 sleep finish
2020-04-05 05:01:34 +0000 [debug]: #0 [firehose_ok] Finish writing chunk
2020-04-05 05:01:34 +0000 [debug]: #0 [firehose_ok] Write chunk 5a2840b4f6b1ff93f5691ad484bde6e5 / 500 records /  221 KB
2020-04-05 05:01:34 +0000 [debug]: #0 [firehose_ok] Finish writing chunk
2020-04-05 05:01:34 +0000 [debug]: #0 [firehose_ok] Write chunk 5a2840b4f6b1ff93f5691ad484bde6e5 / 500 records /  221 KB
2020-04-05 05:01:34 +0000 [debug]: #0 [firehose_ok] Finish writing chunk
2020-04-05 05:01:34 +0000 [debug]: #0 [firehose_ok] Write chunk 5a2840b4f6b1ff93f5691ad484bde6e5 / 500 records /  221 KB
2020-04-05 05:01:34 +0000 [debug]: #0 [firehose_ok] Finish writing chunk
2020-04-05 05:01:34 +0000 [debug]: #0 [firehose_ok] Write chunk 5a2840b4f6b1ff93f5691ad484bde6e5 / 500 records /  221 KB
2020-04-05 05:01:34 +0000 [warn]: #0 [firehose_ok] Retrying to request batch. Retry count:   1, Retry records: 438, Wait seconds 0.32
2020-04-05 05:01:34 +0000 [debug]: #0 [firehose_ok] 70233003238780 sleep start
2020-04-05 05:01:34 +0000 [debug]: #0 [firehose_ok] 70233003238780 sleep finish
2020-04-05 05:01:34 +0000 [warn]: #0 [firehose_ok] Retrying to request batch. Retry count:   2, Retry records: 329, Wait seconds 0.33
2020-04-05 05:01:34 +0000 [debug]: #0 [firehose_ok] 70233003238780 sleep start
2020-04-05 05:01:35 +0000 [warn]: #0 no patterns matched tag="fluentd.pod.healthcheck"
2020-04-05 05:01:35 +0000 [debug]: #0 [firehose_ok] 70233003238780 sleep finish
2020-04-05 05:01:35 +0000 [warn]: #0 [firehose_ok] Retrying to request batch. Retry count:   3, Retry records: 329, Wait seconds 0.53
2020-04-05 05:01:35 +0000 [debug]: #0 [firehose_ok] 70233003238780 sleep start
2020-04-05 05:01:35 +0000 [debug]: #0 [firehose_ok] 70233003238780 sleep finish
2020-04-05 05:01:35 +0000 [warn]: #0 [firehose_ok] Retrying to request batch. Retry count:   4, Retry records: 329, Wait seconds 1.02
2020-04-05 05:01:35 +0000 [debug]: #0 [firehose_ok] 70233003238780 sleep start
2020-04-05 05:01:36 +0000 [debug]: #0 [firehose_ok] 70233003238780 sleep finish
2020-04-05 05:01:37 +0000 [debug]: #0 [firehose_ok] Finish writing chunk

I am also wondering whether we can put aggregated records directly to Firehose.

simukappu commented 4 years ago

1.

> I am putting JSON-format records to Kinesis. It runs for some time and then starts producing retry warnings. I have no idea what is going wrong behind the scenes.

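This warning means that the PutRecords API of Kinesis Data Streams or the PutRecordBatch API of Kinesis Data Firehose returned failed records because of an error such as ProvisionedThroughputExceededException. The plugin then waits and resends only the failed records, which is why "Retry records" in the log is smaller than the chunk size.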

Gem code: https://github.com/awslabs/aws-fluent-plugin-kinesis/blob/d5d7c693c4720b1e8f5bcea3cdc07371a64ef1ff/lib/fluent/plugin/kinesis_helper/api.rb#L89-L109

Kinesis API references:
https://docs.aws.amazon.com/kinesis/latest/APIReference/API_PutRecords.html
https://docs.aws.amazon.com/firehose/latest/APIReference/API_PutRecordBatch.html

Can you check the service-side metrics for your stream in CloudWatch?
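To illustrate the partial-failure behavior described above, here is a minimal Python sketch (not the gem's Ruby code). It assumes a response shaped like the PutRecordBatch result, with a `RequestResponses` list in which failed entries carry an `ErrorCode`. The names `put_with_retry`, `failed_records`, and `fake_put_batch_factory` are hypothetical, the error code is only an example value, and the doubling backoff is an assumption rather than the plugin's actual formula.

```python
import time
from typing import Callable, Dict, List

Response = Dict[str, object]


def failed_records(records: List[bytes], response: Response) -> List[bytes]:
    """Return the records whose per-record response entry carries an ErrorCode."""
    entries = response["RequestResponses"]
    return [rec for rec, entry in zip(records, entries) if "ErrorCode" in entry]


def put_with_retry(
    put_batch: Callable[[List[bytes]], Response],
    records: List[bytes],
    max_retries: int = 3,
    base_wait: float = 0.25,
    sleep: Callable[[float], None] = time.sleep,
) -> List[bytes]:
    """Send records and resend only the failed ones; return records still failing."""
    pending = list(records)
    for attempt in range(max_retries + 1):
        response = put_batch(pending)
        pending = failed_records(pending, response)
        if not pending:
            return []
        if attempt < max_retries:
            # Assumed backoff: doubles on each retry; the plugin's formula may differ.
            sleep(base_wait * (2 ** attempt))
    return pending


def fake_put_batch_factory(fail_first_n_calls: int) -> Callable[[List[bytes]], Response]:
    """Hypothetical stand-in for the API: rejects every second record for N calls."""
    calls = {"n": 0}

    def put_batch(batch: List[bytes]) -> Response:
        calls["n"] += 1
        throttled = calls["n"] <= fail_first_n_calls
        entries = []
        for i, _ in enumerate(batch):
            if throttled and i % 2 == 1:
                entries.append({"ErrorCode": "ServiceUnavailableException",
                                "ErrorMessage": "Slow down."})
            else:
                entries.append({"RecordId": f"id-{calls['n']}-{i}"})
        failed = sum(1 for e in entries if "ErrorCode" in e)
        return {"FailedPutCount": failed, "RequestResponses": entries}

    return put_batch


if __name__ == "__main__":
    waits: List[float] = []
    left = put_with_retry(fake_put_batch_factory(2),
                          [b"a", b"b", b"c", b"d"], sleep=waits.append)
    print("still failing:", left, "waits:", waits)
```

In this sketch a retry warning corresponds to one pass through the loop with a non-empty `pending` list. If records are still failing after the last retry, the caller has to decide whether to drop them or fail the whole chunk.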

2.

> I am also wondering whether we can put aggregated records directly to Firehose.

Currently, aggregated records cannot be put directly into Firehose. We appreciate your feedback. See also issue #193.

simukappu commented 4 years ago

Closing this issue for now. Please reopen if required.