awslabs / aws-fluent-plugin-kinesis

Amazon Kinesis output plugin for Fluentd
Apache License 2.0
293 stars, 96 forks

Problems outputting to kinesis firehose #196

Closed. Opened by mostfunkyduck; closed 3 years ago.

mostfunkyduck commented 4 years ago

I have a few problems with the Firehose plugin's compression that I was hoping y'all could shed some light on. My use case is that I'm tailing a file of JSON data, batching events, then pushing them to S3 via Firehose.

  1. With the default settings, each compressed stream is newline-delimited in the resulting S3 bucket. The problem is that when I compress strings such as {"a":1}, the resulting compressed data contains a newline, so there's no way to tell records apart by newlines alone. If I turn off 'append_new_line', I end up with a bunch of compressed streams with NO delimiter, which isn't a valid way to concatenate zlib streams, so that fails as well.

  2. The plugin ends up compressing each line of the log file individually. This gives much less efficient compression, especially for small records, because every record carries its own zlib header and checksum and can't reuse repeated content from neighboring records. It should compress batches of records instead.
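Both points can be reproduced with a minimal Python sketch. It assumes the layout described above (each record zlib-compressed on its own, then followed by one newline); `pack_per_record` and `naive_unpack` are hypothetical helpers for illustration, not the plugin's code. The record `{"a": 99970}` is chosen because its bytes sum to 777, so its Adler-32 trailer (the last four bytes of every zlib stream) ends in 0x0A, a newline. The size comparison shows the per-stream overhead from point 2.

```python
import json
import zlib
from typing import List, Optional


# Hypothetical writer: compress every record separately, append b"\n".
def pack_per_record(records: List[bytes]) -> bytes:
    return b"".join(zlib.compress(r) + b"\n" for r in records)


# Hypothetical reader: split on b"\n" and decompress each piece.
# Returns None as soon as a piece is not a complete zlib stream.
def naive_unpack(blob: bytes) -> Optional[List[bytes]]:
    out = []
    for piece in blob.split(b"\n"):
        if not piece:
            continue  # skip empty pieces (e.g. after the final delimiter)
        try:
            out.append(zlib.decompress(piece))
        except zlib.error:
            return None
    return out


# json.dumps uses ", " / ": " separators, giving b'{"a": 99970}'.
sample = json.dumps({"a": 99970}).encode()
# Byte sum is 777, so Adler-32 A = 778 = 0x030A and the stream ends in 0x0A.
print(zlib.compress(sample).endswith(b"\n"))

blob = pack_per_record([b'{"a": 1}', sample, b'{"a": 2}'])
# The newline inside the second stream splits it, so decoding fails.
print(naive_unpack(blob))

# Point 2: many small, similar records compressed one by one vs. as a batch.
records = [json.dumps({"a": i}).encode() for i in range(1000)]
per_record = sum(len(zlib.compress(r)) for r in records)
batched = len(zlib.compress(b"\n".join(records)))
print(per_record, batched)
```

Each separately compressed record pays at least 6 bytes of zlib framing (2-byte header, 4-byte checksum), and deflate cannot back-reference text from earlier records, which is why the batched size comes out far smaller.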

Please let me know if there's anything more you need or if you want me to break this into two tickets.

simukappu commented 4 years ago

Thank you for your feedback. Let me confirm my understanding.

When we turn on 'append_new_line',

Is this correct?

mostfunkyduck commented 4 years ago

I'm seeing '¥n' where I'd expect '\n' in your response; otherwise, that covers most of the issue. The other part is that there needs to be some kind of delimiter between the zlib streams, because zlib doesn't automatically concatenate them. Maybe switch to gzip, which does, and be done with it?
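Here is a small Python sketch of the gzip suggestion. A gzip file may hold several members back to back, and Python's `gzip.decompress` reads all of them, so per-record gzip output can simply be concatenated. For comparison, the sketch also shows that back-to-back zlib streams can still be split by a reader that uses `zlib.decompressobj` and its `unused_data` attribute, since each stream marks its own end. `split_zlib_streams` is a hypothetical helper, not part of the plugin or of any S3/Firehose tooling.

```python
import gzip
import zlib
from typing import List

records = [b'{"a": 1}', b'{"a": 99970}', b'{"a": 2}']

# gzip: one member per record, members concatenated with no delimiter.
gzip_blob = b"".join(gzip.compress(r) for r in records)
# gzip.decompress handles multi-member data and returns all contents joined.
print(gzip.decompress(gzip_blob))


def split_zlib_streams(blob: bytes) -> List[bytes]:
    """Decode back-to-back zlib streams one at a time (hypothetical helper)."""
    out = []
    while blob:
        d = zlib.decompressobj()
        out.append(d.decompress(blob))  # stops at the end of this stream
        if not d.eof:
            raise ValueError("truncated zlib stream")
        blob = d.unused_data  # bytes after the end of this stream
    return out


# zlib: streams concatenated with no delimiter, decoded with the loop above.
zlib_blob = b"".join(zlib.compress(r) for r in records)
print(split_zlib_streams(zlib_blob))
```

The gzip route needs no custom reader because multi-member support is part of the format, while the zlib route only works with a reader written to loop over streams like this one.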

simukappu commented 4 years ago

Does anyone have the same problem?

simukappu commented 3 years ago

Closing this issue for now. Please reopen if required.