Problems outputting to kinesis firehose

mostfunkyduck commented 4 years ago

I have a few problems with the firehose plugin's compression that I was hoping y'all could shed some light on. My use case is that I'm tailing a file of JSON data, batching events, then pushing them to S3 via Firehose.

With the default settings, each compressed record is newline-delimited in the resulting S3 bucket. The problem is that when I compress strings such as {"a":1}, the resulting compressed data can itself contain a newline byte, meaning that there's no way to differentiate records based on a newline alone. If I turn off 'append_new_line', I end up with a bunch of compressed streams with NO delimiter, which isn't a valid way to concatenate zlib streams, so that fails as well.
The plugin ends up compressing each line of the log file individually. This results in much less efficient compression, especially for small records. It should compress batches of records instead.
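
Both problems can be reproduced outside the plugin. The following is a minimal Python sketch using only the standard library, not the plugin's own code; the helper `make_record` and the 50-record sample are made up for illustration. It compresses each record separately with zlib, joins the results with newlines, and compares the total size against compressing the whole batch at once. The record {"a":2} is a convenient case: its Adler-32 checksum ends in the byte 0x0A, and a zlib stream ends with that checksum, so its compressed form ends in a newline byte.

```python
import zlib


def make_record(i: int) -> bytes:
    """Hypothetical sample record: compact JSON such as {"a":2}."""
    return f'{{"a":{i}}}'.encode()


records = [make_record(i) for i in range(1, 51)]

# First problem: compress each record on its own, then append "\n" as a delimiter.
compressed = [zlib.compress(r) for r in records]
blob = b"".join(c + b"\n" for c in compressed)

# Splitting on "\n" should give back one piece per record, but it does not,
# because some compressed streams contain a 0x0A byte themselves.
pieces = blob.split(b"\n")[:-1]  # the blob ends with "\n", so drop the empty tail
ambiguous = [r for r, c in zip(records, compressed) if b"\n" in c]

# Second problem: per-record compression pays the zlib header and checksum
# overhead on every record and cannot share context between records.
per_record_size = sum(len(c) for c in compressed)
batch_size = len(zlib.compress(b"\n".join(records) + b"\n"))

print(f"records={len(records)} pieces_after_split={len(pieces)}")
print(f"records whose zlib output contains a newline: {len(ambiguous)}")
print(f"per-record bytes={per_record_size} batched bytes={batch_size}")
```

Splitting the blob on newlines yields more pieces than there are records, and the batched output is smaller than the sum of the per-record outputs.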

Please let me know if there's anything more you need or if you want me to break this into two tickets.

simukappu commented 4 years ago

Thank you for your feedback. Let me confirm my understanding.

When we turn on 'append_new_line',

Current Behavior: The gem compresses each record and appends '\n' to the compressed record
Expected Behavior: The gem first appends '\n' to the original record and then compresses it

Is this correct?

mostfunkyduck commented 4 years ago

I'm seeing a '¥n' where I'd expect to see '\n' in your response; otherwise this is most of the issue. The other part is that there needs to be some kind of delimiter between the zlib streams, because zlib streams can't simply be concatenated. Maybe switch to gzip, which does support concatenation, and be done with it?
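
The difference between the two formats can be sketched in Python with the standard library. This is an illustration under assumptions, not the plugin's implementation: the three sample records are made up, and the newline is appended before compression, as in the expected behavior described earlier in the thread.

```python
import gzip
import zlib

records = [b'{"a":1}', b'{"a":2}', b'{"a":3}']

# Append the newline BEFORE compressing, then concatenate with no delimiter.
zlib_blob = b"".join(zlib.compress(r + b"\n") for r in records)
gzip_blob = b"".join(gzip.compress(r + b"\n") for r in records)

# A zlib decompressor stops at the end of the first stream; the remaining
# streams are left untouched in unused_data.
d = zlib.decompressobj()
first_only = d.decompress(zlib_blob)
leftover = d.unused_data

# gzip.decompress reads every member of multi-member gzip data.
all_records = gzip.decompress(gzip_blob)

print(first_only)
print(len(leftover) > 0)
print(all_records.splitlines())
```

With zlib, a reader has to restart decompression on the leftover bytes by hand. With gzip, the concatenated members decompress to newline-delimited records in one call. Whether a given downstream consumer accepts multi-member gzip is a separate question that this sketch does not cover.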

simukappu commented 4 years ago

Does anyone have the same problem?

simukappu commented 3 years ago

Closing this issue for now. Please reopen if required.

awslabs / aws-fluent-plugin-kinesis

Problems outputting to kinesis firehose #196