apache / druid

Apache Druid: a high performance real-time analytics database.
https://druid.apache.org/
Apache License 2.0

Add support for Kinesis Compression #17062

Open funguy-tech opened 2 weeks ago

funguy-tech commented 2 weeks ago

Description

Placeholder feature request for an upcoming PR.

This proposal is to bring support for the common compression formats already implemented in Druid's codebase (zstd, gzip, etc.) to Kinesis streams.

Compression support would be exposed through an optional parameter in the Kinesis ioConfig, ‘compressionFormat’. When set, Druid would decompress each record at the point of record collection.
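To make the proposed behavior concrete, here is a minimal Java sketch of per-record decompression driven by a ‘compressionFormat’ setting. It is not the forthcoming PR and not Druid's API: the class `RecordDecompressor`, the enum `CompressionFormat`, and its values are hypothetical names chosen for illustration. Only gzip is implemented because it needs nothing beyond the JDK (`java.util.zip`); zstd would require a third-party library.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch: class, enum, and method names are illustrative
// assumptions, not Druid's actual API or the proposed PR's code.
class RecordDecompressor {

    // Hypothetical values for the proposed 'compressionFormat' ioConfig field.
    // Only GZIP is handled here because it needs nothing beyond the JDK;
    // zstd would need a third-party library such as zstd-jni.
    enum CompressionFormat { NONE, GZIP }

    private final CompressionFormat format;

    RecordDecompressor(CompressionFormat format) {
        // An unset (null) parameter keeps current behavior: bytes pass through.
        this.format = (format == null) ? CompressionFormat.NONE : format;
    }

    // Applied to each record's payload as it is collected from the stream,
    // i.e. before the bytes are handed on for parsing.
    byte[] decompress(byte[] payload) {
        if (format == CompressionFormat.NONE) {
            return payload;
        }
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(payload))) {
            return in.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException("Failed to decompress record", e);
        }
    }

    // Producer-side counterpart: what a Kinesis writer would do before
    // sending a record. Used here only to build example input.
    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (OutputStream out = new GZIPOutputStream(buffer)) {
            out.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The GZIP trailer is written on close, so read the buffer afterwards.
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        byte[] raw = "{\"event\":\"click\",\"user\":42}".getBytes(StandardCharsets.UTF_8);
        byte[] wire = gzip(raw);
        RecordDecompressor reader = new RecordDecompressor(CompressionFormat.GZIP);
        System.out.println(new String(reader.decompress(wire), StandardCharsets.UTF_8));
    }
}
```

Running `main` prints the original JSON record. Treating an unset parameter as a pass-through mirrors the issue's framing of ‘compressionFormat’ as optional, so existing supervisors reading uncompressed streams would be unaffected.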

Motivation

Unlike Kafka, Kinesis offers little opportunity for compression out of the box. As a result, Kinesis customers commonly compress data themselves before sending it over the wire and decompress it on the consuming side.

Since Druid already supports several popular compression formats internally (zstd, gzip, etc.), letting it decompress Kinesis records directly would be useful for high-throughput customers who want to send compressed data over the wire.

Our team (which runs a fleet of enterprise Druid clusters at petabyte scale) has reduced Kinesis costs by 50-80% with a custom build of Druid that adds Kinesis decompression, with little to no discernible impact on ingestion overhead.

A PR is forthcoming in a few days, but I wanted to open this feature request first for community discussion.

abhishekagarwal87 commented 2 weeks ago

Looking forward to the PR. This will be a very useful capability.