confluentinc / kafka-connect-hdfs

Kafka Connect HDFS connector

Add Avro compression codec support #174

Closed westerly closed 6 years ago

westerly commented 7 years ago

Hi all,

It would be nice to be able to configure the HDFS connector to compress the data it writes in Avro format. The compression codecs supported by Avro are "deflate" and "snappy".

cotedm commented 7 years ago

@westerly it certainly seems possible at first glance, even though I'm not that familiar with the Avro compression concepts. Did you want to make an attempt at a pull request? It seems like you would need to be able to specify an additional configuration for the AvroFormat (and AvroRecordWriterProvider) and then use DataFileWriter.setCodec. There would also need to be some tests written for config validation and to verify that the compressed files are readable.
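The config-validation step suggested above might look roughly like the following minimal Java sketch. The class and method names are hypothetical, the accepted values are the four codec names listed later in this thread for the merged feature, and treating a missing or blank value as `null` (no compression) is an assumption. The Avro calls that would follow (`CodecFactory.fromString` and `DataFileWriter.setCodec`) appear only in a comment so the sketch runs without Avro on the classpath.

```java
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch: validate a user-supplied Avro codec name before use. */
final class AvroCodecConfigSketch {

    // Codec names listed in this thread for the merged feature.
    static final List<String> VALID_CODECS = List.of("null", "deflate", "snappy", "bzip2");

    // Normalizes the configured value and rejects anything unknown.
    // Falling back to "null" (no compression) for a blank value is an assumption.
    static String validateCodec(String value) {
        if (value == null || value.isBlank()) {
            return "null";
        }
        String codec = value.trim().toLowerCase(Locale.ROOT);
        if (!VALID_CODECS.contains(codec)) {
            throw new IllegalArgumentException(
                "Unsupported Avro codec '" + value + "'; expected one of " + VALID_CODECS);
        }
        return codec;
    }

    // In a real record writer the validated name would then be applied roughly as:
    //   writer.setCodec(CodecFactory.fromString(codec)); // before writer.create(...)
    public static void main(String[] args) {
        System.out.println(validateCodec("Snappy")); // prints "snappy"
        try {
            validateCodec("gzip");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Note that `setCodec` has to be called before `create(...)`: Avro's `DataFileWriter` refuses to change the codec once the file is open.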

NishanthShajahan commented 6 years ago

Hi all, how can I enable compression when connectors are configured through Confluent Control Center?

rhauch commented 6 years ago

@NishanthShajahan what version of Confluent Control Center are you using? IIUC, version 4.0 of Control Center and the connector both have fixes that make this possible.

NishanthShajahan commented 6 years ago

I am using version 4.0 of Control Center. I don't see anything in the UI that lets me specify the compression codec, unless I am missing something.

kkonstantine commented 6 years ago

@NishanthShajahan the feature has been merged and will be available in Confluent 4.1 (related PR here: https://github.com/confluentinc/kafka-connect-hdfs/pull/255)

In Confluent Control Center, the option to select the compression codec will appear in the Connector group and will be available if you select AvroFormat as the format.class. The available options will be: null, deflate, snappy, and bzip2.
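For anyone configuring the connector outside Control Center, the same settings could be submitted through the Kafka Connect REST API (`POST /connectors`). This Java sketch assumes the codec property is named `avro.codec` (check the documentation for your connector version); the helper names, connector name, topic, HDFS URL and flush size are hypothetical placeholders.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper: builds an HDFS sink config with Avro compression enabled. */
final class HdfsAvroCodecConfigSketch {

    // Topic, HDFS URL and flush size are placeholders for illustration.
    static Map<String, String> hdfsSinkConfig(String topics, String hdfsUrl, String codec) {
        Map<String, String> config = new LinkedHashMap<>();
        config.put("connector.class", "io.confluent.connect.hdfs.HdfsSinkConnector");
        config.put("topics", topics);
        config.put("hdfs.url", hdfsUrl);
        config.put("flush.size", "1000");
        // The codec option only applies when the Avro format is selected.
        config.put("format.class", "io.confluent.connect.hdfs.avro.AvroFormat");
        // Assumed property name for the codec setting; verify for your version.
        config.put("avro.codec", codec);
        return config;
    }

    // Escapes backslashes and double quotes so the value is a valid JSON string.
    private static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Renders the {"name": ..., "config": {...}} body for POST /connectors.
    static String restPayload(String name, Map<String, String> config) {
        StringJoiner fields = new StringJoiner(", ", "{", "}");
        config.forEach((k, v) -> fields.add(quote(k) + ": " + quote(v)));
        return "{" + quote("name") + ": " + quote(name) + ", "
            + quote("config") + ": " + fields + "}";
    }

    public static void main(String[] args) {
        Map<String, String> config =
            hdfsSinkConfig("my-topic", "hdfs://namenode:8020", "snappy");
        System.out.println(restPayload("hdfs-sink-avro-snappy", config));
    }
}
```

Swapping `snappy` for `deflate`, `bzip2` or `null` would select a different codec; the printed JSON is what you would send to the Connect worker.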

Given that this has been merged, I'm closing this issue.