Open zizake opened 4 years ago
@zizake unfortunately it looks like we don't support changing the compression yet, but that could be a good contribution if you are interested in opening a PR
@levzem I see the following claim in the documentation. If that's so, does that mean the documentation is not accurate?
parquet.codec
  The Parquet compression codec to be used for output files.
  Type: string
  Default: snappy
  Valid Values: [none, snappy, gzip, brotli, lz4, lzo, zstd]
  Importance: low
Hello,
I have the following configuration for the sink connector. Is there any way to set a custom compression codec for Parquet files? The default is Snappy; I would like to change it to GZIP for its better compression ratio.
In Hive, the equivalent command would be: SET parquet.compression=GZIP;
connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
hadoop.conf.dir=/etc/hadoop/conf
flush.size=10000
schema.compatibility=BACKWARD
tasks.max=1
topics=kafkaplayground
timezone=UTC
hdfs.url=hdfs://XXXXXXXXXXXXXx:8020
hive.metastore.uris=thrift://XXXXXXXXXXX:9083
locale=en-us
key.converter.schemas.enable=false
value.converter.schema.registry.url=http://XXXXXXXXXXXXXX:8081
hive.integration=true
format.class=io.confluent.connect.hdfs.parquet.ParquetFormat
partitioner.class=io.confluent.connect.hdfs.partitioner.HourlyPartitioner
value.converter=io.confluent.connect.avro.AvroConverter
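For what it's worth, if the parquet.codec property documented for the connector were actually honored, the change would presumably be a single extra line in the connector configuration. This is only a sketch based on the documented property name; per the maintainer's reply above, codec selection may not actually be wired up in the Parquet writer:

```
# hypothetical: only takes effect if the connector actually reads parquet.codec
parquet.codec=gzip
```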
Thanks!