confluentinc / kafka-connect-storage-common

Shared software among connectors that target distributed filesystems and cloud storage.
Other
3 stars 154 forks

Update parquet version #315

Open bluesheeptoken opened 1 year ago

bluesheeptoken commented 1 year ago

This adds zstd-jni so that connectors can easily support zstd compression. cf: https://github.com/apache/parquet-mr/pull/793

Problem

Currently, to use zstd compression, we need to add the Hadoop native library, cf: https://github.com/confluentinc/kafka-connect-storage-cloud/issues/570#issuecomment-1384326683

Updating the parquet version would bring in the zstd-jni package and make the zstd codec easier to use.

Solution

Update parquet version
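
As a rough sketch of what such a bump might look like in a Maven `pom.xml`: the property name `parquet.version` and the placement of the dependency are assumptions for illustration, not copied from this repository, and 1.13.1 is simply the newest version mentioned in this thread.

```xml
<!-- Sketch only: the property name and layout are assumptions,
     not taken from this repository's pom.xml. -->
<properties>
  <!-- Parquet version to build against -->
  <parquet.version>1.13.1</parquet.version>
</properties>

<dependencies>
  <dependency>
    <groupId>org.apache.parquet</groupId>
    <artifactId>parquet-hadoop</artifactId>
    <version>${parquet.version}</version>
  </dependency>
</dependencies>
```

After a change like this, running `mvn dependency:tree` should show whether `com.github.luben:zstd-jni` is actually resolved as a transitive dependency.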

Tests

No tests have been run yet. How could I test it? I have seen that in implementations such as kafka-connect-storage-cloud, this lib and all its transitive dependencies are marked as "provided". How can I use this PR to build a new Docker image for local tests?

Is there a way to add unit tests?

I would be happy to help, but I'm kinda lost in this repo.

BDeus commented 1 year ago

Nice addition, so we no longer depend on the Hadoop library.

However, there was a revert of parquet 1.12.3 here. We may need to test that the error does not happen again.

FYI, there is a newer version of parquet: 1.13.1