apache / datafusion

Apache DataFusion SQL Query Engine
https://datafusion.apache.org/
Apache License 2.0

Make it possible to disable compression algorithm feature flags inside the parquet dependency without disabling parquet. #10101

Open adamfaulkner-at opened 7 months ago

adamfaulkner-at commented 7 months ago

Is your feature request related to a problem or challenge?

Hi, I'm trying to reduce the size of my compiled binary. Using cargo bloat, it appears that a large portion of the size (around 10%) is coming from the dependency on the brotli compression algorithm. I don't plan on compressing or decompressing anything using brotli, so I'd like to disable this dependency. Ideally, I would only depend on zstd, since that is the only algorithm I plan to use.

Currently, I have default-features = false and features = ["parquet"] set on my datafusion dependency. This is still enough to pull in all of the compression algorithms, since datafusion's parquet dependency has its default features enabled.
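For reference, the setup described here would look roughly like the following in a downstream Cargo.toml. This is only a sketch: the version requirement is a placeholder, not something stated in this issue.

```toml
[dependencies]
# Placeholder version; pin this to the release you actually use.
# Default features of datafusion are off, and only "parquet" is re-enabled.
datafusion = { version = "*", default-features = false, features = ["parquet"] }

# The parquet crate's own default features include the codecs
# (snap, brotli, flate2, lz4, zstd), so they are still compiled in.
# To see which crate enables brotli, one option is:
#   cargo tree -e features -i brotli
```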

I'm less familiar with Rust build systems, so my analysis of the situation might be incorrect.

Describe the solution you'd like

Could we do something similar to #10058, where top-level features are exposed by the datafusion crate and toggle the corresponding features inside the parquet crate on or off?
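One way this could look inside datafusion's own Cargo.toml is sketched below. The feature names prefixed with parquet- are hypothetical and chosen only for illustration, the version is a placeholder, and the non-codec features listed on the dependency (arrow, async) are an assumption about what datafusion needs. The codec feature names on the right-hand side (zstd, brotli, snap) are the ones the parquet crate itself defines.

```toml
[dependencies]
# Turn off parquet's default features so no codec is pulled in implicitly.
# "arrow" and "async" are assumed here to be the non-codec features required.
parquet = { version = "*", optional = true, default-features = false, features = ["arrow", "async"] }

[features]
# Enables parquet support without any compression codec.
parquet = ["dep:parquet"]

# Hypothetical per-codec features that forward to the parquet crate.
parquet-zstd = ["parquet", "parquet?/zstd"]
parquet-brotli = ["parquet", "parquet?/brotli"]
parquet-snap = ["parquet", "parquet?/snap"]
```

Note that Cargo unifies features across the dependency graph, so a codec would still be compiled in if any other crate in the build enables it on parquet.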

Describe alternatives you've considered

No response

Additional context

No response

alamb commented 6 months ago

> Could we do something similar to https://github.com/apache/datafusion/issues/10058, where top level features are exposed by the datafusion crate, and this toggles on or off the features inside the parquet crate?

I think this is possible

Another option might be to create a feature like "parquet-minimum" that activates only a small subset of the parquet compression options 🤔
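A "parquet-minimum" feature of this kind might be declared along the following lines in datafusion's Cargo.toml. This is a sketch under assumptions: the comment above does not say which codecs the subset would contain, so the choice of snap and zstd here is arbitrary, and the version and the non-codec features (arrow, async) are placeholders.

```toml
[dependencies]
# parquet's default features are disabled so that codecs become opt-in.
parquet = { version = "*", optional = true, default-features = false, features = ["arrow", "async"] }

[features]
# Full support: every codec that parquet enables by default.
parquet = ["dep:parquet", "parquet?/snap", "parquet?/brotli", "parquet?/flate2", "parquet?/lz4", "parquet?/zstd"]

# Hypothetical reduced variant; the codec selection is illustrative only.
parquet-minimum = ["dep:parquet", "parquet?/snap", "parquet?/zstd"]
```

A downstream user would then select it with default-features = false and features = ["parquet-minimum"]. Keeping "parquet" as the all-codecs feature would preserve the current behavior for existing users.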