apache / parquet-java

Providing parquet-avro configuration to parquet-cli #2967

Open andrewthad opened 4 months ago

andrewthad commented 4 months ago

The parquet-avro library supports some more recent Parquet features through configuration settings such as parquet.avro.write-parquet-uuid (which needs to be true) and parquet.avro.write-old-list-structure (which needs to be false). However, parquet-cli has no documented way to set these. Poking around with strace, I found that the convert command does issue an openat syscall targeting logging.properties to look for logging-related configuration, but I have not seen it attempt to open any other file with a .properties extension. As a workaround, I have been manually changing the values of WRITE_PARQUET_UUID_DEFAULT and friends in the source and rebuilding the project.
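
For comparison, when parquet-avro is called directly rather than through parquet-cli, the two settings named above are plain Hadoop Configuration keys. The following is a minimal sketch, assuming parquet-avro, Avro, and the Hadoop client libraries are on the classpath; the class name AvroWriteSettingsSketch, the Event schema, and the output path argument are hypothetical and chosen only for illustration. It does not show anything about how parquet-cli builds its own configuration.

```java
import java.util.Arrays;
import java.util.UUID;

import org.apache.avro.LogicalTypes;
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetWriter;
import org.apache.parquet.hadoop.ParquetWriter;

// Hypothetical class name; not part of parquet-java.
public class AvroWriteSettingsSketch {

    // Builds a Hadoop Configuration carrying the two settings from this issue.
    static Configuration avroWriteConf() {
        Configuration conf = new Configuration();
        // Write Avro "uuid" logical types using the Parquet UUID type.
        conf.setBoolean("parquet.avro.write-parquet-uuid", true);
        // Use the newer three-level list layout instead of the old two-level one.
        conf.setBoolean("parquet.avro.write-old-list-structure", false);
        return conf;
    }

    public static void main(String[] args) throws Exception {
        // Illustrative schema: a UUID-annotated string plus an array of strings.
        Schema uuidString = LogicalTypes.uuid().addToSchema(Schema.create(Schema.Type.STRING));
        Schema schema = SchemaBuilder.record("Event").fields()
                .name("id").type(uuidString).noDefault()
                .name("tags").type().array().items().stringType().noDefault()
                .endRecord();

        // args[0] is the output file path, e.g. /tmp/events.parquet (assumption).
        try (ParquetWriter<GenericRecord> writer =
                AvroParquetWriter.<GenericRecord>builder(new Path(args[0]))
                        .withSchema(schema)
                        .withConf(avroWriteConf())
                        .build()) {
            GenericRecord record = new GenericData.Record(schema);
            record.put("id", UUID.randomUUID().toString());
            record.put("tags", Arrays.asList("a", "b"));
            writer.write(record);
        }
    }
}
```

Inspecting the resulting file's schema (for example with the parquet-cli schema command) is one way to check whether the id column and the tags list were written with the intended structure.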

This issue can be resolved in one of two ways:

  1. Document in the parquet-cli README that it does not support configuring any AvroWriteSupport settings.
  2. Document in the parquet-cli README that it does support configuring AvroWriteSupport settings, and provide an example of the CLI option used to do this.