apache / parquet-java

Apache Parquet Java
https://parquet.apache.org/
Apache License 2.0

PARQUET-781. Direct support for PathOutputCommitter #1361

Open steveloughran opened 1 month ago

steveloughran commented 1 month ago

If parquet.path.outputcommitter.enabled is true, the PathOutputCommitterFactory mechanism is used to dynamically choose a committer for the output path. Such committers do not generate summary files; a warning about this is printed when appropriate.

This significantly simplifies writing to S3/Azure/GCS through committers which commit correctly and efficiently to the target stores.
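
As a rough illustration of how a job might opt in, the sketch below collects the relevant properties as plain strings. Only parquet.path.outputcommitter.enabled comes from this PR. The s3a factory binding is an assumption taken from Hadoop's S3A committer setup and may not match a given deployment, and the class name PathCommitterSettings is hypothetical. In a real job these values would be set on the Hadoop job configuration rather than held in a map.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch only: gathers the job properties as plain key/value strings. */
public class PathCommitterSettings {

    /** Option named in this PR's description. */
    static final String PATH_COMMITTER_ENABLED = "parquet.path.outputcommitter.enabled";

    /** Assumption: Hadoop's per-scheme committer factory binding, shown for s3a. */
    static final String S3A_FACTORY_KEY = "mapreduce.outputcommitter.factory.scheme.s3a";
    static final String S3A_FACTORY_CLASS =
            "org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory";

    /** Returns the properties in insertion order so they print predictably. */
    static Map<String, String> settings() {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put(PATH_COMMITTER_ENABLED, "true");
        conf.put(S3A_FACTORY_KEY, S3A_FACTORY_CLASS);
        return conf;
    }

    public static void main(String[] args) {
        // Print as key=value pairs, e.g. for pasting into a properties file.
        settings().forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

Keeping the properties in one place makes it easier to see that the switch is opt-in: with the first property absent or false, the existing committer behaviour is presumably unchanged.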

Jira

Tests

No tests yet

Commits

Style

Documentation

steveloughran commented 1 month ago

Needs tests

Proposed: enable the factory, enable summary files, and verify that no summaries are created but the job still works. Manually review the output log too.
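
The "no summaries created" part of the proposed test could be checked along these lines. This is a minimal sketch, not the test for this PR: it assumes the summary files use Parquet's usual names (_metadata and _common_metadata), it inspects a local directory through java.nio rather than a Hadoop FileSystem, and the class and method names are hypothetical.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Sketch only: looks for Parquet summary files in a job output directory. */
public class SummaryFileCheck {

    /** Assumption: the conventional Parquet summary file names. */
    static final String[] SUMMARY_FILE_NAMES = {"_metadata", "_common_metadata"};

    /** Returns the names of any summary files present directly under outputDir. */
    static List<String> findSummaryFiles(Path outputDir) {
        List<String> found = new ArrayList<>();
        for (String name : SUMMARY_FILE_NAMES) {
            if (Files.isRegularFile(outputDir.resolve(name))) {
                found.add(name);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Defaults to the current directory when no output path is given.
        Path outputDir = Paths.get(args.length > 0 ? args[0] : ".");
        List<String> found = findSummaryFiles(outputDir);
        if (!found.isEmpty()) {
            throw new AssertionError("Unexpected summary files: " + found);
        }
        System.out.println("No summary files under " + outputDir);
    }
}
```

A real test would also need to confirm that the data files were written and the job committed successfully, since an empty output directory would pass this check too.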