apache / orc

Apache ORC - the smallest, fastest columnar storage for Hadoop workloads
https://orc.apache.org/
Apache License 2.0

ORC-1644: Add `merge` tool to merge multiple ORC files into a single ORC file #1834

Closed: cxzl25 closed this pull request 4 months ago

cxzl25 commented 4 months ago

What changes were proposed in this pull request?

This PR aims to add a `merge` tool that merges multiple ORC files into a single ORC file.

Why are the changes needed?

ORC 1.3.0 introduced the `OrcFile#mergeFiles` method (ORC-132), which merges multiple ORC files into one ORC file. However, using it requires writing Java code; there is no simple command that can be invoked directly.
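For context, the following is a minimal sketch of the Java code one currently has to write to call `OrcFile#mergeFiles`. It is not the tool added by this PR. It assumes `orc-core` and its Hadoop dependencies are on the classpath. The class name `MergeOrcExample` and the helper `mergeAll` are hypothetical and used only for illustration. `mergeFiles` skips input files it considers incompatible (for example, files with a different schema) and returns only the inputs it actually merged.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.orc.OrcFile;

/** Hypothetical example: merge ORC files via OrcFile#mergeFiles. */
public class MergeOrcExample {

  /** Merges the given ORC files into output; returns how many inputs were merged. */
  static int mergeAll(String output, List<String> inputs) throws IOException {
    Configuration conf = new Configuration();

    List<Path> inputPaths = new ArrayList<>();
    for (String in : inputs) {
      inputPaths.add(new Path(in));
    }

    // Stripes are copied without re-encoding; inputs that are not compatible
    // with the merged file are skipped, so the result may be shorter than inputPaths.
    List<Path> merged =
        OrcFile.mergeFiles(new Path(output), OrcFile.writerOptions(conf), inputPaths);
    return merged.size();
  }

  public static void main(String[] args) throws IOException {
    if (args.length < 2) {
      System.err.println("usage: MergeOrcExample <output.orc> <input.orc>...");
      System.exit(1);
    }
    List<String> inputs = new ArrayList<>();
    for (int i = 1; i < args.length; i++) {
      inputs.add(args[i]);
    }
    int count = mergeAll(args[0], inputs);
    System.out.println("Merged " + count + " of " + inputs.size() + " files into " + args[0]);
  }
}
```

Checking the returned list against the input list matters: a silently skipped input would otherwise go unnoticed.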

How was this patch tested?

Added unit tests.

Was this patch authored or co-authored using generative AI tooling?

No

dongjoon-hyun commented 4 months ago

Thank you, @williamhyun.