apache / orc

Apache ORC - the smallest, fastest columnar storage for Hadoop workloads
https://orc.apache.org/

orc-tools unknown subcommand "Merge" #1920

Closed by echoeslove 4 months ago

echoeslove commented 5 months ago

Following the documentation at https://orc.apache.org/docs/java-tools.html, I ran:

java -jar orc-tools-2.0.0-uber.jar merge

Output:

Unknown subcommand: merge

dongjoon-hyun commented 5 months ago

Yes, thank you for reporting.

The fix will be delivered in ORC 2.0.1 in two weeks.

https://issues.apache.org/jira/browse/ORC-1644

dongjoon-hyun commented 4 months ago

This will be resolved via #1928.

cxzl25 commented 4 months ago

You can now use version 2.0.1.

wget https://repo1.maven.org/maven2/org/apache/orc/orc-tools/2.0.1/orc-tools-2.0.1-uber.jar
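
If you are not sure whether a jar you already have is new enough, a rough check on the file name can help. This is only a sketch: `jar_version` and `version_at_least` are hypothetical helpers, it assumes the jar keeps its published name (`orc-tools-<version>-uber.jar`), that GNU-style `sort -V` is available, and that 2.0.1 is the first release with `merge`, as stated in this thread.

```shell
# Hypothetical helper: pull the version out of a name like orc-tools-2.0.0-uber.jar
jar_version() {
  basename "$1" | sed -E 's/^orc-tools-([0-9]+(\.[0-9]+)*).*$/\1/'
}

# Hypothetical helper: succeeds when version $1 >= version $2 (relies on sort -V)
version_at_least() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

jar=orc-tools-2.0.0-uber.jar
# Assumption from this thread: merge first ships in 2.0.1
if version_at_least "$(jar_version "$jar")" 2.0.1; then
  echo "$jar should include the merge subcommand"
else
  echo "$jar predates 2.0.1; merge is not available"
fi
```

The check only looks at the file name, so a renamed jar will defeat it; running the subcommand itself remains the real test.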

Prepare data:

echo -e "1,foo\n2,bar" > 1.csv
echo -e "3,apache\n4,orc" > 2.csv
java -jar orc-tools-2.0.1-uber.jar convert 1.csv --schema 'struct<c1:int,c2:string>' --output 1.orc
java -jar orc-tools-2.0.1-uber.jar convert 2.csv --schema 'struct<c1:int,c2:string>' --output 2.orc

Use the merge command:

java -jar orc-tools-2.0.1-uber.jar merge --output merge.orc 1.orc 2.orc
Output path: merge.orc, Input files size: 2, Merge files size: 2

Dump the merged file:

java -jar orc-tools-2.0.1-uber.jar meta -d merge.orc
Processing data file merge.orc [length: 546]
{"c1":1,"c2":"foo"}
{"c1":2,"c2":"bar"}
{"c1":3,"c2":"apache"}
{"c1":4,"c2":"orc"}
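
To confirm that no rows were lost in the merge, you can count the rows in the dump. A minimal sketch, assuming `meta -d` prints one JSON object per row as in the output above; `count_rows` is a hypothetical helper name, and the sample input here is just the dump quoted in this comment.

```shell
# Count data rows in `meta -d` output: each row is one JSON object per line,
# and the "Processing data file ..." header does not start with "{".
count_rows() {
  grep -c '^{'
}

# Sample input: the dump output shown above
count_rows <<'EOF'
Processing data file merge.orc [length: 546]
{"c1":1,"c2":"foo"}
{"c1":2,"c2":"bar"}
{"c1":3,"c2":"apache"}
{"c1":4,"c2":"orc"}
EOF
# prints 4
```

With the jar available, the same helper can be fed directly: `java -jar orc-tools-2.0.1-uber.jar meta -d merge.orc | count_rows`. The two input files hold two rows each, so 4 is the expected total.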