apache / orc

Apache ORC - the smallest, fastest columnar storage for Hadoop workloads
https://orc.apache.org/
Apache License 2.0

ORC-1647: Tips for supporting ORC in the `convert` command #1838

Closed by cxzl25 4 months ago

cxzl25 commented 4 months ago

What changes were proposed in this pull request?

This PR updates the tool's help text and the documentation to state that the `convert` command accepts ORC files as input.

Why are the changes needed?

The `convert` command already accepts ORC files as a source format, but neither the tool's help output nor the documentation mentions this.
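
For illustration, an ORC-to-ORC conversion might be invoked as sketched below. The file names `input.orc` and `rewritten.orc` are hypothetical, the jar name is the snapshot build used in the local test, and the `-o` output option is assumed from the `convert` command's own help, so check `convert -h` for your version before relying on it.

```shell
# Hypothetical file names; adjust to your environment.
ORC_TOOLS_JAR="orc-tools-2.1.0-SNAPSHOT-uber.jar"
SRC="input.orc"        # existing ORC file used as the source
DST="rewritten.orc"    # ORC file to be written

# Build the command as positional parameters so each argument stays quoted.
# -o is assumed to name the output file; confirm with `convert -h`.
set -- java -jar "$ORC_TOOLS_JAR" convert -o "$DST" "$SRC"

# Show the command, then run it only if the jar is actually present.
echo "$*"
if [ -f "$ORC_TOOLS_JAR" ]; then
  "$@"
fi
```

The guard around the final call only keeps the sketch harmless on machines without the jar; in practice the `java -jar ... convert` line can be run directly.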

How was this patch tested?

Tested locally by printing the tool's top-level help:

java -jar orc-tools-2.1.0-SNAPSHOT-uber.jar -h

Output

ORC Java Tools

usage: java -jar orc-tools-*.jar [--help] [--define X=Y] <command> <args>

Commands:
   convert - convert CSV/JSON/ORC files to ORC
   count - recursively find *.orc and print the number of rows
   data - print the data from the ORC file
   json-schema - scan JSON files to determine their schema
   key - print information about the keys
   meta - print the metadata about the ORC file
   scan - scan the ORC file
   sizes - list size on disk of each column
   version - print the version of this ORC tool

To get more help, provide -h to the command

Was this patch authored or co-authored using generative AI tooling?

No