apache / arrow

Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing
https://arrow.apache.org/
Apache License 2.0

[Python] Command line execution of PyArrow and submodules #43361

Open pitrou opened 2 months ago

pitrou commented 2 months ago

Describe the enhancement requested

Currently, attempting to execute PyArrow and its submodules either fails or (misleadingly) does nothing.

$ python -m pyarrow
/home/antoine/t/venv-3.10/bin/python: No module named pyarrow.__main__; 'pyarrow' is a package and cannot be directly executed
$ python -m pyarrow.csv
$ python -m pyarrow.json
$ python -m pyarrow.parquet
/home/antoine/t/venv-3.10/bin/python: No module named pyarrow.parquet.__main__; 'pyarrow.parquet' is a package and cannot be directly executed
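
The two behaviours above follow from how `python -m` works: a package is only executable if it contains a `__main__.py`, while a plain module is simply imported and run top to bottom, so it prints nothing unless it has top-level code that does so. A minimal sketch reproducing both cases with a throwaway package (`demo_pkg` and `quiet` are hypothetical names used only for this illustration, not part of PyArrow):

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def run_module(name, search_dir):
    """Run `python -m <name>` and return (exit code, stdout, stderr)."""
    # Make the temporary directory importable for the child interpreter.
    env = dict(os.environ, PYTHONPATH=str(search_dir))
    proc = subprocess.run(
        [sys.executable, "-m", name],
        cwd=search_dir, env=env, capture_output=True, text=True,
    )
    return proc.returncode, proc.stdout, proc.stderr


with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "demo_pkg"  # hypothetical package, for illustration only
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    (pkg / "quiet.py").write_text("def helper():\n    return 42\n")

    # Package without __main__.py: fails, like `python -m pyarrow`.
    no_main = run_module("demo_pkg", tmp)

    # Plain module with no top-level side effects: succeeds silently,
    # like `python -m pyarrow.csv`.
    quiet = run_module("demo_pkg.quiet", tmp)

    # Adding __main__.py makes the package executable.
    (pkg / "__main__.py").write_text("print('hello from demo_pkg')\n")
    with_main = run_module("demo_pkg", tmp)

print(no_main[2].strip())
print(with_main[1].strip())
```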

Some of these at least could do something useful. Example from the Python stdlib:

$ python -m tarfile --help
usage: tarfile.py [-h] [-v] [--filter <filtername>] (-l <tarfile> | -e <tarfile> [<output_dir> ...] | -c <name> [<file> ...] | -t <tarfile>)

A simple command-line interface for tarfile module.

options:
  -h, --help            show this help message and exit
  -v, --verbose         Verbose output
  --filter <filtername>
                        Filter for extraction
  -l <tarfile>, --list <tarfile>
                        Show listing of a tarfile
  -e <tarfile> [<output_dir> ...], --extract <tarfile> [<output_dir> ...]
                        Extract tarfile into target dir
  -c <name> [<file> ...], --create <name> [<file> ...]
                        Create tarfile from sources
  -t <tarfile>, --test <tarfile>
                        Test if a tarfile is valid

Component(s)

Python

pitrou commented 2 months ago

@jorisvandenbossche @danepitkin

jorisvandenbossche commented 1 month ago

> Some of these at least could do something useful.

You are imagining e.g. python -m pyarrow.parquet <file> having an option to print the contents or the schema of the file (and the same for the other formats)?

pitrou commented 1 month ago

Right. Or perhaps a more sophisticated CLI if we are so inclined :-)
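
As a rough illustration of the simpler variant discussed above, here is a sketch of what a `pyarrow/parquet/__main__.py` might contain. This is an assumption for discussion, not an existing or agreed-upon PyArrow interface: the file location, the `--schema` and `--head` flags, and the `build_parser` and `main` helpers are all made up for this example. Only `pyarrow.parquet.read_schema`, `pyarrow.parquet.read_table`, `Table.slice` and `Table.to_pylist` are existing PyArrow calls.

```python
import argparse


def build_parser():
    """Build the (hypothetical) command-line parser."""
    parser = argparse.ArgumentParser(
        prog="python -m pyarrow.parquet",
        description="Inspect a Parquet file (illustrative sketch only).",
    )
    parser.add_argument("file", help="path to a Parquet file")
    group = parser.add_mutually_exclusive_group()
    # Both flags are invented for this sketch.
    group.add_argument("--schema", action="store_true",
                       help="print the file schema (default action)")
    group.add_argument("--head", type=int, metavar="N",
                       help="print the first N rows")
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    # Imported here so that argument errors and --help do not pay
    # the cost of importing pyarrow.
    import pyarrow.parquet as pq

    if args.head is not None:
        # Simple but not memory-efficient: reads the whole file first.
        table = pq.read_table(args.file).slice(0, args.head)
        for row in table.to_pylist():
            print(row)
    else:
        print(pq.read_schema(args.file))
    return 0


# A real __main__.py would end with: sys.exit(main())
```

With something like this in place, `python -m pyarrow.parquet data.parquet` would print the schema and `python -m pyarrow.parquet data.parquet --head 5` the first five rows. The same pattern could be repeated for the other formats, or replaced by a single subcommand-based CLI if a more sophisticated tool is preferred.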