apache / arrow

Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics
https://arrow.apache.org/
Apache License 2.0

[C++][Parquet] ParquetDataset should expose partition_base_dir #44765

Open nalyat opened 3 days ago

nalyat commented 3 days ago

Describe the enhancement requested

Currently, ParquetDataset accepts a partitioning parameter, but there is no way to set partition_base_dir. It should be exposed as a constructor parameter and passed through to the ds.dataset call.

This is useful when loading datasets laid out with DirectoryPartitioning from an explicit list of files.

Component(s)

Parquet