JuliaIO / Parquet.jl

Julia implementation of Parquet columnar file format reader

fill missing partitioned col from partition path #142

Closed tanmaykm closed 3 years ago

tanmaykm commented 3 years ago

Some Parquet partitioning tools do not include the partition column in the partitioned Parquet files, leaving the reader to infer the column value from the file path, which usually takes the form column_name=column_value.

The changes here let Parquet.Dataset handle such partitions. The default implementation handles the most common formats used to encode partition information in partition file paths. It can be replaced or augmented by providing a column_generator function with custom logic to deduce the column value given a partition and a column name.
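
To illustrate the column_name=column_value convention described above, here is a minimal sketch of how a value might be recovered from a partition path. The helper name partition_value_from_path and the example path are hypothetical and are not part of Parquet.jl; this is not the default implementation from this PR, and it does not show the signature that column_generator expects.

```julia
# Hypothetical helper (not part of Parquet.jl): scan the components of a
# partition path for one of the form `column=value` and return the value.
function partition_value_from_path(path::AbstractString, column::AbstractString)
    # Split on both separator styles so Unix and Windows paths are handled.
    for component in split(path, ['/', '\\'])
        # Split only on the first '=' so values containing '=' stay intact.
        parts = split(component, '='; limit=2)
        if length(parts) == 2 && parts[1] == column
            return String(parts[2])
        end
    end
    # The column is not encoded in this path.
    return missing
end

# Assumed example path, for illustration only.
path = "dataset/year=2021/month=07/part-0.parquet"
year = partition_value_from_path(path, "year")
month = partition_value_from_path(path, "month")
day = partition_value_from_path(path, "day")
```

With this path, year is "2021", month is "07", and day is missing. The sketch returns the value as a string; converting it to the column's actual type and repeating it for every row in the partition would be up to the caller.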