Some Parquet partitioning tools do not write the partition column into the partitioned Parquet files. The reader must then infer the column's value from the file path, which usually contains a segment of the form `column_name=column_value`.
The changes here let `Parquet.Dataset` handle such partitions. The default implementation handles the most common formats used to encode partition information in partition file paths. You can replace or augment it by providing a `column_generator` function with custom logic that deduces a column's value from a partition and a column name.
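The following minimal sketch makes the path convention and the pluggable generator concrete. It is written in Python for illustration and is not Parquet.jl's Julia code. It assumes Hive-style `name=value` path segments only. The names `default_column_value`, `column_value` and `positional_generator`, and the `(partition_path, column_name)` callback shape, are hypothetical. The actual `column_generator` signature in `Parquet.Dataset` may differ.

```python
from pathlib import PurePosixPath
from typing import Callable, Optional

# Hypothetical callback shape: (partition_path, column_name) -> value, or None if unknown.
ColumnGenerator = Callable[[str, str], Optional[str]]


def default_column_value(partition_path: str, column_name: str) -> Optional[str]:
    """Return the value of the first `column_name=value` path segment, else None."""
    for segment in PurePosixPath(partition_path).parts:
        name, sep, value = segment.partition("=")
        if sep and name == column_name:
            return value
    return None


def column_value(partition_path: str, column_name: str,
                 column_generator: Optional[ColumnGenerator] = None) -> Optional[str]:
    """Try the custom generator first, then fall back to the default parser."""
    if column_generator is not None:
        value = column_generator(partition_path, column_name)
        if value is not None:
            return value
    return default_column_value(partition_path, column_name)


def positional_generator(partition_path: str, column_name: str) -> Optional[str]:
    """Hypothetical custom generator for layouts like `root/2021/03/file.parquet`,
    where the value is identified by its position rather than by `name=`."""
    positions = {"year": 1, "month": 2}  # assumed layout: root/<year>/<month>/<file>
    index = positions.get(column_name)
    parts = PurePosixPath(partition_path).parts
    if index is None or index >= len(parts):
        return None  # unknown column: let the default parser handle it
    return parts[index]


if __name__ == "__main__":
    print(column_value("sales/year=2021/month=03/part-0.parquet", "year"))  # 2021
    print(column_value("sales/2021/03/part-0.parquet", "month", positional_generator))  # 03
```

Trying the custom generator first and falling back to the default parser is one way to support both "replace" and "augment". A generator that returns `None` for columns it does not recognise augments the default parser. A generator that always returns a value replaces it. The sketch returns raw strings. Converting them to the column's declared type is left out.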