Allow Specifying Partitioning Function for External Mappings

Netflix / iceberg

Iceberg is a table format for large, slow-moving tabular data

Apache License 2.0


(This depends on the completion of #71 and #72.)

The partition function for external mappings is derived by parsing the paths of the data files, à la Hive's directory format.

For instance, the structure:

/date=2018-11-12/file.avsc
/date=2018-11-13/file.avsc

would create a new column date with string values 2018-11-12 and 2018-11-13 and assume the partitioning function is identity(date). There is no way to declare that the partition value is instead derived from another field (e.g. as a function of the date part of a timestamp column).
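To make the current behavior concrete, here is a minimal sketch in Python of Hive-style path parsing. It is not Iceberg's code; the function name parse_hive_partitions is hypothetical and only illustrates how key=value directory segments end up as string-valued identity partitions.

```python
from pathlib import PurePosixPath


def parse_hive_partitions(path: str) -> dict[str, str]:
    """Collect key=value directory segments from a Hive-style path.

    Hypothetical helper for illustration only, not an Iceberg API.
    Values are kept as raw strings, which is why the result can only
    be treated as identity(<key>).
    """
    partitions: dict[str, str] = {}
    # Drop the last component, which is the data file name.
    for segment in PurePosixPath(path).parts[:-1]:
        if "=" in segment:
            key, _, value = segment.partition("=")
            partitions[key] = value
    return partitions
```

Calling parse_hive_partitions("/date=2018-11-12/file.avsc") yields a single entry mapping date to the string 2018-11-12, with no link to any column stored inside the file.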

Iceberg should let users specify their own partitioning function, based on existing columns.
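One possible shape for such a specification is sketched below in Python. Everything here is an assumption made for illustration: the class ExternalPartitionMapping, the day transform, and the column name event_ts are hypothetical and do not correspond to an existing Iceberg API. The idea is that the user declares which existing column the path value is derived from and which function produces it.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Callable


@dataclass(frozen=True)
class ExternalPartitionMapping:
    """Hypothetical user-supplied mapping for an externally laid out table."""

    path_key: str                      # key found in the directory name, e.g. "date"
    source_column: str                 # existing column the value is derived from
    transform: Callable[[Any], str]    # partitioning function applied to that column

    def matches(self, path_value: str, row: dict[str, Any]) -> bool:
        # A row belongs in the directory if transforming its source
        # column reproduces the value parsed from the path.
        return self.transform(row[self.source_column]) == path_value


def day(ts: datetime) -> str:
    """Date part of a timezone-aware timestamp, in UTC, as YYYY-MM-DD."""
    return ts.astimezone(timezone.utc).date().isoformat()


# Declares that date=... directories hold day(event_ts) rather than identity(date).
mapping = ExternalPartitionMapping("date", "event_ts", day)
```

With this mapping, a row whose event_ts falls on 2018-11-12 (UTC) matches the directory date=2018-11-12, so no separate date column needs to be created.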


Allow Specifying Partitioning Function for External Mappings #100