apache / iceberg-rust

Apache Iceberg
https://rust.iceberg.apache.org/
Apache License 2.0

Able to parse name-mapping into a recursive structure. #723

Open Fokko opened 4 days ago

Fokko commented 4 days ago

Name mapping is used when the data files in a table don't have field IDs encoded in them. For example, when files are added through add_files during a table migration from Hive, the Parquet files carry no field IDs. In that case we want to use name mapping: https://iceberg.apache.org/spec/#name-mapping-serialization. The name mapping is a JSON blob that is stored alongside the table in a table property.

This issue covers only the deserialization of the JSON blob into an in-memory structure. Tests can be found here: https://github.com/apache/iceberg-python/blob/main/tests/table/test_name_mapping.py
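
To make the target shape concrete, here is a minimal Python sketch of deserializing such a blob into a recursive structure. It assumes the JSON layout described in the spec (a list of objects with `names`, an optional `field-id`, and optional nested `fields`). The names `MappedField`, `parse_mapped_field`, and `parse_name_mapping` are illustrative only; this is not the iceberg-rust API, and the Rust implementation may well look different.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MappedField:
    """One node of the name mapping (hypothetical name, for illustration)."""
    names: List[str]
    field_id: Optional[int] = None
    # Children are MappedField nodes again, which keeps the structure recursive.
    fields: List["MappedField"] = field(default_factory=list)


def parse_mapped_field(obj: dict) -> MappedField:
    # "field-id" and "fields" are treated as optional keys.
    return MappedField(
        names=list(obj.get("names", [])),
        field_id=obj.get("field-id"),
        fields=[parse_mapped_field(child) for child in obj.get("fields", [])],
    )


def parse_name_mapping(blob: str) -> List[MappedField]:
    # The top level of the blob is a JSON list of mapped fields.
    return [parse_mapped_field(obj) for obj in json.loads(blob)]


EXAMPLE = """
[
  {"field-id": 1, "names": ["id", "record_id"]},
  {"field-id": 2, "names": ["data"]},
  {"field-id": 3, "names": ["location"], "fields": [
    {"field-id": 4, "names": ["latitude", "lat"]},
    {"field-id": 5, "names": ["longitude", "long"]}
  ]}
]
"""

mapping = parse_name_mapping(EXAMPLE)
```

After parsing, `mapping[2].fields[0]` is the nested `latitude` node with field ID 4, so the nesting from the JSON is preserved rather than flattened.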

Future tip: It is best to store this in a recursive structure so it can be traversed using a VisitorWithParent, where both a Schema and a NameMapping are traversed at once. This matters because we cannot flatten the name mapping: field names may contain dots, so in a flattened key there is no way to tell where a field name ends and a subfield begins. This is done in PyIceberg here: https://github.com/apache/iceberg-python/pull/1014
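
A small Python sketch of the dot problem, under the assumption of a made-up mapping with a top-level column literally named `a.b` next to a struct `a` with a child `b`. The `MappedField`, `flatten`, and `lookup` names are hypothetical and only serve the illustration; they are not taken from PyIceberg or iceberg-rust.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class MappedField:
    names: List[str]
    field_id: Optional[int] = None
    fields: List["MappedField"] = field(default_factory=list)


# Made-up mapping: column "a.b" (ID 1) and struct "a" (ID 2) with child "b" (ID 3).
mapping = [
    MappedField(names=["a.b"], field_id=1),
    MappedField(names=["a"], field_id=2, fields=[MappedField(names=["b"], field_id=3)]),
]


def flatten(fields: List[MappedField], prefix: str = "") -> Dict[str, List[int]]:
    """Flatten to dotted keys, collecting every field ID that lands on a key."""
    flat: Dict[str, List[int]] = {}
    for f in fields:
        for name in f.names:
            key = prefix + name
            if f.field_id is not None:
                flat.setdefault(key, []).append(f.field_id)
            for child_key, ids in flatten(f.fields, key + ".").items():
                flat.setdefault(child_key, []).extend(ids)
    return flat


def lookup(fields: List[MappedField], path: Tuple[str, ...]) -> Optional[int]:
    """Resolve a path one level at a time, so dots inside a name stay intact."""
    if not path:
        return None
    head, rest = path[0], path[1:]
    for f in fields:
        if head in f.names:
            return f.field_id if not rest else lookup(f.fields, rest)
    return None


flat = flatten(mapping)
```

In the flattened form, the key `a.b` ends up holding both ID 1 and ID 3, so the two fields can no longer be told apart. With the recursive form, `lookup(mapping, ("a.b",))` and `lookup(mapping, ("a", "b"))` resolve to different fields.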

barronw commented 3 days ago

Can I pick this up?

c-thiel commented 3 days ago

@barronw gladly! Assigned the issue to you. If there are any questions, just post them here or contact us on Slack :)