Netflix / iceberg

Iceberg is a table format for large, slow-moving tabular data
Apache License 2.0

Move snapshot out of metadata to a manifest file of data manifests #43

Closed rdblue closed 5 years ago

rdblue commented 6 years ago

Metadata files become large as the list of manifests grows. This could be solved by moving that list into a separate manifest file that tracks the manifests of data files.

A secondary benefit of this approach is that the partition data in the manifest files would show up as min/max stats in the snapshot manifest, allowing Iceberg to eliminate whole manifest files when planning a scan.
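The pruning idea can be illustrated with a minimal sketch. This is not Iceberg's API: `ManifestEntry`, `prune_manifests`, the file names, and the integer date-style partition values are all hypothetical, chosen only to show how per-manifest min/max partition bounds could let a scan planner skip manifests without opening them.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ManifestEntry:
    """One row of a hypothetical snapshot manifest: a data manifest plus its partition bounds."""
    path: str
    lower: int  # minimum partition value among the data files tracked by this manifest
    upper: int  # maximum partition value among the data files tracked by this manifest


def prune_manifests(entries: List[ManifestEntry], lo: int, hi: int) -> List[ManifestEntry]:
    """Keep only manifests whose [lower, upper] range overlaps the query range [lo, hi]."""
    return [e for e in entries if e.upper >= lo and e.lower <= hi]


# Hypothetical snapshot manifest with three data manifests (assumed names and bounds).
snapshot_manifest = [
    ManifestEntry("manifest-a.avro", lower=20180101, upper=20180131),
    ManifestEntry("manifest-b.avro", lower=20180201, upper=20180228),
    ManifestEntry("manifest-c.avro", lower=20180215, upper=20180315),
]

# A scan filtered to 2018-02-20 through 2018-02-25 never needs to read manifest-a.
selected = prune_manifests(snapshot_manifest, lo=20180220, hi=20180225)
selected_paths = [e.path for e in selected]
```

Under these assumptions, only the manifests whose bounds overlap the filter are opened during planning; the rest are eliminated using the stats in the snapshot manifest alone.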

rdblue commented 5 years ago

This was implemented in https://github.com/apache/incubator-iceberg/commit/54f9a0ffaa0cc69a25818fcdfbc9b8bfc579fe67