rdblue closed this issue 5 years ago
Instead of storing a single HDFS block size for each data file, Iceberg should store a list of split offsets. That will allow split planning to be more precise by using row group or stripe offsets, without reading file footers.
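As a rough illustration of the idea, here is a minimal sketch of how a planner might use a stored list of offsets. The function name `plan_splits`, its parameters, and the greedy packing strategy are all hypothetical and are not Iceberg's API. The sketch assumes the offsets are the starting byte positions of row groups or stripes, and it combines adjacent ones until the next one would push the split past a target size.

```python
from typing import List, Tuple


def plan_splits(file_length: int,
                split_offsets: List[int],
                target_size: int) -> List[Tuple[int, int]]:
    """Return (start, length) splits for one data file.

    Hypothetical helper: split boundaries always fall on the recorded
    offsets, so no file footer has to be read during planning.
    """
    if not split_offsets:
        # No offsets recorded: fall back to fixed-size splits.
        return [(start, min(target_size, file_length - start))
                for start in range(0, file_length, target_size)]

    # Bytes before the first offset (for example a file header) are not
    # assigned to any split in this sketch.
    boundaries = sorted(split_offsets) + [file_length]
    splits = []
    start = boundaries[0]
    for i in range(1, len(boundaries)):
        end = boundaries[i]
        is_last = i == len(boundaries) - 1
        # Close the current split when this is the end of the file, or when
        # including the next row group/stripe would exceed the target size.
        if is_last or boundaries[i + 1] - start > target_size:
            splits.append((start, end - start))
            start = end
    return splits


if __name__ == "__main__":
    # Row groups starting at bytes 4, 30, 60 and 90 in a 100-byte file.
    print(plan_splits(100, [4, 30, 60, 90], target_size=50))
```

A single row group or stripe larger than the target size is emitted as its own split here rather than being cut, since a cut inside it would not land on a recorded offset.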
This issue has moved to the ASF repo: https://github.com/apache/incubator-iceberg/issues/37