Snapshot-Level Metrics and Statistics

Assume a table with the following field:

id int

There are n data files, each of which has the statistics min(id) and max(id). ids are positive integers.

Querying by id < 0 would require an O_(n) run on all data files in the manifest, querying whether min(id) < 0 < max(id).

Aggregating metrics and/or statistics to the Snapshot level would reduce such scans from O_(n) (n being the number of data files) to O₍₁₎.

Netflix / iceberg