Open omervk opened 5 years ago
Assume a table with the following field:
id int
There are n data files, each of which has the statistics min(id) and max(id). ids are positive integers.
n
min(id)
max(id)
id
Querying by id < 0 would require an O(n) run on all data files in the manifest, querying whether min(id) < 0 < max(id).
id < 0
min(id) < 0 < max(id)
Aggregating metrics and/or statistics to the Snapshot level would reduce such scans from O(n) (n being the number of data files) to O(1).
Assume a table with the following field:
There are
n
data files, each of which has the statisticsmin(id)
andmax(id)
.id
s are positive integers.Querying by
id < 0
would require an O(n) run on all data files in the manifest, querying whethermin(id) < 0 < max(id)
.Aggregating metrics and/or statistics to the Snapshot level would reduce such scans from O(n) (
n
being the number of data files) to O(1).