Closed alamb closed 1 month ago
I also think that currently PageIndex means "offset index and column index". They're both page-level indexes.
Thank you for the comments @mapleFU @tustvold and @wgtmac - I believe I have implemented your suggestions and I think the PR is much clearer because of it.
@gszadovszky @pitrou @emkornfield @julienledem Would you like to take a look?
https://issues.apache.org/jira/browse/PARQUET-2480
See the proposed update as rendered markdown: https://github.com/alamb/parquet-format/blob/alamb/page-index/PageIndex.md
I have always found it very confusing that people refer to the Parquet "page index", for example in this message
However, the term "page index" is not used in the parquet.thrift file itself; it appears only as the name of the file that describes ColumnIndex and OffsetIndex, PageIndex.md. This means I can't search for "page index" in the spec and find out what people are talking about.
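For readers new to the term, here is a minimal sketch of what "page index" refers to in practice. The dataclasses below model only a subset of the ColumnIndex and OffsetIndex fields from parquet.thrift, and they simplify min/max values to plain integers (the real fields hold encoded binary values). `pages_to_read` and the sample data are hypothetical, made up for illustration, and are not part of any Parquet library.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PageLocation:
    offset: int                # byte offset of the page in the file
    compressed_page_size: int  # size of the page in bytes, including its header
    first_row_index: int       # index of the page's first row within the row group


@dataclass
class OffsetIndex:
    page_locations: List[PageLocation]  # one entry per data page


@dataclass
class ColumnIndex:
    null_pages: List[bool]  # True where a page contains only nulls
    min_values: List[int]   # simplification: real values are encoded binary
    max_values: List[int]


def pages_to_read(ci: ColumnIndex, oi: OffsetIndex, lo: int, hi: int) -> List[PageLocation]:
    """Hypothetical helper: locations of pages whose [min, max] overlaps [lo, hi]."""
    return [
        loc
        for loc, all_null, mn, mx in zip(
            oi.page_locations, ci.null_pages, ci.min_values, ci.max_values
        )
        if not all_null and mn <= hi and mx >= lo
    ]


# Made-up sample: three pages, the last one containing only nulls.
ci = ColumnIndex(
    null_pages=[False, False, True],
    min_values=[0, 100, 0],
    max_values=[99, 199, 0],
)
oi = OffsetIndex(
    [
        PageLocation(offset=4, compressed_page_size=500, first_row_index=0),
        PageLocation(offset=504, compressed_page_size=480, first_row_index=1000),
        PageLocation(offset=984, compressed_page_size=20, first_row_index=2000),
    ]
)
selected = pages_to_read(ci, oi, 150, 160)
```

In this sketch the ColumnIndex statistics decide which pages can be skipped, and the OffsetIndex says where the remaining pages live in the file. Together, those two structures are what people mean by the "page index".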
Proposed Clarifications
PageIndex.md: clarify the use of the term "page index" and explain that it is encoded as ColumnIndex and OffsetIndex.
ColumnIndex and OffsetIndex: include the term "page index" and clarify what those structures are used for.
Jira
Commits
Documentation
This PR has no spec changes, only clarifications