apache / parquet-java

Apache Parquet Java
https://parquet.apache.org/
Apache License 2.0

Clarify what "page index" means in Parquet.thrift #2910

Closed asfimport closed 1 month ago

asfimport commented 1 month ago

I have always found it confusing when people use the term "page index" in discussions of Parquet, for example: https://lists.apache.org/thread/o9nxbmv1z4hph3v5s2z63jsklywpkyyj

However, the term "page index" is not used in the Parquet Thrift file: https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift

The term does appear as the name of the file that describes the ColumnIndex spec:

https://github.com/apache/parquet-format/blob/master/PageIndex.md

I would like to clarify in parquet.thrift that ColumnIndex is the implementation of the "page index" concept.

Reporter: Andrew Lamb / @alamb
Assignee: Andrew Lamb / @alamb

PRs and other links:

Note: This issue was originally created as PARQUET-2480. Please see the migration documentation for further details.