Add ColumnMetaData.schema_index, the ordinal of the element in FileMetaData.schema that this column corresponds to. This allows a row group to represent its columns sparsely.
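A minimal sketch of how a reader might use schema_index to attribute the column chunks of a sparse row group to schema leaves. The dataclasses are hypothetical, heavily simplified stand-ins for the Thrift structs (only the fields needed here; RowGroup.columns holds ColumnMetaData directly rather than ColumnChunk wrappers), the schema is assumed flat so leaf names are unique, and `column_chunks_by_name` is an illustrative helper, not part of any Parquet library.

```python
from dataclasses import dataclass


# Hypothetical, simplified stand-ins for the Thrift structs: only the
# fields needed here are modeled, and RowGroup.columns holds
# ColumnMetaData directly instead of ColumnChunk wrappers.
@dataclass
class SchemaElement:
    name: str
    num_children: int = 0  # > 0 for group nodes, 0 for leaf columns


@dataclass
class ColumnMetaData:
    schema_index: int  # ordinal of the matching element in FileMetaData.schema
    num_values: int


@dataclass
class RowGroup:
    columns: list  # may list only some of the schema's leaf columns


def column_chunks_by_name(schema, row_group):
    """Map leaf names to the chunks that a (possibly sparse) row group holds."""
    chunks = {}
    for column in row_group.columns:
        element = schema[column.schema_index]
        # A column chunk must point at a leaf, never at a group node.
        if element.num_children:
            raise ValueError(f"schema_index {column.schema_index} is a group")
        chunks[element.name] = column
    return chunks


schema = [SchemaElement("root", 3), SchemaElement("a"),
          SchemaElement("b"), SchemaElement("c")]
row_group = RowGroup([ColumnMetaData(1, 100), ColumnMetaData(3, 100)])
print(sorted(column_chunks_by_name(schema, row_group)))  # ['a', 'c']
```

Here column `b` has no chunk in the row group, yet each chunk that is present still maps to the correct leaf, because it carries its own schema_index instead of relying on its position in the list.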
Deprecate ColumnMetaData.encoding_stats and replace it with ColumnMetaData.is_fully_dict_encoded.
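A common use of the per-page encoding stats is deciding whether every data page of a chunk is dictionary encoded, which is the single fact a boolean flag can carry directly. A minimal sketch of that derivation, assuming simplified hypothetical stand-ins for PageEncodingStats, PageType, and Encoding (only a few enum members, string values instead of the Thrift integers). Treating a chunk with no data pages as not dictionary encoded is a choice made for this sketch.

```python
from dataclasses import dataclass
from enum import Enum


# Hypothetical subsets of the Thrift enums, using string values for clarity.
class PageType(Enum):
    DATA_PAGE = "DATA_PAGE"
    DATA_PAGE_V2 = "DATA_PAGE_V2"
    DICTIONARY_PAGE = "DICTIONARY_PAGE"


class Encoding(Enum):
    PLAIN = "PLAIN"
    PLAIN_DICTIONARY = "PLAIN_DICTIONARY"
    RLE_DICTIONARY = "RLE_DICTIONARY"


DICT_ENCODINGS = {Encoding.PLAIN_DICTIONARY, Encoding.RLE_DICTIONARY}
DATA_PAGE_TYPES = {PageType.DATA_PAGE, PageType.DATA_PAGE_V2}


@dataclass
class PageEncodingStats:
    page_type: PageType
    encoding: Encoding
    count: int


def is_fully_dict_encoded(encoding_stats):
    """True if every data page in the chunk uses a dictionary encoding."""
    # The dictionary page itself is ignored; only data pages matter.
    data_pages = [s for s in encoding_stats
                  if s.page_type in DATA_PAGE_TYPES and s.count > 0]
    # Assumption: a chunk with no data pages is reported as not dict encoded.
    return bool(data_pages) and all(s.encoding in DICT_ENCODINGS
                                    for s in data_pages)


stats = [PageEncodingStats(PageType.DICTIONARY_PAGE, Encoding.PLAIN, 1),
         PageEncodingStats(PageType.DATA_PAGE, Encoding.RLE_DICTIONARY, 5)]
print(is_fully_dict_encoded(stats))  # True
```

A writer that falls back to PLAIN for later pages (for example, when the dictionary grows too large) would record a PLAIN data page, and the flag would then be false.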
[ ] My commits all reference Jira issues in their subject lines. In addition, my commits follow the guidelines from "How to write a good git commit message":
  Subject is separated from body by a blank line
  Subject is limited to 50 characters (not including Jira issue reference)
  Subject does not end with a period
  Subject uses the imperative mood ("add", not "adding")
  Body wraps at 72 characters
  Body explains "what" and "why", not "how"
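Most of the commit-message guidelines listed above can be checked mechanically; the imperative mood and the "what and why" rule cannot, so this sketch skips them. It assumes a Jira reference of the form `PROJECT-123:` at the start of the subject; the `JIRA_PREFIX` pattern and the `check_commit_message` helper are hypothetical, not part of any existing tool.

```python
import re

# Hypothetical pattern for a leading Jira reference such as "PARQUET-123: ".
JIRA_PREFIX = re.compile(r"^[A-Z][A-Z0-9]+-\d+:?\s*")


def check_commit_message(message):
    """Return a list of guideline violations (mood and content not checked)."""
    lines = message.splitlines()
    if not lines:
        return ["empty message"]
    problems = []
    subject = lines[0]
    # The 50-character limit excludes the Jira issue reference.
    summary = JIRA_PREFIX.sub("", subject, count=1)
    if len(summary) > 50:
        problems.append("subject longer than 50 characters")
    if subject.endswith("."):
        problems.append("subject ends with a period")
    if len(lines) > 1 and lines[1].strip():
        problems.append("no blank line between subject and body")
    # Body lines start after the subject and the separating blank line.
    for number, line in enumerate(lines[2:], start=3):
        if len(line) > 72:
            problems.append(f"body line {number} longer than 72 characters")
    return problems


print(check_commit_message("PARQUET-1: Fix typo."))  # ['subject ends with a period']
```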
Documentation
[ ] In case of new functionality, my PR adds documentation that describes how to use it.
  All public functions and classes in the PR contain Javadoc that explains what they do
Make ColumnMetaData.type optional.
Make ColumnMetaData.path_in_schema optional.
Ref: Parquet Metadata evolution
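When ColumnMetaData.type and ColumnMetaData.path_in_schema are omitted, a reader can still recover both from FileMetaData.schema given a column's schema index, because the schema is stored as a flattened, depth-first list in which each group node records its num_children and the root is the first element. A minimal sketch under those assumptions; the `SchemaElement` dataclass (with the physical type as a plain string) and the `leaf_paths` and `resolve_column` helpers are hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SchemaElement:
    name: str
    num_children: int = 0       # > 0 for group nodes
    type: Optional[str] = None  # physical type, set only on leaves


def leaf_paths(schema):
    """Map each leaf's ordinal in the flattened schema to its path."""
    paths = {}
    # Stack entries are [path_so_far, children_remaining]. The root's own
    # name is not part of any column path.
    stack = [[[], schema[0].num_children]]
    for index in range(1, len(schema)):
        element = schema[index]
        # Close groups whose children have all been visited.
        while stack[-1][1] == 0:
            stack.pop()
        parent = stack[-1]
        parent[1] -= 1
        path = parent[0] + [element.name]
        if element.num_children:
            stack.append([path, element.num_children])
        else:
            paths[index] = path
    return paths


def resolve_column(schema, schema_index):
    """Recover (path_in_schema, type) for a chunk that omits them."""
    path = leaf_paths(schema)[schema_index]  # KeyError if not a leaf
    return path, schema[schema_index].type


schema = [SchemaElement("root", 2),
          SchemaElement("a", type="INT32"),
          SchemaElement("b", 2),
          SchemaElement("c", type="BYTE_ARRAY"),
          SchemaElement("d", type="DOUBLE")]
print(resolve_column(schema, 3))  # (['b', 'c'], 'BYTE_ARRAY')
```

Computing all leaf paths once per file and caching them keeps the lookup cheap when a file has many row groups.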