apache / parquet-format

Apache Parquet Format
https://parquet.apache.org/
Apache License 2.0

Clarify behavior of DELTA_BINARY_PACKED encoders/decoders #426

Closed asfimport closed 4 months ago

asfimport commented 4 months ago

I brought this issue up some time ago on the mailing list [1]; in short, I would like to add some clarification to the DELTA_BINARY_PACKED section of Encodings.md. The issue is that while the specification does not limit the number of bits that can be used to encode deltas, some readers expect a maximum of 32 bits for INT32 data and 64 bits for INT64 data [2]. I propose adding wording to the specification to the effect that, while using 33 bits to encode INT32 data (or 65 bits for INT64 data) is permitted, it is not recommended, and that readers should be able to read such data but are not required to.
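To illustrate how a 33-bit width can arise for INT32 data, here is a minimal Python sketch. It is a simplification, not the reference implementation: it ignores blocks and miniblocks and only computes the bit width needed for the deltas after subtracting the minimum delta. The helper names `wrap` and `packed_bit_width` are hypothetical, and the wrapping variant is just one assumed way an encoder could stay within 32 bits.

```python
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def wrap(x, bits):
    """Reinterpret x as a two's-complement signed integer of the given width."""
    half = 1 << (bits - 1)
    return ((x + half) % (1 << bits)) - half


def packed_bit_width(values, wrap_bits=None):
    """Bit width needed to pack (delta - min_delta) for one run of values.

    wrap_bits=None uses unbounded integer arithmetic; wrap_bits=32 assumes
    the encoder lets subtractions overflow and wrap like 32-bit integers.
    """
    # Differences between consecutive values.
    deltas = [b - a for a, b in zip(values, values[1:])]
    if wrap_bits is not None:
        deltas = [wrap(d, wrap_bits) for d in deltas]

    # Subtract the minimum delta so every packed value is non-negative.
    min_delta = min(deltas)
    adjusted = [d - min_delta for d in deltas]
    if wrap_bits is not None:
        # Keep only the low wrap_bits bits, as fixed-width arithmetic would.
        adjusted = [a & ((1 << wrap_bits) - 1) for a in adjusted]

    return max(a.bit_length() for a in adjusted)


values = [INT32_MIN, INT32_MAX, INT32_MIN]
print(packed_bit_width(values))      # 33: exceeds what some readers accept
print(packed_bit_width(values, 32))  # 2: wrapped deltas fit easily in 32 bits
```

With unbounded arithmetic, the two deltas are 2^32 - 1 and -(2^32 - 1), so the adjusted delta reaches 2^33 - 2 and needs 33 bits. With wrapped 32-bit arithmetic, the same values still round-trip, provided the decoder wraps its additions in the same way.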

[1] https://lists.apache.org/thread/2wj88oghc0t6qqj8ojp5p5tf8wg11840

[2] https://github.com/apache/arrow/issues/20374

Reporter: Edward Seidl / @etseidl Assignee: Edward Seidl / @etseidl

PRs and other links:

Note: This issue was originally created as PARQUET-2435. Please see the migration documentation for further details.

asfimport commented 4 months ago

Antoine Pitrou / @pitrou: Resolved by linked PR.