apache / parquet-format

Apache Parquet Format
https://parquet.apache.org/
Apache License 2.0

PARQUET-2435: Clarify behavior of DELTA_BINARY_PACKED encoding #231

Closed etseidl closed 4 months ago

etseidl commented 4 months ago

Provide some guidance around the issue of how many bits may be used when encoding DELTA_BINARY_PACKED data.

etseidl commented 4 months ago

Thank you for the comments @wgtmac @pitrou @mapleFU. I've added the prohibition language. If there is consensus on forbidding the use of extra bits, then I can remove the long paragraph.

tustvold commented 4 months ago

I am likely missing some context here, but I agree with @pitrou that an encoder producing data with more bits than the physical type is a bug in the encoder, not to mention suboptimal.
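To make the "more bits than the physical type" case concrete, here is a minimal sketch of how the per-miniblock bit width can differ depending on how deltas are computed. It assumes the usual DELTA_BINARY_PACKED scheme (deltas between consecutive values, minus the block's minimum delta, then bit-packed). The helper names `to_signed`, `naive_bit_width`, and `wrapping_bit_width` are hypothetical and do not come from any Parquet implementation.

```python
def to_signed(x, type_bits):
    """Reinterpret an unsigned type_bits-wide integer as two's complement."""
    return x - (1 << type_bits) if x >> (type_bits - 1) else x


def naive_bit_width(values):
    """Bit width when deltas are computed with unbounded (non-wrapping) integers."""
    deltas = [b - a for a, b in zip(values, values[1:])]
    min_delta = min(deltas)
    # Adjusted deltas are non-negative; the largest one sets the bit width.
    return max(d - min_delta for d in deltas).bit_length()


def wrapping_bit_width(values, type_bits=32):
    """Bit width when every subtraction wraps at the physical type's width."""
    mask = (1 << type_bits) - 1
    # Wrapping delta, reinterpreted as a signed value of the physical type.
    deltas = [to_signed((b - a) & mask, type_bits)
              for a, b in zip(values, values[1:])]
    min_delta = min(deltas)
    # Subtracting the minimum delta also wraps, so the result fits in type_bits.
    return max((d - min_delta) & mask for d in deltas).bit_length()


# Hypothetical INT32 input that swings between the type's extremes.
int32_values = [-2**31, 2**31 - 1, -2**31]
print(naive_bit_width(int32_values))     # 33
print(wrapping_bit_width(int32_values))  # 2
```

With wrapping arithmetic the width can never exceed `type_bits`, and a decoder that adds the minimum delta and the unpacked value back with the same wrapping recovers the original values.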

pitrou commented 4 months ago

The latest proposed changes look fine to me. I'll let others chime in before potentially merging.