aloneguid / parquet-dotnet

Fully managed Apache Parquet implementation
https://aloneguid.github.io/parquet-dotnet/
MIT License
542 stars, 140 forks

[BUG]: Deserialising delta binary packed encoded data produces incorrect results #486

Closed by swindiggie 4 weeks ago

swindiggie commented 3 months ago

Library Version

4.23.4

OS

MacOS 13.4

OS Architecture

ARM 64

How to reproduce?

I generated a parquet file using the following one-column CSV:

"column1"
1000
1
2
3
4
5
6
7
8
9
10

The parquet file has a single Int32 column that uses DELTA_BINARY_PACKED encoding.
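For context, here is a minimal sketch of the logical idea behind DELTA_BINARY_PACKED for one block of values: store the first value, the minimum delta, and the non-negative adjusted deltas (delta minus minimum delta), which are what get bit-packed. It deliberately ignores the real byte layout (headers, miniblocks, bit widths), and the helper names `delta_encode` and `delta_decode` are hypothetical, not part of parquet-dotnet.

```python
def delta_encode(values):
    """Logical view of one block: (first value, min delta, adjusted deltas).

    Hypothetical helper for illustration; the real format also bit-packs
    the adjusted deltas into miniblocks, which is omitted here.
    """
    first = values[0]
    deltas = [b - a for a, b in zip(values, values[1:])]
    min_delta = min(deltas) if deltas else 0
    # Subtracting the minimum makes every stored delta non-negative.
    adjusted = [d - min_delta for d in deltas]
    return first, min_delta, adjusted


def delta_decode(first, min_delta, adjusted):
    """Rebuild the values by accumulating (min delta + adjusted delta)."""
    out = [first]
    for a in adjusted:
        out.append(out[-1] + min_delta + a)
    return out


values = [1000, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
first, min_delta, adjusted = delta_encode(values)
print(first, min_delta, adjusted)
print(delta_decode(first, min_delta, adjusted))
```

For the CSV above, the minimum delta is -999 (the step from 1000 to 1), so every later step of +1 is stored as an adjusted delta of 1000.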

When I deserialise the parquet file using parquet-dotnet, I get:

"column1"
1000
1
-998
-1997
-2996
-3995
-4994
-5993
-6992
-7991
-8990
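The arithmetic below compares the deltas of the expected and observed columns. It is only a sketch of the symptom, not a diagnosis of the library's code: the observed values are what you would get if every value after the first were rebuilt by adding the minimum delta (-999) alone, as if the bit-packed adjusted deltas had all been read as zero. The helper name `deltas` is hypothetical.

```python
expected = [1000, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
observed = [1000, 1, -998, -1997, -2996, -3995,
            -4994, -5993, -6992, -7991, -8990]


def deltas(xs):
    """Differences between consecutive values."""
    return [b - a for a, b in zip(xs, xs[1:])]


print(deltas(expected))  # first step is -999, every later step is 1
print(deltas(observed))  # every step is -999

# Rebuild the column assuming each step is just the minimum delta.
# Assumption: this mimics adjusted deltas being decoded as zero.
min_delta = min(deltas(expected))
rebuilt = [expected[0]]
for _ in range(len(expected) - 1):
    rebuilt.append(rebuilt[-1] + min_delta)

print(rebuilt == observed)
```

The first two values agree in both columns because the first adjusted delta really is zero; the divergence starts at the third value.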

The incorrect results can be viewed using Parquet Floor 4.23.4.

I have also opened a PR with a failing unit test against the parquet file as a reference:

I verified that a few other websites display the correct results for the parquet file:

The CSV file and a screenshot of Parquet Floor are attached.

test.csv

Screenshot 2024-03-07 at 4 26 59 pm

Failing test

No response