amazon-ion / ion-cli

Apache License 2.0

Data missing when dumping gzip'd ion #126

Closed nirosys closed 2 months ago

nirosys commented 2 months ago

I have a gzip file from one of our users that contains 2541 rows. Dumping the data with ion-cli produces only 1213 rows, and no error is reported.

The issue seems to be with the gzip itself and how flate2 is decompressing it.

More investigation is needed into what makes the archive different and how we should properly handle it with flate2.

tgregg commented 2 months ago

It would be interesting to find out if the 1213 rows that are read successfully occur within the same compression frame, and if the remaining rows are contained in another frame (or frames) concatenated within the same file.

nirosys commented 2 months ago

Yep, that could definitely be it. The flate2 docs for GzDecoder say it can only handle a single member. Testing now with MultiGzDecoder.

nirosys commented 2 months ago

Fixed with #127, closing.