aloneguid / parquet-dotnet

Fully managed Apache Parquet implementation
https://aloneguid.github.io/parquet-dotnet/
MIT License

[BUG]: Deserialize, ArgumentOutOfRange exception #502

Open akaloshych84 opened 2 months ago

akaloshych84 commented 2 months ago

Library Version

4.23.5

OS

MacOS

OS Architecture

64 bit

How to reproduce?

Hello,

I am working on a POC that processes ROS-generated data, which is then converted to Parquet in GCS. This .NET package is really cool, with amazing performance. While it works for some topics (those without nested repeated structs), it fails on large, complex data structures with multi-level lists/structs. I found multiple error types when deserializing to C# classes. One is the same as the issue that was closed last year: "destination is too short". Another is ArgumentOutOfRange. I prepared a small test that reproduces the second error type. The first one is more difficult to reproduce; I will try to prepare another test dataset and submit a separate ticket. Here is the example file, with the schema truncated to just a few fields, on which the ArgumentOutOfRange can still be reproduced: 000000000000.parquet.zip

Error: image

Failing test

Class to deserialize to:

    public class HeaderStamp
    {
        public Int64? secs { get; set; }
        public Int64? nsecs { get; set; }
    }

    public class Header
    {
        public Int64? seq { get; set; }
        public HeaderStamp stamp { get; set; }
        public String frame_id { get; set; }
    }

    public class TrackedObjectsTestListElement
    {
        public long? track_id { get; set; }
        public Double? existence_probability { get; set; }
        public Boolean? moving { get; set; }
    }

    public class TrackedObjectsTest
    {
        public Header header { get; set; }
        public List<TrackedObjectsTestListElement> tracked_objects { get; set; }
        public String _launch_id { get; set; }
    }

And the deserialize call:

    var r = await ParquetSerializer.DeserializeAsync<TrackedObjectsTest>("000000000000.parquet");