G-Research / ParquetSharp

ParquetSharp is a .NET library for reading and writing Apache Parquet files.
Apache License 2.0

Fix reading required string values inside an optional group #481

Closed adamreeve closed 3 months ago

adamreeve commented 3 months ago

Fixes #480.

When string values are required and inside an optional nested group, the group is skipped over in LogicalBatchReaderFactory and we only use a single LeafReader instance to read values. This reader should write null into destination rows where the enclosing group is null. But when _nullableLeafValues is false in the BufferedReader, only the non-null values are read, so we reach the end of the values before reaching the expected number of rows/levels.

We don't have the same problem for a required int inside an optional group, for example, as this is detected in GetCompoundReader when typeof(TElement) != typeof(TLogical) and is handled specially with the OptionalReader class.
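
To illustrate the failure mode described above, here is a minimal, library-independent sketch in Python. It is not ParquetSharp code: the function names read_dense and read_with_levels are hypothetical, and the model assumes a single optional group wrapping a required leaf, so the maximum definition level is 1. In that layout only the defined values are stored, and the definition levels record which rows have a null enclosing group.

```python
from typing import List, Optional


def read_dense(values: List[str], num_rows: int) -> List[str]:
    """Model of the faulty path: assumes one stored value per row.

    Hypothetical helper, not part of ParquetSharp. It runs out of values
    as soon as any row has a null enclosing group.
    """
    out = []
    for row in range(num_rows):
        if row >= len(values):
            raise IndexError(
                f"ran out of values at row {row} of {num_rows}"
            )
        out.append(values[row])
    return out


def read_with_levels(
    values: List[str], def_levels: List[int], max_def_level: int
) -> List[Optional[str]]:
    """Model of the intended behaviour: one output row per definition level.

    A level equal to max_def_level consumes the next stored value; any
    lower level means an enclosing group is null, so the row is None.
    """
    out: List[Optional[str]] = []
    next_value = 0
    for level in def_levels:
        if level == max_def_level:
            out.append(values[next_value])
            next_value += 1
        else:
            out.append(None)
    return out
```

With three rows where the middle row's group is null, the definition levels are [1, 0, 1] and only two values are stored. Under these assumptions, read_with_levels(["a", "b"], [1, 0, 1], 1) yields ["a", None, "b"], whereas read_dense(["a", "b"], 3) raises once the two stored values are exhausted, which mirrors reaching the end of the values before the expected number of rows/levels.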