aloneguid / parquet-dotnet

Fully managed Apache Parquet implementation
https://aloneguid.github.io/parquet-dotnet/
MIT License

[BUG]: empty string array incorrect serialization #481

Closed (nd368 closed this 1 week ago)

nd368 commented 9 months ago

Library Version

4.23.4

OS

Windows

OS Architecture

64 bit

How to reproduce?

Actual behaviour: An empty string array is serialized as a collection containing 1 NULL element.

Expected behaviour: An empty string array is serialized as a collection containing no elements.

class Example
{
    public string[] EmptyStringArray { get; } = Array.Empty<string>();
}

[Test]
public async Task MinimalExample()
{
    var tempFile = Path.Combine(Path.GetTempPath(), "example.parquet");
    using var fileStream = new FileStream(tempFile, FileMode.Create);

    var objectWithEmptyStringArray = new Example();
    await ParquetSerializer.SerializeAsync(new List<Example> { objectWithEmptyStringArray }, fileStream);

    fileStream.Close();
    Console.Write($"Parquet file: {tempFile}");
}

Open the generated file in any Parquet file viewer: EmptyStringArray is shown as a collection with 1 NULL element.

Failing test

No response

aloneguid commented 1 month ago

Unfortunately, this is a known issue with Parquet. Different libraries treat an empty list and a list with 1 null element differently, because there's a bit of an issue in the standard.
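For anyone wondering how close the two cases are on disk, here is a rough sketch. It is not Parquet.Net code and does not use any Parquet.Net API. It assumes the three-level LIST layout with an optional list and optional elements, and the names `Entry` and `ListLevels` are made up for illustration. Under that assumption, an empty list and a list holding a single null each produce exactly one column entry with no value, and they differ only in the definition level.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type: one flattened column entry (levels plus optional value).
public readonly record struct Entry(int Repetition, int Definition, string? Value);

public static class ListLevels
{
    // Assumed schema (three-level LIST layout, list and element both optional):
    //   optional group EmptyStringArray (LIST) {
    //     repeated group list { optional binary element (STRING); }
    //   }
    // Definition levels under this assumption:
    //   0 = list is null, 1 = list is empty,
    //   2 = element is null, 3 = element has a value.
    public static List<Entry> Encode(string?[]? list)
    {
        if (list is null) return new List<Entry> { new(0, 0, null) };
        if (list.Length == 0) return new List<Entry> { new(0, 1, null) };

        // First element starts a new list (repetition 0), the rest continue it (repetition 1).
        return list
            .Select((item, i) => new Entry(i == 0 ? 0 : 1, item is null ? 2 : 3, item))
            .ToList();
    }

    public static void Main()
    {
        Entry empty = Encode(Array.Empty<string>()).Single();
        Entry oneNull = Encode(new string?[] { null }).Single();

        // Both cases are a single entry without a value; only Definition differs.
        Console.WriteLine($"empty list:    rep={empty.Repetition} def={empty.Definition}");
        Console.WriteLine($"one null item: rep={oneNull.Repetition} def={oneNull.Definition}");
    }
}
```

If a writer or reader is off by one on the definition level, or uses a different list layout (for example the older two-level one), one case can easily be read back as the other. That may be what the file viewers are running into here.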