aloneguid / parquet-dotnet

Fully managed Apache Parquet implementation
https://aloneguid.github.io/parquet-dotnet/
MIT License

[BUG]: Append serialize to file-path actually truncates file and then throws because it's empty #461

Closed danielearwicker closed 5 months ago

danielearwicker commented 5 months ago

Library Version

4.23.1

OS

All

OS Architecture

64 bit

How to reproduce?

Similar to the example in the docs:

await ParquetSerializer.SerializeAsync(data, filePath, new ParquetSerializerOptions { Append = false });
await ParquetSerializer.SerializeAsync(data, filePath, new ParquetSerializerOptions { Append = true });

SerializeAsync opens the path with File.Create(filePath), which truncates the file regardless of the Append flag. The second call therefore finds an empty file and throws instead of appending.
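To illustrate the file-mode behaviour described above, here is a minimal standalone sketch using only System.IO. It is not the library's code or the merged fix; the scratch file and the OpenTarget helper are hypothetical names used only for this example. It shows that File.Create discards existing bytes, while FileMode.Open leaves them in place for an append.

```csharp
using System;
using System.IO;

// Hypothetical scratch file holding 4 placeholder bytes (not a real Parquet file).
string path = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
File.WriteAllBytes(path, new byte[] { 1, 2, 3, 4 });

// File.Create creates the file, or truncates and overwrites it if it already exists.
using (FileStream truncated = OpenTarget(path, append: false))
{
    // Nothing written: the existing content is already gone at this point.
}
long lengthAfterCreate = new FileInfo(path).Length;

// Restore the 4 bytes, then open the file the way an append would need it.
File.WriteAllBytes(path, new byte[] { 1, 2, 3, 4 });

long lengthAfterOpen;
using (FileStream preserved = OpenTarget(path, append: true))
{
    lengthAfterOpen = preserved.Length;
}

File.Delete(path);
Console.WriteLine($"after File.Create: {lengthAfterCreate} bytes, after FileMode.Open: {lengthAfterOpen} bytes");

// Hypothetical helper: choose the file mode from the append flag.
// FileMode.Open requires the file to exist and keeps its content;
// FileAccess.ReadWrite allows both reading the existing data and writing new data.
static FileStream OpenTarget(string path, bool append) =>
    append
        ? new FileStream(path, FileMode.Open, FileAccess.ReadWrite)
        : File.Create(path);
```

With append set to false the file ends up at 0 bytes; with append set to true the original 4 bytes are still there. Any path-based overload that honours an Append option would need to make a choice of this kind before handing the stream to the writer.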

Failing test

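// Assumes the enclosing test defines tempPath (a writable temp file path) and a
// Record class with Timestamp, EventName and MeterValue properties.
try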
{
    await ParquetSerializer.SerializeAsync(
        new[]
        {
            new Record 
            {
                Timestamp = DateTime.UtcNow,
                EventName = "first",
                MeterValue = 1
            }
        }, 
        tempPath, 
        new ParquetSerializerOptions { Append = false });

    await ParquetSerializer.SerializeAsync(
        new[]
        {
            new Record 
            {
                Timestamp = DateTime.UtcNow,
                EventName = "second",
                MeterValue = 2
            }
        }, 
        tempPath,
        new ParquetSerializerOptions { Append = true });

    using ParquetReader reader = await ParquetReader.CreateAsync(tempPath);

    using (ParquetRowGroupReader reader0 = reader.OpenRowGroupReader(0))
    {
        Assert.Equal(1, reader0.RowCount);
    }

    using (ParquetRowGroupReader reader1 = reader.OpenRowGroupReader(1))
    {
        Assert.Equal(1, reader1.RowCount);
    }
}
finally 
{
    if (tempPath != null)
    {
        System.IO.File.Delete(tempPath);
    }
}

aloneguid commented 5 months ago

merged, please give it a go ;)