aloneguid / parquet-dotnet

Fully managed Apache Parquet implementation
https://aloneguid.github.io/parquet-dotnet/
MIT License

[BUG]: Parquet compression is not working inside a Linux container in Docker Desktop #563

Closed ppczouz closed 2 weeks ago

ppczouz commented 1 month ago

Library Version

5.0.0

OS

Ubuntu Linux

OS Architecture

64 bit

How to reproduce?

  1. Added Parquet.Net 5.0.0 and executed the following line: ParquetSerializer.SerializeAsync<EventData>(dataList, stream, new ParquetSerializerOptions() { CompressionMethod = Parquet.CompressionMethod.Snappy }); where EventData is a class representing the schema and stream is an open stream on Azure Blob Storage.

  2. Compiled and run as a console application, this works perfectly on both Windows and WSL2 Ubuntu.

  3. Compiled and run as a Docker container in Docker Desktop (Linux), the same statement produces a parquet file that contains only the schema and no data.

  4. Changing the compression method yields the same result. The data is included in the parquet file only when compression is set to None.

I do not see any exceptions or other items showing up on the log stream.
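Because the failure is silent, a round-trip check can make it visible: write a few rows to an in-memory stream, read them back, and compare the row count for each compression method. The following is only a minimal sketch, not the reporter's code. The EventData class here is a hypothetical stand-in with made-up properties, a MemoryStream replaces the Azure Blob Storage stream, and RoundTripCheck is an illustrative name.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;
using Parquet;
using Parquet.Serialization;

// Hypothetical stand-in for the reporter's EventData class; the real schema is not shown in the issue.
public class EventData
{
    public int Id { get; set; }
    public string? Name { get; set; }
}

public static class RoundTripCheck
{
    // Serializes three rows with the given compression method and returns how many rows can be read back.
    public static async Task<int> CountRowsAsync(CompressionMethod method)
    {
        var dataList = new List<EventData>
        {
            new EventData { Id = 1, Name = "a" },
            new EventData { Id = 2, Name = "b" },
            new EventData { Id = 3, Name = "c" },
        };

        // Assumption: an in-memory stream instead of an Azure Blob Storage stream.
        using var stream = new MemoryStream();
        await ParquetSerializer.SerializeAsync(
            dataList,
            stream,
            new ParquetSerializerOptions { CompressionMethod = method });

        // Rewind and read the file back from the same stream.
        stream.Position = 0;
        IList<EventData> readBack = await ParquetSerializer.DeserializeAsync<EventData>(stream);
        return readBack.Count;
    }

    public static async Task Main()
    {
        foreach (var method in new[] { CompressionMethod.None, CompressionMethod.Snappy })
        {
            int rows = await CountRowsAsync(method);
            Console.WriteLine($"{method}: {rows} row(s) read back");
        }
    }
}
```

If the count for Snappy is lower than the count for None inside the container, that would match the behaviour described above without involving Blob Storage.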

Failing test

No response

aloneguid commented 1 month ago

What base image are you using? I'm wondering if you are running this on musl runtime (Alpine etc.)

ppczouz commented 1 month ago

> What base image are you using? I'm wondering if you are running this on musl runtime (Alpine etc.)

FROM mcr.microsoft.com/azure-functions/dotnet:4 AS base

Linux is Ubuntu 24.04.1 LTS (assuming Docker Desktop is using WSL, which I explicitly configured it to do).
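One way to check the runtime question from inside the running container, rather than inferring it from the host, is to print what .NET reports about itself. This is a minimal sketch and RuntimeProbe is a hypothetical name. A runtime identifier that mentions musl or alpine would point to a musl-based image, though the exact string depends on the .NET version.

```csharp
using System;
using System.Runtime.InteropServices;

public static class RuntimeProbe
{
    // Builds a one-line summary of the runtime identifier, OS description and process architecture.
    public static string Describe() =>
        $"RID={RuntimeInformation.RuntimeIdentifier}; " +
        $"OS={RuntimeInformation.OSDescription}; " +
        $"Arch={RuntimeInformation.ProcessArchitecture}";

    public static void Main() => Console.WriteLine(Describe());
}
```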

aloneguid commented 2 weeks ago

This should be sorted now, please check.

ppczouz commented 2 weeks ago

Awesome, thank you. Using 5.0.1, confirmed, it is now working properly.