aloneguid / parquet-dotnet

Fully managed Apache Parquet implementation
https://aloneguid.github.io/parquet-dotnet/
MIT License

Automatic serialization of multiple classes with metadata #435

Closed jdedg closed 7 months ago

jdedg commented 7 months ago

Issue description

Hi, this is a long shot. My client wants me to send them Parquet files. We have 75+ entity types to send, based on the messages we receive; some have complex array parameters, nulls, and so on. I would love to simply use "await ParquetSerializer.SerializeAsync(dataList, memoryStream);", but the client also wants some metadata written through "writer.CustomMetadata".

I know we are supposed to use the "low-level API" because of the metadata, but creating one serializer for each type is a bit much, especially if we have more types to add in the future. So I would like to know whether there is a way I have missed to get automatic generation and metadata at the same time.

The reason I ask is that I tried to build my own automatic generation based on "ParquetSerializer.SerializeAsync", but since "Striper" is not accessible, there is a lot of logic to redo just to add the metadata.

Thanks

aloneguid commented 7 months ago

Sorry for the delay. There isn't anything like that in the public API at the moment, but it should be trivial to add. The serializer uses the same low-level API under the bonnet, so you should be able to add a method overload and inject the metadata through it. It requires a bit of code change, though.
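For reference, a minimal sketch of the low-level route mentioned in this thread, assuming a Parquet.Net 4.x-style API. The class name `MetadataWriterSketch`, the two-column schema, and the metadata key are hypothetical and only for illustration. It does not solve the "one writer per type" problem raised above; it only shows where "writer.CustomMetadata" fits when writing by hand.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;
using Parquet;
using Parquet.Data;
using Parquet.Schema;

public static class MetadataWriterSketch
{
    // Writes two rows of a hypothetical (id, name) entity plus custom metadata.
    public static async Task WriteAsync(Stream output)
    {
        // Hypothetical schema; a real entity would need one field per property.
        var idField = new DataField<int>("id");
        var nameField = new DataField<string>("name");
        var schema = new ParquetSchema(idField, nameField);

        using ParquetWriter writer = await ParquetWriter.CreateAsync(schema, output);

        // Custom key-value metadata; set before the writer is disposed,
        // because the file footer is written on dispose.
        writer.CustomMetadata = new Dictionary<string, string>
        {
            ["source"] = "example", // hypothetical key and value
        };

        // The row group is disposed before the writer
        // (using declarations dispose in reverse order).
        using ParquetRowGroupWriter group = writer.CreateRowGroup();
        await group.WriteColumnAsync(new DataColumn(idField, new[] { 1, 2 }));
        await group.WriteColumnAsync(new DataColumn(nameField, new[] { "a", "b" }));
    }
}
```

Calling `await MetadataWriterSketch.WriteAsync(memoryStream);` should leave a Parquet file in the stream with the metadata stored in its footer. Doing this generically for 75+ types would still require the schema and column-shredding logic that the serializer keeps internal, which is why an overload on the serializer itself is the cleaner fix.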