At my company, we need finer-grained control over the serialization and deserialization of row groups.
We use object pools to avoid instantiations, and we multi-thread both the modification of pooled objects and the writing of Parquet row groups using DoubleBuffer (which uses the mentioned pools).
This gives us fast, memory-efficient Parquet jobs.
We were using version 3 of this NuGet package and relied on reflection to access private methods of the ClrBridge class to get fast, memory-efficient serialization.
Because the implementation of the serialization API changed substantially, we cannot achieve the same with version 4.
So here is our contribution.
This adds a method to serialize a collection into a single row group.
This adds methods to deserialize a single row group into an existing collection.
This adds methods to deserialize one row group at a time using IAsyncEnumerable.