aloneguid / parquet-dotnet

Fully managed Apache Parquet implementation
https://aloneguid.github.io/parquet-dotnet/
MIT License

Preallocate result list capacity when serializing #444

Closed Arithmomaniac closed 6 months ago

Arithmomaniac commented 6 months ago

The size of every row group is known from the metadata as soon as the reader is initialized. This can be used to set the capacity of the buffer list when it is created, which avoids having to grow the list as it fills (each growth allocates a new array and copies the existing elements into it) and ensures the backing array is no larger than required.
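As a rough illustration of the idea (not the code in this PR), the sketch below compares a `List<int>` filled without a capacity hint against one created with its capacity set up front. The class name, the method names, and the `rowCount` value are hypothetical; `rowCount` simply stands in for the row count that would come from the row group metadata.

```csharp
using System;
using System.Collections.Generic;

public static class PreallocationSketch
{
    // Fills a list created without a capacity hint. Whenever the backing
    // array runs out of room, List<T> allocates a larger array and copies
    // the existing elements into it.
    public static List<int> FillGrown(int rowCount)
    {
        var values = new List<int>();
        for (int i = 0; i < rowCount; i++)
        {
            values.Add(i);
        }
        return values;
    }

    // Fills a list whose capacity is set once, up front. rowCount is an
    // assumed stand-in for the row count known from the metadata.
    public static List<int> FillPreallocated(int rowCount)
    {
        var values = new List<int>(rowCount);
        for (int i = 0; i < rowCount; i++)
        {
            values.Add(i);
        }
        return values;
    }

    public static void Main()
    {
        const int rowCount = 1000; // hypothetical row count

        List<int> grown = FillGrown(rowCount);
        List<int> preallocated = FillPreallocated(rowCount);

        // The preallocated list's capacity matches the row count exactly;
        // the grown list's capacity is at least that large.
        Console.WriteLine($"grown capacity: {grown.Capacity}");
        Console.WriteLine($"preallocated capacity: {preallocated.Capacity}");
    }
}
```

Both lists end up with the same contents; the difference is only in how many intermediate arrays are allocated along the way and how much unused space remains in the final one.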

(Currently you can't actually create a list with a capacity greater than int.MaxValue, but I wanted to leave that exception to the runtime, as if this optimization had never been made, rather than throw pre-emptively.)