While investigating a potential memory leak in my Azure Function app, I noticed that the Managed Memory Tool in Visual Studio tells me that I have many objects in the .NET managed heap. When looking through those objects (byte[]), they seem to have Microsoft.IO.RecyclableMemoryStreamManager as their root.
Parquet.Net uses RecyclableMemoryStreamManager in DataColumnWriter.
The RecyclableMemoryStream documentation states that the RecyclableMemoryStreamManager will use the properties MaximumFreeSmallPoolBytes and MaximumFreeLargePoolBytes to determine whether to put those buffers back in the pool, or let them go (and thus be garbage collected). It is through these properties that you determine how large your pool can grow. If you set these to 0, you can have unbounded pool growth, which is essentially indistinguishable from a memory leak.
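For reference, this is roughly how I understand those limits are applied to a manager instance. It is only a minimal sketch: it assumes a Microsoft.IO.RecyclableMemoryStream release where the two limits are settable properties on the manager (other releases may expose them differently), the byte values are arbitrary, and it says nothing about how Parquet.Net creates its own manager.

```csharp
using System;
using Microsoft.IO;

public static class PoolLimitSketch
{
    public static void Main()
    {
        // Assumption: the limits are settable properties on the manager.
        // The sizes below are arbitrary illustration values, not recommendations.
        var manager = new RecyclableMemoryStreamManager
        {
            // Keep at most 16 MiB of free small-pool blocks; extra blocks are left to the GC.
            MaximumFreeSmallPoolBytes = 16L * 1024 * 1024,
            // Keep at most 64 MiB of free large-pool buffers.
            MaximumFreeLargePoolBytes = 64L * 1024 * 1024,
        };

        // Borrow a pooled stream, write to it, and return its blocks on dispose.
        using (var stream = manager.GetStream())
        {
            var payload = new byte[1024];
            stream.Write(payload, 0, payload.Length);
        }

        // Inspect how many bytes currently sit unused in each pool.
        Console.WriteLine($"Free small pool bytes: {manager.SmallPoolFreeSize}");
        Console.WriteLine($"Free large pool bytes: {manager.LargePoolFreeSize}");
    }
}
```

With non-zero limits like these, buffers returned beyond the cap should be dropped rather than kept in the pool, which is the behavior I would like to be able to opt into.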
It seems that the DataColumnWriter does not set these properties, so I guess that might be the reason for my app's high memory usage.
Should the MaximumFreeSmallPoolBytes and MaximumFreeLargePoolBytes properties be user-configurable, perhaps via ParquetOptions?