apache / iceberg

Apache Iceberg
https://iceberg.apache.org/
Apache License 2.0

Long-running Spark rewrite Files Action may lead to OutOfMemoryError #11277

Open Zhanxiao-Ma opened 2 weeks ago

Zhanxiao-Ma commented 2 weeks ago

In my production environment, I have observed that a long-running Spark rewrite files action can lead to an OutOfMemoryError. After analyzing the Java heap dump, I noticed a large number of ChildAllocator objects that are referenced only by the RootAllocator. Upon reviewing the code, I found that the ChildAllocator allocated at the following line does not appear to be released. Is this correct? https://github.com/apache/iceberg/blob/cbb853073e681b4075d7c8707610dceecbee3a82/arrow/src/main/java/org/apache/iceberg/arrow/vectorized/VectorizedReaderBuilder.java#L59

Zhanxiao-Ma commented 2 weeks ago

I think the close() method of VectorizedArrowReader should also include the logic to release rootAlloc.
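
To illustrate the pattern described above, here is a minimal sketch in plain Java. It does not use the real Arrow or Iceberg classes: SketchAllocator and SketchReader are hypothetical stand-ins. The sketch only assumes what the heap dump suggests, namely that a parent allocator keeps a reference to every child until that child is closed. It compares a reader whose close() leaves its child allocator open with one whose close() releases it.

```java
import java.util.ArrayList;
import java.util.List;

public class AllocatorLeakSketch {

    /** Hypothetical stand-in for an Arrow-style allocator; NOT the real Arrow API. */
    static class SketchAllocator implements AutoCloseable {
        private final String name;
        private final SketchAllocator parent;
        // Assumption: the parent holds a strong reference to each open child.
        private final List<SketchAllocator> children = new ArrayList<>();

        SketchAllocator(String name, SketchAllocator parent) {
            this.name = name;
            this.parent = parent;
        }

        SketchAllocator newChild(String childName) {
            SketchAllocator child = new SketchAllocator(childName, this);
            children.add(child);
            return child;
        }

        int childCount() {
            return children.size();
        }

        @Override
        public void close() {
            // Closing a child detaches it from its parent so it can be collected.
            if (parent != null) {
                parent.children.remove(this);
            }
        }
    }

    /** Hypothetical reader that creates one child allocator per instance. */
    static class SketchReader implements AutoCloseable {
        private final SketchAllocator allocator;
        private final boolean releaseOnClose;

        SketchReader(SketchAllocator root, boolean releaseOnClose) {
            this.allocator = root.newChild("reader-child");
            this.releaseOnClose = releaseOnClose;
        }

        @Override
        public void close() {
            // The suggested fix: release the child allocator when the reader closes.
            if (releaseOnClose) {
                allocator.close();
            }
        }
    }

    /** Opens and closes `reads` readers, then reports how many children the root still holds. */
    public static int childrenAfterReads(int reads, boolean releaseOnClose) {
        SketchAllocator root = new SketchAllocator("root", null);
        for (int i = 0; i < reads; i++) {
            try (SketchReader reader = new SketchReader(root, releaseOnClose)) {
                // A real reader would decode batches here.
            }
        }
        return root.childCount();
    }

    public static void main(String[] args) {
        System.out.println("children left without release: " + childrenAfterReads(1000, false));
        System.out.println("children left with release:    " + childrenAfterReads(1000, true));
    }
}
```

In this model, the root still holds 1000 children when the reader does not release its allocator, and none when it does. Whether the actual fix belongs in VectorizedArrowReader.close() or elsewhere depends on who owns the allocator in the Iceberg code.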