JuliaGeo / LibGEOS.jl

Julia package for manipulation and analysis of planar geometric objects
MIT License

LibGEOS.Polygon Memory Leak #184

Closed. SBuercklin closed this issue 8 months ago

SBuercklin commented 1 year ago

The following code block allocates memory on every call to f and never seems to free the memory associated with each LG.Polygon. Manually inserting GC.gc() or empty!(v) calls on every iteration doesn't free the underlying memory.

Running this interactively alongside top, I see memory consumption increase with every call to f() for the active Julia process.

using LibGEOS
const LG = LibGEOS

function f(N = 1_000_000)
    _pts = Vector{Float64}[[1, 0],[2, 0],[2,1], [1,1], [1,0]]
    _v = LG.Polygon[]
    for _ in 1:N
        push!(_v, LG.Polygon([_pts]))
    end

    return _v
end

for i in 1:20
    v = f();
end

visr commented 1 year ago

Thanks for this example. We should not be leaking memory here. Just to copy over some of @evetion's findings on Slack:

Here we register a finalizer that should let GEOS destroy the geometry to free up the memory when it goes out of scope: https://github.com/JuliaGeo/LibGEOS.jl/blob/v0.8.5/src/geos_types.jl#L216-L219
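For readers unfamiliar with the pattern, here is a minimal sketch of registering a finalizer on a Julia object that wraps natively allocated memory. It does not use LibGEOS.jl: NativeBuffer and destroy! are hypothetical stand-ins, and Libc.malloc/Libc.free take the place of the GEOS create/destroy calls.

```julia
# Hypothetical stand-in for a GEOS-backed geometry; not LibGEOS.jl's actual type.
mutable struct NativeBuffer
    ptr::Ptr{Cvoid}

    function NativeBuffer(nbytes::Integer)
        ptr = Libc.malloc(nbytes)
        ptr == C_NULL && throw(OutOfMemoryError())
        obj = new(ptr)
        # Ask the GC to call destroy! once obj becomes unreachable.
        finalizer(destroy!, obj)
        return obj
    end
end

# Release the native memory exactly once, then null the pointer so that
# a second call cannot free the same address again.
function destroy!(obj::NativeBuffer)
    if obj.ptr != C_NULL
        Libc.free(obj.ptr)
        obj.ptr = C_NULL
    end
    return nothing
end

buf = NativeBuffer(80)
finalize(buf)   # run the registered finalizer now instead of waiting for the GC
```

The null check is only there to make the sketch safe to call twice; whether a wrapper needs such a guard depends on who owns the underlying pointer.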

The finalizer does get called. And GEOS takes ownership of the memory of the intermediate createLinearRing. Quoting @evetion:

If you do own the pointer and try to clean it up, you segfault on GC.gc().

It seems the memory goes to GEOS, but it's not yet clear where it ends up or what we are missing. It would be interesting to check whether this is specific to Polygon, and if not, which other geometries are affected.

evetion commented 8 months ago

I fear that this is just bad GC. With enough GC calls around, memory is reclaimed. When I run this as a script, tools like valgrind detect no leak.

What I do see is that the GC needs multiple passes to clean up these arrays, because of the finalizers. It would be good to understand this more closely in Julia 1.10.
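One way to observe this is to count finalizer runs across successive GC.gc() calls. The sketch below is an assumption-laden illustration, not a LibGEOS.jl test: Tracked, finalized and make_batch are hypothetical names, and how many finalizers run on each pass depends on the Julia version and GC state, so no particular output is claimed.

```julia
# Counter incremented by each finalizer (hypothetical name).
const finalized = Ref(0)

# Hypothetical object that only exists to carry a finalizer.
mutable struct Tracked
    id::Int

    function Tracked(id::Int)
        obj = new(id)
        finalizer(_ -> (finalized[] += 1), obj)
        return obj
    end
end

# Allocate n tracked objects in a local vector that becomes garbage on return.
function make_batch(n::Int)
    v = [Tracked(i) for i in 1:n]
    return length(v)
end

n = make_batch(10_000)

# Run several collections and report how many finalizers have run so far.
for pass in 1:3
    GC.gc()
    println("pass ", pass, ": finalizers run so far = ", finalized[], " of ", n)
end
```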

evetion commented 8 months ago

Just did some more tests. Julia successfully cleans up the Polygon (and Context) objects, including the call to all finalizers. Memory usage drops to normal after the script exits, and memory leak tools detect no leaks.

What you see in top is not the same as the memory actually in use; it is closer to the total memory footprint that has been, or could be, used. If you allocate 80 bytes (the 10 Float64 values in _pts) 20 million times, you do reach 1.6 GB of memory use. If you do it in steps, freeing the 1M Polygons in between, it depends on the OS and allocator whether the process keeps growing into new memory or reuses previously freed memory. Previously used memory is not always given back to the OS. When I swap in jemalloc (a smarter memory allocator), I see my RSS drop by 50% to something like 800 MB (or 1300 MB if I don't GC in every loop).
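The arithmetic can be checked directly, and the gap between footprint and live memory can be seen from inside Julia. This is a rough sketch: the variable names are illustrative, Sys.maxrss() reports the peak resident set size (so it never goes down), and Base.gc_live_bytes() only covers memory tracked by Julia's GC, not memory allocated by a C library such as GEOS.

```julia
# 5 points with 2 coordinates each = 10 Float64 values per ring.
bytes_per_ring = 10 * sizeof(Float64)     # 80 bytes
n_polygons = 20 * 1_000_000               # 20 calls to f(), 1M polygons each
total_bytes = bytes_per_ring * n_polygons

println("coordinate data: ", total_bytes / 1e9, " GB")

# Peak footprint of the process versus what the Julia GC currently considers live.
println("peak RSS:      ", round(Sys.maxrss() / 2^20, digits = 1), " MiB")
println("GC live bytes: ", round(Base.gc_live_bytes() / 2^20, digits = 1), " MiB")
```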

Part of this memory footprint can be reclaimed by the OS when other programs put it under memory pressure. The behaviour you describe is thus expected, and not a memory leak.