JuliaGeo / LibGEOS.jl

Julia package for manipulation and analysis of planar geometric objects
MIT License
72 stars 23 forks source link

Why do we use `cloneGeom` for every output? #196

Closed rafaqz closed 10 months ago

rafaqz commented 10 months ago

We often seem to do this multiple times. Clearly we can remove some of these double clones. But do we need to clone at all? I cant see why...

This one seems like an obvious unnecessary copy:

https://github.com/JuliaGeo/LibGEOS.jl/blob/feb0055bebe72fed41ac67363c91a9bc5350e90b/src/geos_functions.jl#L1416

Because then we copy again here:

https://github.com/JuliaGeo/LibGEOS.jl/blob/feb0055bebe72fed41ac67363c91a9bc5350e90b/src/geos_types.jl#L179

Removing these copies makes everything much faster. This is our GeometryOps.jl area method` (with a few branches checked out) on LG polygon. We don't need LibGEOS, its just for testing. But its a good example:

https://github.com/asinghvi17/GeometryOps.jl/issues/32

With cloneGeom

julia> @benchmark GO.area($pLG)
BenchmarkTools.Trial: 10000 samples with 151 evaluations.
 Range (min … max):  623.709 ns …  5.664 ms  ┊ GC (min … max): 0.00% … 5.21%
 Time  (median):       1.048 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):     2.340 μs ± 68.149 μs  ┊ GC (mean ± σ):  2.31% ± 0.08%

     █ ▄▅    ▃                                                  
  ▃▂▄█▇███▅▆▇█▇▄▃▃▃▄▃▄▃▄▄▄▄▄▃▃▃▃▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▁▁▂▂▂▂▂▂▂▂ ▃
  624 ns          Histogram: frequency by time         3.59 μs <

Removing the first cloneGeom:

julia> @benchmark GO.area($pLG)
BenchmarkTools.Trial: 10000 samples with 175 evaluations.
 Range (min … max):  438.794 ns …  4.472 ms  ┊ GC (min … max): 0.00% … 6.64%
 Time  (median):     757.680 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):     2.201 μs ± 70.767 μs  ┊ GC (mean ± σ):  3.27% ± 0.10%

  █▅ ▇▅   ▁    ▁                                                
  ██▄██▃▂▇█▇▆▂▃█▇▆▄▃▂▃▃▃▂▃▅▅▄▄▄▄▄▄▅▅▅▅▄▄▄▄▃▄▃▃▃▂▂▂▂▂▂▂▂▂▂▂▁▁▁▁ ▃
  439 ns          Histogram: frequency by time         1.73 μs <

 Memory estimate: 80 bytes, allocs estimate: 3.

And the first and second cloneGeom

julia> @benchmark GO.area($pLG)
BenchmarkTools.Trial: 10000 samples with 189 evaluations.
 Range (min … max):  391.788 ns …  3.552 ms  ┊ GC (min … max): 0.00% … 5.34%
 Time  (median):     661.643 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):     1.391 μs ± 46.849 μs  ┊ GC (mean ± σ):  2.62% ± 0.08%

  █▄         ▂          ▁                                       
  ██▂▂▁▂▄▃▂▂▂██▅▄▅▃▅▄▇█████▇▆▅▆▅▄▄▄▃▃▃▃▂▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁ ▃
  392 ns          Histogram: frequency by time         1.16 μs <

 Memory estimate: 48 bytes, allocs estimate: 2.

And finally if I remove the one one the line above when a LineString rewraps another LineString:

julia> @benchmark GO.area($pLG)
BenchmarkTools.Trial: 10000 samples with 337 evaluations.
 Range (min … max):  256.193 ns …  10.465 μs  ┊ GC (min … max): 0.00% … 96.32%
 Time  (median):     260.564 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   277.483 ns ± 167.507 ns  ┊ GC (mean ± σ):  1.02% ±  1.67%

  ▅█▄▅▄▄▃▂▁  ▁        ▂▁ ▂  ▁▃  ▂▁                              ▂
  ████████████▇▇██▇█▇█████▇▆██████▇▇█▇▅▅▇▇▃▅▄▇▃▃▄▃▆▃▁▁▃▇▃▁▁▄▄▅▅ █
  256 ns        Histogram: log(frequency) by time        418 ns <

 Memory estimate: 48 bytes, allocs estimate: 2.

We shouldn't change this one, but it shouldn't be happening in getgeom anyway.

evetion commented 10 months ago

Yeah, this is something I encountered as well and partially fixed for points, just like you in the other PR.

There are double inits, as getX already is a X. Most constructors do clone if given the chance.

Clonegeom is only needed on init with another existing geometry.

evetion commented 10 months ago

However, we must be careful. Your exteriorring instance must be cloned per https://libgeos.org/doxygen/geos__c_8h.html#afc5d54a2905996833747c2d566da2af3.

evetion commented 10 months ago

I will make a PR

rafaqz commented 10 months ago

I wondered... instead of cloning could we skip the finaliser on polygon rings?

The important thing is not to delete them?

(I guess you could get a ring then delete its parent polygon and trigger a segfault)

evetion commented 10 months ago

Are you proposing we leak memory? ;) Everything we create in GEOS needs to be destroyed again.

rafaqz commented 10 months ago

We arent creating anything with polygon rings, and theyre attached to the polygons finaliser. Right? Why else should we never delete them?

evetion commented 10 months ago

Ah, I misunderstood. You meant like a view or similar? The problem is that Julia doesn't know the ring is linked to the Polygon, so it could garbage collect the Polygon, invalidating the ring.

We could make structs like prepared geometry, linking back to the Polygon object?

Anyway, not sure how much performance you gain by it.

rafaqz commented 10 months ago

I think its more than half the cost of loading a polygon. Probably we would need an GI.unsafe_getgeom method where users promise the objects dont escape and we GC.preserve the polygon around a closure.

One day when everything else is done 😅

evetion commented 10 months ago

Let's not burden GI with unsafe, but let's see what I can make with a GC.preserve, never used it extensively.

evetion commented 10 months ago

Well, with the following code (including a third argument to the constructor to run a finalizer or not), the tests run, but I can trigger segfaults. Any ideas?

GC.@preserve obj begin
    return LinearRing(result, context, false)
end

But this segfaults:

julia> import LibGEOS as LG

julia> ring = LG.exteriorRing(pLG)^C

julia> pLG = LG.Polygon([
           [
               [0.0, 5.0], [2.0, 2.0], [5.0, 2.0], [2.0, -2.0], [5.0, -5.0],
               [0.0, -2.0], [-5.0, -5.0], [-2.0, -2.0], [-5.0, 2.0], [-2.0, 2.0],
               [0.0, 5.0],
           ],
       ])
POLYGON ((0 5, 2 2, 5 2, 2 -2, 5 -5, 0 -2, -5 -5, -2 -2, -5 2, -2 2, 0 5))

julia> ring = LG.exteriorRing(pLG)
LINEARRING (0 5, 2 2, 5 2, 2 -2, 5 -5, 0 -2, -5 -5, -2 -2, -5 2, -2 2, 0 5)

julia> pLG = nothing

julia> GC.gc()

julia> ring
# Segfaults
rafaqz commented 10 months ago

Thats why we need a method with a promise the contents dont escape the closure! I dont think it can work otherwise

evetion commented 10 months ago

Or something like this, with AbstractLinearRing replacing LinearRing in most signatures.


abstract type AbstractLinearRing <: AbstractGeometry end

struct LinearRingView <: AbstractLinearRing
    ptr::GEOSGeom
    context::GEOSContext
    parent::AbstractGeometry
    function LinearRingView(
        obj::GEOSGeom,
        parent::AbstractGeometry,
        context::GEOSContext = get_global_context(),
    )
        new(obj, context, parent)
    end
end

mutable struct LinearRing <: AbstractLinearRing
rafaqz commented 10 months ago

Wont it still segfault if you delete the polygon?

evetion commented 10 months ago

Nope, because the polygon is linked in the parent field.