JuliaGeo / Shapefile.jl

Parsing .shp files in Julia
http://juliageo.org/Shapefile.jl/
MIT License
82 stars 14 forks source link

Polygon performance #109

Closed rafaqz closed 6 months ago

rafaqz commented 6 months ago

For Shapefile.Polygon this PR speeds up looping over GI.getgeom(poly, i) by ~100x for the first run and ~1000x after that (for very large shape files at least!)

  1. It caches the ring order as a Vector{Vector{Int}} and only calculates it on the first getgeom.
  2. It checks Extents.intersects before actually checking _inring for another more modest speedup.

It specifically does not eagerly fill the cache on construction, because some use cases don't need it, like GI.getring which returns them in the current order (which is 20x faster for e.g. calculating hulls or rasterization).

Users may also only want a subset of polygons, and this skips doing most of the work in that case.

@asinghvi17 if you want to review that would be great :)

(there is a tiny bit of type instability but this will be fixed in Extents.jl its just lack of inlining stopping const prop)

rafaqz commented 6 months ago

Ok its running all the test geometry checks twice. This should be good to go

evetion commented 5 months ago

Good to have this 👍🏻. Might be good to comment on how you go about finding these optimizations?Similar work probably is required for GeoArrow (once there) and other packages when we want the most performance.

rafaqz commented 5 months ago

How I found this was putting a println on the number of rings counted for each polygon in a dataset and loading a huge shapefile, then seeing the same number 50 times rather than once. I was trying to optimise something else at the time and it turned out to be irrelevent!

Not very fancy... but "print summary metrics while slow things run" is a good motto I guess