Closed rafaqz closed 6 months ago
Ok its running all the test geometry checks twice. This should be good to go
Good to have this 👍🏻. Might be good to comment on how you go about finding these optimizations?Similar work probably is required for GeoArrow (once there) and other packages when we want the most performance.
How I found this was putting a println
on the number of rings counted for each polygon in a dataset and loading a huge shapefile, then seeing the same number 50 times rather than once. I was trying to optimise something else at the time and it turned out to be irrelevent!
Not very fancy... but "print summary metrics while slow things run" is a good motto I guess
For
Shapefile.Polygon
this PR speeds up looping overGI.getgeom(poly, i)
by ~100x for the first run and ~1000x after that (for very large shape files at least!)Vector{Vector{Int}}
and only calculates it on the firstgetgeom
.Extents.intersects
before actually checking_inring
for another more modest speedup.It specifically does not eagerly fill the cache on construction, because some use cases don't need it, like
GI.getring
which returns them in the current order (which is 20x faster for e.g. calculating hulls or rasterization).Users may also only want a subset of polygons, and this skips doing most of the work in that case.
@asinghvi17 if you want to review that would be great :)
(there is a tiny bit of type instability but this will be fixed in Extents.jl its just lack of inlining stopping const prop)