The figures below show results from quick and naive benchmarks that compare the total (single-thread) execution times of a few functions in shapely 2.0 vs. s2shapely. All benchmarks are run using 10 000 random x, y (lat, lon) points.
(Note: for all functions except "s2shapely.is_geography", access to the wrapped C/C++ geography objects is almost direct -- in s2shapely it doesn't go through pybind11's complex conversion logic, see https://github.com/benbovy/s2shapely/issues/3#issuecomment-1332375997 and #5).
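For reference, here is roughly how the shapely 2.0 side of such a micro-benchmark could look (a minimal sketch, not the actual benchmark script; the s2shapely side would mirror it with the corresponding functions):

```python
import timeit

import numpy as np
import shapely

# 10 000 random lon/lat coordinates, mirroring the benchmark setup
rng = np.random.default_rng(0)
lon = rng.uniform(-180.0, 180.0, 10_000)
lat = rng.uniform(-90.0, 90.0, 10_000)

points = shapely.points(lon, lat)
other = shapely.points(lon, lat)

# one trivial function (1st figure) and one predicate (2nd figure),
# timed end to end over the whole array
cases = {
    "is_geometry": lambda: shapely.is_geometry(points),
    "intersects": lambda: shapely.intersects(points, other),
}
for label, func in cases.items():
    t = timeit.timeit(func, number=100) / 100
    print(f"shapely.{label}: {t * 1e3:.3f} ms per call")
```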
The comparison is hardly apples-to-apples (different C/C++ libraries, different binding approaches), but it already highlights a few things:
The overhead caused by pybind11's "default" C++ <-> Python conversion is large (see the large difference measured for "is_geo"). This could be improved with some workarounds. The overhead clearly has an impact on trivial functions (1st figure) but much less so for more computationally expensive tasks (2nd figure).
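A back-of-the-envelope model makes the last point concrete (the numbers below are illustrative assumptions, not measured values):

```python
# Toy cost model (assumed numbers, not measurements) for why a fixed
# per-element conversion overhead dominates trivial functions but not
# expensive ones.
n = 10_000
overhead_us = 0.5                      # assumed per-element conversion cost
for work_us in (0.05, 50.0):           # trivial predicate vs. expensive predicate
    total_ms = n * (overhead_us + work_us) / 1e3
    share = overhead_us / (overhead_us + work_us)
    print(f"work = {work_us:>6} µs/elem -> overhead is {share:.0%} of {total_ms:.1f} ms")
```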
The execution of the vectorized inner loop looks quite similar for the two libraries, as shown by all trivial functions (except "is_geo") in the 1st figure. I guess that native NumPy ufuncs (this is what shapely provides, right?) are a bit more optimized than pybind11::vectorize. Maybe using xtensor(-python)'s xt::vectorize and xt::pyvectorize could provide some speed-up?
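The dispatch cost in question can be illustrated at the Python level alone (a rough analogy, not a measurement of either binding layer):

```python
import timeit

import numpy as np

x = np.random.default_rng(1).uniform(-180.0, 180.0, 10_000)

# native ufunc inner loop vs. a generic object-dtype loop built from a
# Python callable; the gap is roughly the per-element dispatch cost that a
# binding layer (pybind11::vectorize, xt::pyvectorize, ...) needs to avoid
# in order to match native NumPy ufunc speed
object_loop = np.frompyfunc(lambda v: v > 0.0, 1, 1)

t_native = timeit.timeit(lambda: np.greater(x, 0.0), number=1_000)
t_object = timeit.timeit(lambda: object_loop(x), number=1_000)
print(f"native ufunc: {t_native:.3f} s, object loop: {t_object:.3f} s")
```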
The results shown in the 2nd figure (equals / intersects predicates) are weird. This is probably explained mostly by the fact that the underlying libraries (GEOS vs. s2geometry / s2geography) are very different from each other. Those are very naive and incomplete benchmarks, though. For s2shapely there is no difference between unprepared / prepared geometries, but I suspect that this is because those are all point geometries.
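For completeness, this is how the prepared vs. unprepared comparison looks on the shapely 2.0 side for point geometries (sketch only; the s2shapely side would follow the same pattern, assuming an equivalent prepare step there):

```python
import numpy as np
import shapely

rng = np.random.default_rng(42)
lon = rng.uniform(-180.0, 180.0, 10_000)
lat = rng.uniform(-90.0, 90.0, 10_000)
a = shapely.points(lon, lat)
b = shapely.points(lon, lat)

# unprepared vs. prepared predicate on point geometries; shapely.prepare
# works in place and simply attaches a prepared version to each geometry
unprepared = shapely.intersects(a, b)
shapely.prepare(a)
prepared = shapely.intersects(a, b)
assert (unprepared == prepared).all()
```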