Open H-Plus-Time opened 3 days ago
Hi! Thanks for reporting this issue!
The problem seem to be that the range rewrite does not produce a IE-Join like intended but still remains a NL-join. It could be that something changed duckdb side that made the planning differ in the latest version.
Ill investigate.
What d you observe if you set "SET prefer_range_joins=true"?
Mm, I noticed joins involving Polygons (still stored as Geometry, but presumably those are auto-cast to duckdb Polygon_2D internally) were correctly using IE-Joins (of course, much better hit rate given their bounding boxes were a fair bit closer to their convex hull), though still paired with a Filter(ST_Contains(...)).
No luck with SET prefer_range_joins=true
, I still get an NL-join.
NB: The binding errors present in 0.9.0 -> 0.9.2 prevent this from being checked for those versions, so exactly when this started is a little fuzzy.
For a query along the lines of (note: for v0.8.1, the geom column was
wkb_geometry::GEOMETRY
):where locations is ~1k points, and catchment_geojson is a single-row table of multipolygon geometries, execution time jumps from 0.02s to 22.83s.
With the addition of
SET disabled_optimizers = 'extension';
, execution time drops back down to that observed in v0.8.1 (with an identical physical plan). Semi-joins (appropriate in scenarios like this) also produce the same BNL-centric physical plan (as opposed to the NL Join + Filter plan). An inner join against an exploded (~280 polygons) version of the multipolygon brings total execution time down to 1.6s.By the looks of the Range Join rewriting logic at https://github.com/duckdb/duckdb_spatial/blob/v1.0.0/spatial/src/spatial/core/optimizer_rules.cpp#L33 , the factors that produce this kind of perf degradation are:
Profiling
Before (v0.8.1):
After (v0.10.0):
Query plan diagnostics
Before (v0.8.1):
After (v0.10.0):