georust / geo

Geospatial primitives and algorithms for Rust
https://crates.io/crates/geo
Other
1.47k stars 190 forks source link

Polygon in Rect performance improvements #1192

Closed michel-kraemer closed 1 week ago

michel-kraemer commented 3 weeks ago

Before:

Rect contains polygon   time:   [2.3345 µs 2.3514 µs 2.3662 µs]
                        change: [-1.3280% -0.6558% -0.0257%] (p = 0.05 < 0.05)
                        Change within noise threshold.

After:

Rect contains polygon   time:   [3.7123 ns 3.7174 ns 3.7237 ns]
                        change: [-99.841% -99.840% -99.839%] (p = 0.00 < 0.05)
                        Performance has improved.

I have an application where I need to perform this check millions of times and this small improvement shaves off several seconds for me.

I think the same can be done for the other geometry types (LineString in Rect, MultiPolygon in Rect, etc.), but I wanted to get your feedback on this one first.

michaelkirk commented 3 weeks ago

I suspect you're right that some of our predicates could be optimized for Rects, but Contains is a notoriously tricky predicate.

A couple of cases that seem like they might be wrong with your implementation:

  1. What about an empty polygon? By definition at least one point of P needs to be in the interior of R. That seems easy enough to guard against.

  2. More tricky: What about something like P = R.to_polygon(). I'd expect your implementation to return false, because you're only checking the polygons vertices, and all the Polygon's vertices are in the boundary of R, with none in the interior of R.

  3. Edit: another degenerate case to watch out for would be a flat P, living in the boundary of R. I think currently you're handling this, but be sure not to break it if you find a way to address point 2.

michel-kraemer commented 3 weeks ago

Thank you so much! This is exactly the kind of feedback I was hoping for! :+1:

You're right that my first implementation was inconsistent to what relate does. I did not consider the empty case, and due to a typo in one of my unit tests, I was under the impression that geometries have to lie completely inside the interior of the rectangle.

I had a look at the DE-9IM again and AFAICT I've now implemented it correctly.

  1. None of the polygon's points may lie outside the rectangle (i.e. there must be no intersection between the rectangle's exterior and the polygon's boundary)
  2. The polygon must not completely lie inside the rectangle's boundary (i.e. the interiors of the rectangle and the polygon must intersect). Since we know that the rectangle is convex, it is enough to make sure that either at least one of the polygon's points lie inside the rectangle's interior or the polygon has an area != 0.

The new implementation is slightly slower than my first try, but still much faster than the original one:

Before:

Rect contains polygon   time:   [2.3159 µs 2.3249 µs 2.3345 µs]
                        change: [-1.3476% -0.8085% -0.3250%] (p = 0.00 < 0.05)
                        Change within noise threshold.

After:

Rect contains polygon   time:   [7.4262 ns 7.4409 ns 7.4631 ns]
                        change: [-99.679% -99.677% -99.676%] (p = 0.00 < 0.05)
                        Performance has improved.
michaelkirk commented 3 weeks ago

I wonder if using p.bounding_rect().coords_iter() instead of p.exterior_coords would eke out a little more performance? Probably not much more, but maybe a little. It also might be a good way to generalize a solution for all the other geometries too.

edit: For the record, I'd be more than happy to merge this as is, but just food for thought if you're thinking about pursuing more work like this.

michel-kraemer commented 3 weeks ago

Thanks for the review. I've integrated your comments.

I wonder if using p.bounding_rect().coords_iter() instead of p.exterior_coords would eke out a little more performance? Probably not much more, but maybe a little. It also might be a good way to generalize a solution for all the other geometries too.

Maybe ... but then we'd have to go through the points again to check if they lie inside the rectangle's interior. The way it is now, we can just use one for loop.

edit: For the record, I'd be more than happy to merge this as is, but just food for thought if you're thinking about pursuing more work like this.

Yes, it would be fun to optimize this predicate for the other geometry types as well. If it's not too much trouble for you, I'd prefer if you could merge this PR first. I'll create a new PR for the other geometries in the next few weeks.

urschrei commented 2 weeks ago

This is a delightful improvement!