benbovy / spherely

Manipulation and analysis of geometric objects on the sphere.
https://spherely.readthedocs.io/
BSD 3-Clause "New" or "Revised" License
119 stars 8 forks source link

ENH: Added more missing predicates #56

Closed JoelJaeschke closed 1 week ago

JoelJaeschke commented 1 month ago

This PR adds the remaining predicates from #17.

I would greatly appreciate your feedback @benbovy regarding covers and covered_by, as the way it is implemented now appears to mostly work, but fails when checking polygons that partly share a boundary. This is consistent with https://github.com/r-spatial/s2/blob/b495b0df53bffc7dc1ad3780fe8ac208e78cd1bf/R/s2-predicates.R#L121C1-L124C2 for example, but deviates from PostGIS IIUC (cmp. https://postgis.net/docs/ST_Covers.html). From a user perspective, I would expect the test to work.

benbovy commented 1 month ago

Thanks for the PR @JoelJaeschke !

I suspect that that R s2 deviating from PostGIS is not intentional? If we want a behavior consistent with PostGIS the implementation of covers should probably check both "s2_contains" and "s2_intersects" (not only "s2_contains").

I briefly checked in the S2geometry tests: for two polygons sharing one edge, the open vs. closed polygon model has no influence on the result of the DIFFERENCE Boolean operation (used by "s2_contains") but yields different results for the INTERSECTION operation.

(S2geometry also seems to assume that the edges of polylines - and polygons? - are directed by default and takes that into account for the intersection operation... this might also explain some specific behavior).

JoelJaeschke commented 1 month ago

Actually, what I thought was incorrect behavior is likely expected and working correctly. I was thinking of this from a planar geometry point of view, but since S2 assumes a spherical earth, the horizontal edges are not actually straight, but rather follow the earth's curvature. Therefore, even though in planar geometry, one polygon would be covered, in spherical geometry this is not true anymore.

I tested this in BigQuery (using the GeoViz tool) and the following query

select st_geogfromtext('polygon((-118 60, -118 23, 34 23, 34 60, -118 60))') p1, st_covers(st_geogfromtext('polygon((-118 60, -118 23, 34 23, 34 60, -118 60))'), st_geogfromtext('polygon((-118 60, -118 23, 34 23, 34 60, -118 60))'))
union all
select st_geogfromtext('polygon((-118 60, -118 23, -18 23, -18 60, -118 60))') p2, st_covers(st_geogfromtext('polygon((-118 60, -118 23, 34 23, 34 60, -118 60))'), st_geogfromtext('polygon((-118 60, -118 23, -18 23, -18 60, -118 60))'));

where in the second select, the polygon is similar to the way I wrote the tests. However, when visualizing the resulting geometries, this is the result: Screenshot from 2024-10-21 19-14-30 As can be seen, the red polygon is definitely not covered by the "parent" polygon. BigQuery also returns false for the second ST_Covers. So I think the test rightfully fails. Let me know if this makes sense to you @benbovy . Otherwise, I would simply remove the last test from the PR and it should be good to go.

benbovy commented 1 month ago

Ah yes that makes sense!

(BTW we should implement quick visualization of geography objects in Spherely as it helps a lot!)

Instead of simply removing the failing tests, could you please keep these and make sure that the boundaries of the two polygons partially overlap along a great circle (longitude)?

(I'm also curious to see if it works as expected when the overlapping boundaries correspond to a shell ring for polygon A and a hole ring for polygon B... I guess it should work?)

JoelJaeschke commented 3 weeks ago

Instead of simply removing the failing tests, could you please keep these and make sure that the boundaries of the two polygons partially overlap along a great circle (longitude)?

This has turned out to be much tougher than I anticipated. The standard S2BooleanOperation::Contains in s2geometry does not handle shared boundaries on polygons as I had hoped.. (well, it does, but not in a way that would help when implementing the covers logic, see https://github.com/google/s2geometry/issues/387).

I am not sure whether I grok the concepts in s2geometry fully, but as I understood, there is no API that handles the covers check natively. The S2Polygon class has a method called ApproxContains, which solves the issue of shared boundaries on polygons, but only when using an approximation, i.e. no exact checks. However, this would require some custom dispatching logic that differs between polygons and polylines/points and I am not quite sure whether this is something that should be implemented in spherely or rather in s2geography?

Below is some code that demonstrates the issue in more detail (only depends on s2geometry):

#include <iostream>
#include <memory>

#include <s2/s2latlng.h>
#include <s2/s2point.h>
#include <s2/s2loop.h>
#include <s2/s2polygon.h>
#include <s2/s2builder.h>
#include <s2/s2builderutil_s2polygon_layer.h>
#include <s2/s2boolean_operation.h>

int main(int argc, char** argv) {
    const std::vector<S2Point> parent_vertices = {
        S2LatLng::FromDegrees(60, -118).ToPoint(),
        S2LatLng::FromDegrees(23, -118).ToPoint(),
        S2LatLng::FromDegrees(23, 34).ToPoint(),
        S2LatLng::FromDegrees(60, 34).ToPoint()
    };
    std::unique_ptr<S2Loop> parent_loop = std::make_unique<S2Loop>(std::move(parent_vertices));
    S2Polygon parent_poly(std::move(parent_loop));

    const std::vector<S2Point> interior_vertices = {
        S2LatLng::FromDegrees(40, -117).ToPoint(),
        S2LatLng::FromDegrees(37, -117).ToPoint(),
        S2LatLng::FromDegrees(37, -116).ToPoint(),
        S2LatLng::FromDegrees(40, -116).ToPoint()
    };
    std::unique_ptr<S2Loop> interior_loop = std::make_unique<S2Loop>(std::move(interior_vertices));
    S2Polygon interior_poly(std::move(interior_loop));

    const std::vector<S2Point> shared_bound_vertices = {
        S2LatLng::FromDegrees(40, -118).ToPoint(),
        S2LatLng::FromDegrees(23, -118).ToPoint(),
        S2LatLng::FromDegrees(23, 34).ToPoint(),
        S2LatLng::FromDegrees(40, 34).ToPoint()
    };
    std::unique_ptr<S2Loop> shared_bound_loop = std::make_unique<S2Loop>(std::move(shared_bound_vertices));
    S2Polygon shared_bound_poly(std::move(shared_bound_loop));

    const std::vector<S2Point> crossing_vertices = {
        S2LatLng::FromDegrees(40, -120).ToPoint(),
        S2LatLng::FromDegrees(37, -120).ToPoint(),
        S2LatLng::FromDegrees(37, -116).ToPoint(),
        S2LatLng::FromDegrees(40, -116).ToPoint()
    };
    std::unique_ptr<S2Loop> crossing_loop = std::make_unique<S2Loop>(std::move(crossing_vertices));
    S2Polygon crossing_poly(std::move(crossing_loop));

    const auto contains_interior = S2BooleanOperation::Contains(parent_poly.index(), interior_poly.index()) ? "true" : "false";
    std::cout << "Contains interior? " << contains_interior << std::endl;

    const auto contains_shared = S2BooleanOperation::Contains(parent_poly.index(), shared_bound_poly.index()) ? "true" : "false";
    std::cout << "Contains shared? " << contains_shared << std::endl;

    const S1Angle tol = S1Angle::Degrees(1e-16);
    const auto approx_contains_shared = parent_poly.ApproxContains(shared_bound_poly, tol) ? "true" : "false";
    std::cout << "Approx contains shared? " << approx_contains_shared << std::endl;

    const auto contains_crossing = S2BooleanOperation::Contains(parent_poly.index(), crossing_poly.index()) ? "true" : "false";
    std::cout << "Contains crossing? " << contains_crossing << std::endl;

    const auto approx_contains_crossing = parent_poly.ApproxContains(crossing_poly, tol) ? "true" : "false";
    std::cout << "Approx contains crossing? " << approx_contains_crossing << std::endl;

    return 0;
}

Output when called is

Contains interior? true
Contains shared? false
Approx contains shared? true
Contains crossing? false
Approx contains crossing? false
jorisvandenbossche commented 3 weeks ago

@JoelJaeschke thanks for further diving into this!

Exploring this using R's s2 (given this is already implemented there), this is a reproducer of your example above I think (with the shared boundary that gives a wrong result):

# s2_make_polygon takes vector of longitudes and vector of latitudes
> parent_poly <- s2_make_polygon(c(-118, -118, 34, 34), c(60, 23, 23, 60))
> shared_bound_poly <- s2_make_polygon(c(-118, -118, 34, 34), c(40, 23, 23, 40))
> s2_contains(parent_poly, shared_bound_poly)
[1] FALSE
> s2_covers(parent_poly, shared_bound_poly)
[1] FALSE

and this indeed gives False while you would expect True for both cases (so this is actually not specific to covers being added here, but already an issue with contains as well, AFAIU)

Testing what has been said on the s2geometry, that this is a 50/50 chance of falling left or right of the edge, I tested with some other coordinates along the edge:

> for (lat_high in 40:30) {
+     shared_bound_poly <- s2_make_polygon(c(-118, -118, 34, 34), c(lat_high, 23, 23, lat_high))
+     print(paste(lat_high, s2_contains(parent_poly, shared_bound_poly), s2_covers(parent_poly, shared_bound_poly)))
+ }
[1] "40 FALSE FALSE"
[1] "39 FALSE FALSE"
[1] "38 TRUE TRUE"
[1] "37 FALSE FALSE"
[1] "36 FALSE FALSE"
[1] "35 TRUE TRUE"
[1] "34 FALSE FALSE"
[1] "33 FALSE FALSE"
[1] "32 FALSE FALSE"
[1] "31 FALSE FALSE"
[1] "30 TRUE TRUE"

Here it was not 50/50, but at least you see that indeed it is a bit (deterministically) random.

So while this is definitely a usability issue (and ideally we would be able to solve that. I thought that maybe snapping would have helped, but that didn't seem to make a difference ..), I would for now just implement this as you were doing, and document this as a gotcha (also given that R s2 is doing the same). And then we can later try to improve this.

Did you also test this example in BigQuery? (wondering if it can handle this case better ..)

For testing, we can use an example case where there is a shared boundary, but only fully shared edges (so there is never a vertex that falls somewhere in the middle of the edge of the other). For example:

> parent_poly <- s2_make_polygon(c(-118, -118, -118, 34, 34, 34), c(60, 40, 23, 23, 40, 60))
> shared_bound_poly <- s2_make_polygon(c(-118, -118, 34, 34), c(40, 23, 23, 40))
> s2_contains(parent_poly, shared_bound_poly)
[1] TRUE
> s2_covers(parent_poly, shared_bound_poly)
[1] TRUE
JoelJaeschke commented 3 weeks ago

Hey @jorisvandenbossche, thanks for your feedback.

So while this is definitely a usability issue (and ideally we would be able to solve that. I thought that maybe snapping would have helped, but that didn't seem to make a difference ..),

Behind the scenes, the ApproxContains function does use snapping with the given tolerance, if that is what you mean? Maybe, if this option is more exposed in higher-level APIs, that could be used? But in general, I agree that its probably easiest to just document this example and make the user aware.

Did you also test this example in BigQuery? (wondering if it can handle this case better ..)

I did, BigQuery does in fact handle this case properly, although I only tested this with one polygon and ST_Covers. But happy to re-test some more cases.

For testing, we can use an example case where there is a shared boundary, but only fully shared edges (so there is never a vertex that falls somewhere in the middle of the edge of the other).

Sounds good to me, I will push the changes

JoelJaeschke commented 3 weeks ago

I implemented all the feedback from above and documented the behavior in the tests a bit. If there is anything you would like me to change, let me know.

After thinking about this more, there is probably one way how this could work properly. Given two MutableS2ShapeIndex instances a and b, we would have to build the intersection of the two and then test the result for equality against b. If this is true, it should mean that b is fully covered by a.

I gave implementing this a go, but I struggle with how to make it work when just given two MutableS2ShapeIndex instances. Maybe someone who has more experience with s2geometry can take a stab at this.

JoelJaeschke commented 2 weeks ago

Hey @jorisvandenbossche, thanks a lot for all your feedback so far! Is there anything missing that I should add and/or change here?

jorisvandenbossche commented 1 week ago

To follow-up a bit more on those tricky corner case issues.

After thinking about this more, there is probably one way how this could work properly. Given two MutableS2ShapeIndex instances a and b, we would have to build the intersection of the two and then test the result for equality against b. If this is true, it should mean that b is fully covered by a.

Essentially, that is already how the predicates work in s2geometry (i.e. calculate the intersection or difference or .., and then assert something about the result). For example for the case of contains(a, b), it checks that the difference of b with a is empty:

  // Convenience method that returns true if A intersects B.
  static bool Intersects(const S2ShapeIndex& a, const S2ShapeIndex& b,
                         const Options& options = Options()) {
    return !IsEmpty(OpType::INTERSECTION, b, a, options);
  }

  // Convenience method that returns true if A contains B, i.e., if the
  // difference (B - A) is empty.
  static bool Contains(const S2ShapeIndex& a, const S2ShapeIndex& b,
                       const Options& options = Options()) {
    return IsEmpty(OpType::DIFFERENCE, b, a, options);
  }

(https://github.com/google/s2geometry/blob/0fb1b8a1474f3137c8b44a7861e31f8efd1370e2/src/s2/s2boolean_operation.h#L505-L516)

But that means we could indeed also tweak the exact check that happens, and do something custom (like you suggest).

As a first step, instead of changing the exact check (like using equals(intersection(a, b), b) instead of isempty(difference(b, a))), we also already have the ability to tweak the options for the boolean operation (the intersection or difference). We still need a way to specify that from Python (opened https://github.com/benbovy/spherely/issues/70 about that), but exploring the possibilities with this in R using the example from above:

> parent_poly <- s2_make_polygon(c(-118, -118, 34, 34), c(60, 23, 23, 60))
> shared_bound_poly <- s2_make_polygon(c(-118, -118, 34, 34), c(40, 23, 23, 40))
# covers gives False, while it is expected to be True
> s2_covers(parent_poly, shared_bound_poly)
[1] FALSE

# what is checked under the hood is whether the difference is empty
# but with the default options, for this specific case the difference is a line
> s2_difference(shared_bound_poly, parent_poly)
<geodesic s2_geography[1] with CRS=OGC:CRS84>
[1] LINESTRING (-118 40, -118 23, -118 40)
# and thus not empty
> s2_is_empty(s2_difference(shared_bound_poly, parent_poly))
[1] FALSE

# if we specify we only want to consider the polygonal output, the result becomes empty
> s2_difference(shared_bound_poly, parent_poly, options=s2_options(dimensions=c("polygon")))
<geodesic s2_geography[1] with CRS=OGC:CRS84>
[1] POLYGON EMPTY
> s2_is_empty(s2_difference(shared_bound_poly, parent_poly, options=s2_options(dimensions=c("polygon"))))
[1] TRUE

Now, I am not entirely sure whether it is then correct for other corner cases to only consider the polygonal difference/intersection in case both inputs are polygons for the contains and covers operations, but so that could be something to explore further.

jorisvandenbossche commented 1 week ago

Given that this is something that ideally should be solved in the s2geography layer (since this also affects R), I opened an issue over there to describe this specific case -> https://github.com/paleolimbot/s2geography/issues/44

So let's continue the discussion there.