georust / geo

Geospatial primitives and algorithms for Rust
https://crates.io/crates/geo
Other
1.49k stars 194 forks source link

simplify_vw_preserve causing self-intersecting ring #1049

Open audunska opened 12 months ago

audunska commented 12 months ago

We have some geometry code which would benefit from avoiding self-intersections after simplification. I tried to switch from the rdp algorithm to the simplify_vw_preserve function, and wrote some property tests. After fuzzing a bit, quickcheck came up with this counterexample:

use geo::polygon;
use geo::SimplifyVwPreserve;
let pol = polygon![
    (x: 1., y: 4.),
    (x: 3., y: 4.),
    (x: 1., y: 1.),
    (x: 7., y: 0.),
    (x: 1., y: 0.),
    (x: 0., y: 1.),
    (x: 1., y: 4.),
];
let simplified = pol.simplify_vw_preserve(&2.25);
assert_eq!(simplified, polygon![
    (x: 1., y: 4.),
    (x: 3., y: 4.),
    (x: 1., y: 1.),
    (x: 7., y: 0.),
    (x: 1., y: 0.),
    (x: 1., y: 4.),
]);

This simplified object has a self-intersection at (x: 1., y: 1.).

Tested on geo version 0.25 and 0.26.

audunska commented 12 months ago

If I cyclically permute the coordinates, there is no self-intersection. So this is probably related to handling of the exterior being a linear ring.

Edit: Nope, one more cyclic permutation makes the issue reappear. I think the first cyclic permutation just made the algorithm not consider that particular triangle in question. So the bug is in the algorithm itself.

audunska commented 12 months ago

I guess the cartesian_intersect function does not detect intersections with endpoints of lines?

urschrei commented 12 months ago

Thanks for catching the possible source of the problem @audunska. I'll take a look as soon as I can.

urschrei commented 12 months ago

Do you have a specific example of an endpoint intersection detection miss? I'd like to test it against some variations in the algorithm.

audunska commented 12 months ago

You can use the points from the simplified geometry above:

l1 = Line::new((1., 1.), (7., 0.));
l2 = Line::new((1., 0.), (1., 4.));
assert!(cartesian_intersects(l1.start_point(), l1.end_point(), l2.start_point(), l2.end_point()));

The point (1., 1.) of the first line touches the second line.

audunska commented 12 months ago

On a side note, I ended up implementing the algorithm described in this paper instead, but optimized using an r* tree in the same way as SimplifyVwPreserve: https://doi.org/10.1016/j.cageo.2013.08.011

Would there be interest in contributing that implementation, maybe?

urschrei commented 11 months ago

I've tuned the algorithm a bit to:

Could you use my branch (https://github.com/urschrei/geo/tree/main) for your prop tests? I'm not 100% sure if this will work as a fix, but let's see.

In answer to your topology-preserving RDP impl: yes, we'd be delighted with a contribution!

urschrei commented 11 months ago

(My fix passes our existing tests and your specific test, to be clear)

audunska commented 11 months ago

Got a failure with this linestring on your branch:

let ls = line_string![
    (x: -4., y: 3.),
    (x:  4., y: 3.),
    (x:  1., y: 2.),
    (x: 20., y:-3.),
    (x:  0., y:-3.),
    (x:  0., y: 2.),
    (x: -4., y: 3.),
];
let expected_simplified = line_string![
    (x: -4., y: 3.),
    (x: 20., y:-3.),
    (x:  0., y:-3.),
    (x:  0., y: 2.),
    (x: -4., y: 3.),
];
assert_eq!(ls.simplify_vw_preserve(&6.25), expected_simplified);
urschrei commented 11 months ago

As far as I can see from testing, the current topology-preserving algorithm can't deal with this case:

The topology-preserving logic tries to fix intersections by removing the previous point (p -1) in the line if possible if p causes an intersection. However, in this case that doesn't work because the point causing the intersection (1, 2) which has index 2 has two adjacent triangles it can check for removal candidates: (-1, 0, 3) and (0, 3, 4). The first can't be used because index 0 is an edge, and the second is skipped because its index (3) is higher than the current point's (2), which means it's the next point, not the previous one.

But even if that weren't the case, the "problem" retained point has index 5, and the simplification algorithm never looks at it because its triangle (4, 5, 6) area is too large (10). This is why this input also fails using the standard VW algorithm.

JTS outputs the correct LineString (0, 3, 4, 6) for that tolerance (tested using TestBuilder), but I don't know how its topology-preservation logic is implemented…

Update: JTS uses epsilon ^ 2 in its VW algorithm, which led me to think that its topology-preservation logic was being used, when in fact it was simply removing the vertex because the triangle area is below the tolerance (39.0625). GEOS and PostGIS produce the same valid linestring / invalid polygon as geo.

urschrei commented 11 months ago

Thinking about this more, the reason the preservation process is failing in the latest case is because of the rule preventing removal of a triangle that forms part of the geometry's "edge". If this case were part of a larger geometry (i.e. there are non-removed points preceding index 0 (Point(-4, 3)), that stopping rule wouldn't be applied and Point(20, -3) would be removed, removing that self-intersection.

As quickcheck tries to find the most simple counter-examples it can, it's going to continue trying small geometries, which are simply far more likely to hit a stopping rule than larger "real world" geometries. This particular case can't be fixed – there aren't enough points left to remove when the intersection occurs.

audunska commented 11 months ago

Sure! I guess it just means that this algorithm tries hard to preserve topology, but can't quite guarantee it. Should probably document that fact :)