georust / geo

Geospatial primitives and algorithms for Rust
https://crates.io/crates/geo
Other
1.51k stars 195 forks source link

Panic in BooleanOps::difference for Multipolygon #976

Open cassiepearson-deca opened 1 year ago

cassiepearson-deca commented 1 year ago

The BooleanOps::difference function panics about half the time when running the included test case due to a failure in the sweep algorithm. Similar datasets succeed without issue. The provided test case was reduced from a larger test case. I attempted to find individual polygons that failed but was unable to find a smaller input Multipolygon.

Test case: https://github.com/Deca-Technologies/geo_boolean_difference_panic

I expected to see the structures successfully difference'd returning the entire subject multipolygon. Instead, geo panics during the difference operation about 50% of the time. The code includes a for loop to run until the failure is hit.

Issue https://github.com/georust/geo/issues/913 may be related but I am not certain as it uses a different version of geo.

Backtrace:

thread 'main' panicked at 'segment not found in active-vec-set: 8', /Users/cassie/.cargo/registry/src/github.com-1ecc6299db9ec823/geo-0.23.1/src/algorithm/sweep/vec_set.rs:22:14
stack backtrace:
   0: rust_begin_unwind
             at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/std/src/panicking.rs:575:5
   1: core::panicking::panic_fmt
             at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/core/src/panicking.rs:65:14
   2: core::result::unwrap_failed
             at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/core/src/result.rs:1791:5
   3: core::result::Result<T,E>::expect
             at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/core/src/result.rs:1070:23
   4: geo::algorithm::sweep::vec_set::VecSet<geo::algorithm::sweep::active::Active<T>>::index_of
             at /Users/cassie/.cargo/registry/src/github.com-1ecc6299db9ec823/geo-0.23.1/src/algorithm/sweep/vec_set.rs:20:9
   5: geo::algorithm::sweep::proc::Sweep<C>::handle_event
             at /Users/cassie/.cargo/registry/src/github.com-1ecc6299db9ec823/geo-0.23.1/src/algorithm/sweep/proc.rs:232:30
   6: geo::algorithm::sweep::proc::Sweep<C>::next_event::{{closure}}
             at /Users/cassie/.cargo/registry/src/github.com-1ecc6299db9ec823/geo-0.23.1/src/algorithm/sweep/proc.rs:46:13
   7: core::option::Option<T>::map
             at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/core/src/option.rs:925:29
   8: geo::algorithm::sweep::proc::Sweep<C>::next_event
             at /Users/cassie/.cargo/registry/src/github.com-1ecc6299db9ec823/geo-0.23.1/src/algorithm/sweep/proc.rs:44:9
   9: <geo::algorithm::sweep::iter::CrossingsIter<C> as core::iter::traits::iterator::Iterator>::next
             at /Users/cassie/.cargo/registry/src/github.com-1ecc6299db9ec823/geo-0.23.1/src/algorithm/sweep/iter.rs:170:26
  10: geo::algorithm::bool_ops::op::Proc<T,S>::sweep
             at /Users/cassie/.cargo/registry/src/github.com-1ecc6299db9ec823/geo-0.23.1/src/algorithm/bool_ops/op.rs:68:30
  11: <geo_types::geometry::multi_polygon::MultiPolygon<T> as geo::algorithm::bool_ops::BooleanOps>::boolean_op
             at /Users/cassie/.cargo/registry/src/github.com-1ecc6299db9ec823/geo-0.23.1/src/algorithm/bool_ops/mod.rs:94:9
  12: geo::algorithm::bool_ops::BooleanOps::difference
             at /Users/cassie/.cargo/registry/src/github.com-1ecc6299db9ec823/geo-0.23.1/src/algorithm/bool_ops/mod.rs:39:9
  13: geo_boolean_difference_panic::main
             at ./src/main.rs:10:9
  14: core::ops::function::FnOnce::call_once
             at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/core/src/ops/function.rs:251:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
ggunde-deca commented 1 year ago

Another test case is included in the tests folder in the above linked repository. This test case involves two polygons that overlap each other. After running the difference operation, the test fails, every time, throwing the exact same stack trace as above.

RobWalt commented 1 year ago

UPDATE

It seems to work quiet well for the combination of Triangle + f64! No panics so far and all the panics for the cases below (which were f32) are passing the tests. This is good since you can basically tessellate any (Multi)-Polygon, run the boolops on the resulting triangles and then reassemble the triangles into one (Multi)-Polygon result.

I know this doesn't prove anything, but I didn't run into panics for all operations on Triangle<f64>(..).to_polygon() which were randomly generated. The random tests covered about 100.000.000 cases and I think that's good enough for me to trust it.

Results

I ran my panic discovering test suite over night and it didn't panic once in 500 runs which amounts to

500 runs x 20 parallel tests x 100.000 attempts per test = 100.000.000 attempts total

Methods

Every test, a random boolop is chosen. Then it generates two random triangles and makes sure that they are intersecting before the actual boolop is run. This check is done since otherwise we wouldn't have anything interesting to do with the boolop.

Here is the code I used for testing:

#![allow(dead_code)]
#![allow(unused_results)]
#![allow(unused_must_use)]
#![allow(unused_variables)]

use geo::{BooleanOps, Intersects};
use geo_svg::ToSvg;
use rand::{thread_rng, Rng};

type T = f64;
const NUM_RUNS: usize = 100_000;
const RANGE: f32 = 10_000.0;

fn random_coord(rng: &mut impl Rng) -> geo::Coord<T> {
    let range = rng.gen_range(1.0..RANGE);
    let x = rng.gen_range(-range..range) as f64;
    let y = rng.gen_range(-range..range) as f64;
    geo::Coord { x, y }
}

fn random_coords(rng: &mut impl Rng) -> [Option<geo::Coord<T>>; 3] {
    (0..3)
        .map(|_| random_coord(rng))
        .enumerate()
        // check that points are unique and we don't have degenerate triangles (lines still
        // possible)
        .fold([None; 3], |mut vec, (i, p)| {
            if !vec.contains(&Some(p)) {
                vec[i].replace(p);
            }
            vec
        })
}

fn random_poly(rng: &mut impl Rng) -> geo::Polygon<T> {
    let mut cs = random_coords(rng);
    while cs.contains(&None) {
        cs = random_coords(rng);
    }
    let p = geo::Polygon::new(
        geo::LineString::new(cs.map(|x| x.unwrap()).to_vec()),
        vec![],
    );
    p
}

fn save_polys(ps: [&geo::Polygon<T>; 2], path: &str) {
    if std::path::Path::new(path).exists() {}
    let sps = serde_json::to_string(&ps).unwrap();
    std::fs::write(path, sps);
}

fn test_prog(f: impl Fn(&geo::Polygon<T>, &geo::Polygon<T>) -> geo::MultiPolygon<T>, name: &str) {
    let mut rng = thread_rng();
    let mut make = || random_poly(&mut rng);
    for i in 0..NUM_RUNS {
        //let name = format!("{name}-{i}.html");
        let mut p1 = make();
        let mut p2 = make();
        // make sure polys are intersecting
        while !p1.intersects(&p2) {
            p1 = make();
            p2 = make();
        }
        //save_polys([&p1, &p2], &name);
        f(&p1, &p2);
        //std::fs::remove_file(name);
    }
}

const FNS: [fn(&geo::Polygon<T>, &geo::Polygon<T>) -> geo::MultiPolygon<T>; 3] = [
    BooleanOps::difference,
    BooleanOps::union,
    BooleanOps::intersection,
];

macro_rules! make_test_thread {
    ($name:ident) => {
        paste! {
            #[test]
            fn [<test_thread_$name>]() {
                test_prog(FNS.choose(&mut thread_rng()).unwrap(), "thread");
            }
        }
    };
    ( $($name:ident),* ) => {
        $(
            make_test_thread!($name);
        )*
    };
}

use paste::paste;
use rand::seq::SliceRandom;

make_test_thread!(
    t0, t1, t2, t3, t4, t5, t6, t7, t8, t9, t10, t11, t12, t13, t14, t15, t16, t17, t18, t19, t20,
    t21, t22, t23, t24, t25, t26, t27, t28, t29, t30, t31, t32, t33, t34, t35, t36, t37, t38, t39
);
Old Analysis I tried to narrow that down a bit and went to create as minimal reproducing example as I could get. I noticed the following: - Other panics can happen for other boolops when used with `Polygon`/`Multipolygon` - ~~`union` and `intersection` seem to never be panicing for triangles!~~ Apparently even `union` and intersections also panic, see the update section below (Which is a good thing, since if you rely on it to work you can just tessellate bigger polys and do the boolops on the triangles and then reconstruct the poly from triangles) - `difference` panics even for triangles Let's have a look at the cases where the difference panics: (raw data in `serde_json::to_string(&ps)` format where `ps : [geo::Polygon;2]`) 1. panic with `assembly segments must be eulierian` Noticeable is that one corner of one triangle is almost located on a line of the other triangle ```json [{"exterior":[{"x":-9911.954,"y":-8725.639},{"x":-9915.633,"y":9687.057},{"x":-4963.603,"y":-45.043945},{"x":-9911.954,"y":-8725.639}],"interiors":[]},{"exterior":[{"x":-9007.182,"y":9094.508},{"x":-2687.1802,"y":-8199.999},{"x":-9915.442,"y":8069.0723},{"x":-9007.182,"y":9094.508}],"interiors":[]}] ``` ![image](https://github.com/georust/geo/assets/26892280/208297c1-bb16-4694-97d5-9765036b3905) 3.panic with `segment not found in active-vec-set: 2` I can't even get one of the two triangles to render with my svg renderer with this one ```json [{"exterior":[{"x":-5201.7114,"y":-8303.499},{"x":-5659.4683,"y":772.75977},{"x":-6016.6597,"y":7854.8574},{"x":-5201.7114,"y":-8303.499}],"interiors":[]},{"exterior":[{"x":5620.463,"y":-6589.673},{"x":-8996.825,"y":7323.99},{"x":8552.832,"y":8266.936},{"x":5620.463,"y":-6589.673}],"interiors":[]}] ``` ![image](https://github.com/georust/geo/assets/26892280/0091167a-01d3-4042-824d-f17eceff5353) 3. Panic with `internal error: entered unreachable code` One of the vertices seems to be close to an edge of the other triangles again ```json [{"exterior":[{"x":7145.463,"y":-9308.827},{"x":-5928.4663,"y":-9830.61},{"x":-6159.425,"y":6376.1953},{"x":7145.463,"y":-9308.827}],"interiors":[]},{"exterior":[{"x":-1481.9834,"y":-6507.6855},{"x":-5929.4414,"y":-9778.573},{"x":579.1953,"y":-4847.679},{"x":-1481.9834,"y":-6507.6855}],"interiors":[]}] ``` ![image](https://github.com/georust/geo/assets/26892280/58b4cbca-c2d4-4fae-9e8e-c97ebd1482c8) ![image](https://github.com/georust/geo/assets/26892280/541f32ee-52b7-44fb-9b12-cdf009fd99d3) # Union panics `internal error: entered unreachable code` ```json [{"exterior":[{"x":-5693.004,"y":2876.4512},{"x":-1425.9932,"y":-4538.469},{"x":7075.2715,"y":3801.1094},{"x":-5693.004,"y":2876.4512}],"interiors":[]},{"exterior":[{"x":-5696.2515,"y":5549.1904},{"x":4502.6426,"y":7662.08},{"x":-5677.376,"y":-9651.773},{"x":-5696.2515,"y":5549.1904}],"interiors":[]}] ``` ![image](https://github.com/georust/geo/assets/26892280/9a5e913c-24a0-4cc8-822a-4baa205c0cb9) # Intersection panics `internal error: entered unreachable code` while the second triangle can't be visualized again ```json [{"exterior":[{"x":9305.908,"y":-6702.5806},{"x":-4249.9185,"y":-5210.8096},{"x":-4326.806,"y":-5202.899},{"x":9305.908,"y":-6702.5806}],"interiors":[]},{"exterior":[{"x":5041.3135,"y":179.65332},{"x":8421.762,"y":8643.844},{"x":9279.895,"y":-8269.728},{"x":5041.3135,"y":179.65332}],"interiors":[]}] ``` ![image](https://github.com/georust/geo/assets/26892280/b30784dd-9e66-4260-914e-2cd3c4448592)
rmanoka commented 1 year ago

Thanks for the analysis! I authored the sweep, and indeed there are unresolved floating point issues in it. The crux is that: when two lines (p1, q1) and (p2, q2) intersect at an interior point p, due to the limited precision, (p1, p, q1), and (p2, p, q2) are not necessarily collinear. In particular, the new segment (p1, p, q1) might spuriously intersect another segment, due to this rounding error. Unfortunately, I think the current implementation is not well suited to handle this adjustment well.

I think using algorithms related to monotonic chains is the way to address this properly; we've some partial progress in another PR on breaking polygon/multi-polygon into monotonic polygons, but yet to work on the bool. ops on it.

RobWalt commented 1 year ago

@rmanoka sounds great!

Btw. what do you think about making the algorithm falliable? It's a real bummer for my app that it'll always crash if the algo didn't succeed? This robs me the opportunity to

a) just ignore the result if its not that important b) or slightly nudge some points in the poly an tri again

We could still provide a crashing version and implement the approach above via new methods in the form of try_.... This would also help to not introduce breaking changes and it is in line with the general falliable-naming-scheme of the rust ecosystem.

*edit:

I remember I already tried to do this a year ago or so, but I got stuck at some point where I couldn't change the return type directly to a Result. It was at a point where the Iterator trait requires the return type to be of shape Option<Item>

That being said, I know how to resolve this challenge nowadays and there actually come two ideas to mind. Let me know which one would be better in your opinion:

In either way, we can just use this throughout the whole algo code and unwrap() at the end to get the same result in the infalliable boolop code.

ggunde-deca commented 1 year ago

@RobWalt @rmanoka In that case, do we have a plan to use f64 instead of f32 to handle the precision problem?

RobWalt commented 1 year ago

@ggunde-deca

In that case, do we have a plan to use f64 instead of f32 to handle the precision problem?

Not exactly. Although the operations work for triangles with f64, it unfortunately doesn't really generalize to arbitrary geometric shapes (polygons, multipolygons). The thinks rmanoka wrote still apply to a general case with f64 scalars, although panics might not happen as often as in the f32 case. Still, in my opinion, panics happen too often even for f64.

If you're looking for a solution for you app, feel free to track my branch (#1032) which implements falliable boolops via methods like try_intersection(...). This way you would be able to catch panics, then slightly offset vertices which cause trouble if it fails and then try again. This is at least a feasible solution for my use case.

mjpauly commented 1 year ago

Might not be a good general-purpose solution, but I've had success with fractionally truncating the polygons before the difference. My use case is taking a difference when the left (subject) polygon and the right (clip) polygon share a significant and identical (or nearly-identical) border, and full 64-bit floating point precision is unnecessary.

The thinking here is that if there's trouble with the calculation due to imperfect precision, then we might as well ensure that all the points are exactly representable and sufficiently distinct from all other points. This is done by scaling up by some resolution factor and rounding all fractional values to integers. I believe any line segment created by points with integer coordinates won't easily fall near (but not on) an integer coordinate if you assume the line isn't too long.

You probably also need to ensure the geometry doesn't contain features which are smaller than the fractional resolution, otherwise the geometry may become invalid.

/// Safely(?) difference two MultiPolygons by scaling them up by some resolution,
/// and rounding all fractional values to integers, which are precisely
/// representable (within the range −2^53 to 2^53 for f64). The difference
/// operation is then well-behaved and doesn't encounter arthmetic errors, and
/// we can scale the result back down to the original size.
fn maybe_safe_difference(left: &MultiPolygon, right: &MultiPolygon) -> MultiPolygon {
    let resolution = 4096.0;
    let scale_up = |point: Coord| {
        coord! {
            x: (point.x * resolution).round(),
            y: (point.y * resolution).round(),
        }
    };
    let scale_down = |point: Coord| {
        coord! {
            x: point.x / resolution,
            y: point.y / resolution,
        }
    };
    let left = left.map_coords(scale_up);
    let right = right.map_coords(scale_up);
    let mut diff = left.difference(&right);
    diff.map_coords_in_place(scale_down);
    diff
}

I dug into the test cases in @cassiepearson-deca's repo a bit. I think the first case, with polygons in boolean_difference_subject_multipolygon.json is rather simple: a valid MultiPolygon shouldn't have overlapping polygons inside it, which this subject feature has. (btw those polygons looks pretty neat!) The second test case in boolean_difference_polygons.json with just two polygons doesn't produce a panic in my testing (not on 0.23.1 or 0.26.0).

Update: This method does not always work. I've encountered panics when the right (clip) polygon has a line segment which exactly crosses (or should exactly cross) a point on the interior of the left (subject) multipolygon.

diff_integer_fail copy

mjpauly commented 1 year ago

I've found these failure cases also hang indefinitely without panicking when compiling with optimizations. So using catch_unwind can't save you in every case. Here's a minimum repro for geo 0.26.0. The geometry is the same as the image in the last comment, but with additional x and y shifts.

//! Usually fails on the second test invocation, or on the second loop iteration.
//! Panics if running without optimizations (`cargo run`) and hangs
//! indefinitely if running with optimizations (`cargo run --release`).

use geo::BooleanOps;
use geo::MapCoordsInPlace;
use geo::{Coord, LineString, MultiPolygon, Polygon};

fn main() {
    let geo1 = Polygon::new(
        LineString(vec![
            Coord { x: -1.0, y: 46.0 },
            Coord { x: 8.0, y: 46.0 },
            Coord { x: 8.0, y: 39.0 },
            Coord { x: -1.0, y: 39.0 },
            Coord { x: -1.0, y: 46.0 },
        ]),
        vec![LineString(vec![
            Coord { x: 2.0, y: 45.0 },
            Coord { x: 7.0, y: 45.0 },
            Coord { x: 7.0, y: 44.0 },
            Coord { x: 5.0, y: 42.0 },
            Coord { x: 5.0, y: 41.0 },
            Coord { x: 0.0, y: 43.0 },
            Coord { x: 2.0, y: 45.0 },
        ])],
    );
    let geo2 = Polygon::new(
        LineString(vec![
            Coord { x: 0.0, y: 42.0 },
            Coord { x: 6.0, y: 44.0 },
            Coord { x: 4.0, y: 40.0 },
            Coord { x: 0.0, y: 42.0 },
        ]),
        vec![],
    );
    let mut left = MultiPolygon::new(vec![geo1]);
    let mut right = MultiPolygon::new(vec![geo2]);
    let shift = |c: Coord| Coord {
        x: c.x + 931230.,
        y: c.y + 412600.,
    };
    left.map_coords_in_place(shift);
    right.map_coords_in_place(shift);
    for i in 0..10 {
        println!("{} ", i);
        left.difference(&right);
    }
}

Turns out Clipper (Rust API in geo-clipper) implements all boolean ops with integers, like in my attempted solution, but has many more considerations for robustness. Worth considering if you're stuck on this.

RobWalt commented 11 months ago

FYI y'all there is a new "safe" (AFAIK) solution proposal in #1073 It's not perfect but it's something workable.

Edit: Nevermind, hit some edge cases again although the ones listed here worked well :(

RobWalt commented 10 months ago

There's yet another proposal implementation which looks promising here https://github.com/georust/geo/pull/1089

It's a complete new approach and uses Results instead of panicing. It also didn't really hit any of the Err cases yet but there's always this safety net.