georust / geo

Geospatial primitives and algorithms for Rust
https://crates.io/crates/geo
Other
1.49k stars 194 forks source link

Add `PreparedGeometry` to speed up repeated `Relate` operations. #1197

Closed michaelkirk closed 3 days ago

michaelkirk commented 3 weeks ago

Fixes #803

Much of the cost of the Relate operation comes from finding all intersections between all segments of the two geometries.

To speed this up, the edges of each geometry are put into an R-Tree.

We also have to "self node" each geometry, meaning we need to split any segments at the point that they'd intersect with themselves.

None of that work is new to this commit, but what is new is that we now cache the self-noding work and the geometry's r-tree.

relate prepared polygons
                        time:   [49.036 ms 49.199 ms 49.373 ms]
relate unprepared polygons
                        time:   [842.91 ms 844.32 ms 845.82 ms]
gauteh commented 3 weeks ago

This looks very promising. I'm trying this out in: https://github.com/gauteh/roaring-landmask/blob/georust-prepared/src/shapes.rs#L22 , but I'm not able to share the PreparedGeometry'ies between threads (through pyclass). I've also put the Polygons of the MultiPolygon in an RTree (not sure if that is necessary anymore..?).

urschrei commented 3 weeks ago

I've also put the Polygons of the MultiPolygon in an RTree (not sure if that is necessary anymore..?).

I don't think you need to do this…

gauteh commented 3 weeks ago

I've tried to change all the spots with RefCell (which can't just be put in an Arc). Eventually that will require a mut GeometryGraph which will prevent parallelism anyway. So far I've only found swap_labels that requires a mutable GeometryGraphy. Maybe the best solution is add another Label (https://github.com/georust/geo/pull/1197/files#diff-b64c206f2b566eef304be445dd47e93d22b59dd11b986e715b8152a367e862b1L22) for the other direction? Or some method on labels that swap them/reverse them only for the operation, but not changing the object. I assume the latter is not so good since they are already cached for a reason.

michaelkirk commented 3 weeks ago

PreparedGeometry is not Send because GeometryGraph is not send. It seems like something worth addressing, but arguably out of scope for this PR.

michaelkirk commented 3 weeks ago

I've tried to change all the spots with RefCell (which can't just be put in an Arc). Eventually that will require a mut GeometryGraph which will prevent parallelism anyway. So far I've only found swap_labels that requires a mutable GeometryGraphy. Maybe the best solution is add another Label (https://github.com/georust/geo/pull/1197/files#diff-b64c206f2b566eef304be445dd47e93d22b59dd11b986e715b8152a367e862b1L22) for the other direction? Or some method on labels that swap them/reverse them only for the operation, but not changing the object. I assume the latter is not so good since they are already cached for a reason.

Thanks for taking a look at this.

swap_labels only happens during construction - so there shouldn't be an issue here with mutating anything since no other thread could have a reference to the mutable thing we've just constructed. Right?

    pub fn clone_for_arg_index(&self, from_arg_index: usize, to_arg_index: usize) -> Self {
        let mut graph = Self {
            nodes: self.nodes.clone(),
            // deep copy edges
            edges: self
                .edges
                .iter()
                .map(|e| Rc::new(RefCell::new(e.borrow().clone())))
                .collect(),
        };
        assert_eq!(from_arg_index, 0);
        if from_arg_index != to_arg_index {
            graph.swap_labels(); // <-- mutates the graph, but nothing else could have a handle on `graph` yet.
        }
        graph
    }
gauteh commented 3 weeks ago

I am totally unfamiliar with the code, however: From the docs it seems that swap_labels is done depending on whether you do a.relate(b) or b.relate(a), which is not known in advance?

gauteh commented 3 weeks ago

On the other hand, it is weird that this works before this PR? Maybe it is not related.

michaelkirk commented 3 weeks ago

From the docs it seems that swap_labels is done depending on whether you do a.relate(b) or b.relate(a), which is not known in advance?

Correct, but swap_labels is called on the cloned GeometryGraph owned by the context that calls swap_labels, so that should be fine. I think the issue is only that GeometryGraph has some internal Rc. This was not relevant before, because GeometryGraph was only created as a transient entity.

The whole point of PreparedGeometry is to keep that entity around for re-use. With this PR it works in a single threaded context. I'd propose that we leave that as the bar for now and follow up with "share a prepared geometry across threads" as future work since this PR is already pretty large.

Plus, that way we can have a focused study of the performance tradeoffs needed (e.g. Rc vs Arc).

gauteh commented 3 weeks ago

The whole point of PreparedGeometry is to keep that entity around for re-use. With this PR it works in a single threaded context. I'd propose that we leave that as the bar for now and follow up with "share a prepared geometry across threads" as future work since this PR is already pretty large.

Plus, that way we can have a focused study of the performance tradeoffs needed (e.g. Rc vs Arc).

Absolutely! I've been chipping away at moving to just &mut, don't think we need the Arc. There are a couple of cases where it's not possible to exactly replicate the behavior, but I suspect those cases may be not totally correct (or at least redundant). I'll put it in a separate PR for reference (whether it is usable or not).