Closed michaelkirk closed 3 days ago
This looks very promising. I'm trying this out in: https://github.com/gauteh/roaring-landmask/blob/georust-prepared/src/shapes.rs#L22 , but I'm not able to share the PreparedGeometry'ies between threads (through pyclass). I've also put the Polygons of the MultiPolygon in an RTree (not sure if that is necessary anymore..?).
I've also put the Polygons of the MultiPolygon in an RTree (not sure if that is necessary anymore..?).
I don't think you need to do this…
I've tried to change all the spots with RefCell (which can't just be put in an Arc). Eventually that will require a mut GeometryGraph which will prevent parallelism anyway. So far I've only found swap_labels that requires a mutable GeometryGraphy. Maybe the best solution is add another Label (https://github.com/georust/geo/pull/1197/files#diff-b64c206f2b566eef304be445dd47e93d22b59dd11b986e715b8152a367e862b1L22) for the other direction? Or some method on labels that swap them/reverse them only for the operation, but not changing the object. I assume the latter is not so good since they are already cached for a reason.
PreparedGeometry is not Send because GeometryGraph is not send. It seems like something worth addressing, but arguably out of scope for this PR.
I've tried to change all the spots with RefCell (which can't just be put in an Arc). Eventually that will require a mut GeometryGraph which will prevent parallelism anyway. So far I've only found swap_labels that requires a mutable GeometryGraphy. Maybe the best solution is add another Label (https://github.com/georust/geo/pull/1197/files#diff-b64c206f2b566eef304be445dd47e93d22b59dd11b986e715b8152a367e862b1L22) for the other direction? Or some method on labels that swap them/reverse them only for the operation, but not changing the object. I assume the latter is not so good since they are already cached for a reason.
Thanks for taking a look at this.
swap_labels only happens during construction - so there shouldn't be an issue here with mutating anything since no other thread could have a reference to the mutable thing we've just constructed. Right?
pub fn clone_for_arg_index(&self, from_arg_index: usize, to_arg_index: usize) -> Self {
let mut graph = Self {
nodes: self.nodes.clone(),
// deep copy edges
edges: self
.edges
.iter()
.map(|e| Rc::new(RefCell::new(e.borrow().clone())))
.collect(),
};
assert_eq!(from_arg_index, 0);
if from_arg_index != to_arg_index {
graph.swap_labels(); // <-- mutates the graph, but nothing else could have a handle on `graph` yet.
}
graph
}
I am totally unfamiliar with the code, however: From the docs it seems that swap_labels is done depending on whether you do a.relate(b) or b.relate(a), which is not known in advance?
On the other hand, it is weird that this works before this PR? Maybe it is not related.
From the docs it seems that swap_labels is done depending on whether you do a.relate(b) or b.relate(a), which is not known in advance?
Correct, but swap_labels
is called on the cloned GeometryGraph
owned by the context that calls swap_labels
, so that should be fine. I think the issue is only that GeometryGraph has some internal Rc
. This was not relevant before, because GeometryGraph was only created as a transient entity.
The whole point of PreparedGeometry is to keep that entity around for re-use. With this PR it works in a single threaded context. I'd propose that we leave that as the bar for now and follow up with "share a prepared geometry across threads" as future work since this PR is already pretty large.
Plus, that way we can have a focused study of the performance tradeoffs needed (e.g. Rc vs Arc).
The whole point of PreparedGeometry is to keep that entity around for re-use. With this PR it works in a single threaded context. I'd propose that we leave that as the bar for now and follow up with "share a prepared geometry across threads" as future work since this PR is already pretty large.
Plus, that way we can have a focused study of the performance tradeoffs needed (e.g. Rc vs Arc).
Absolutely! I've been chipping away at moving to just &mut, don't think we need the Arc. There are a couple of cases where it's not possible to exactly replicate the behavior, but I suspect those cases may be not totally correct (or at least redundant). I'll put it in a separate PR for reference (whether it is usable or not).
[x] I added an entry to
CHANGES.md
if knowledge of this change could be valuable to users.Fixes #803
Much of the cost of the
Relate
operation comes from finding all intersections between all segments of the two geometries.To speed this up, the edges of each geometry are put into an R-Tree.
We also have to "self node" each geometry, meaning we need to split any segments at the point that they'd intersect with themselves.
None of that work is new to this commit, but what is new is that we now cache the self-noding work and the geometry's r-tree.