daef opened this issue 4 years ago
Thanks for the detailed issue description!
Without having started any debugging, this is most likely exactly the issue I'm working on currently. The background: internally, the line segments of the sweep line are stored in a tree. When neighbouring line segments intersect or overlap, they get subdivided. This mutates the segments that are in the tree, and because of numerical instabilities the order can change. This is something that trees don't like at all ;). Think of a tree storing the values 1 < 2 < 3, where the value 1 gets mutated to 2.5, so the stored order becomes 2.5 < 2 < 3. Now searching the tree for 2.5 fails.
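A minimal, self-contained sketch of that effect (a plain sorted `Vec` standing in for the tree; illustrative only, not the library's actual code):

```rust
fn main() {
    // Stand-in for the sweep line tree: a container that was sorted at insertion time.
    let mut keys = vec![1.0_f64, 2.0, 3.0];

    // A subdivision mutates the first element in place: 1.0 becomes 2.5.
    keys[0] = 2.5;

    // The container still assumes its old order, but now holds [2.5, 2.0, 3.0].
    // An ordered lookup of the mutated value fails, just like the tree search does.
    let found = keys.binary_search_by(|k| k.partial_cmp(&2.5).unwrap());
    assert!(found.is_err()); // 2.5 is present, but cannot be found via the ordering
}
```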
A solution that doesn't degrade performance is a bit tricky, but I have an idea. It will just take a little while, because it probably requires moving away from the splay tree to a customized data structure I'm currently working on (a kind of fixed-2-level B-tree that allows shuffle-sorting in these situations).
Would it be possible to consistently break ties like that (same starting point, same direction) by another measurement (e.g. squared length, x component, ...)?
Yes, such a mechanism is part of the solution. The problem is that the situation can change between before and after computing overlaps/intersections:
While using a release build and ignoring the missed events often works out, I also get results like this:
the union of yellow and green results in the orange and the red polygon (dashed being an 'interior'); overlaid it looks even worse
but trying to (re)create this with a fixture, it magically works:
edit: I finally found a difference: I use f32, the tests use f64. maybe it would be nice to be able to explicitly create f32-fixtures or test all fixtures with f32 and f64...
> edit: I finally found a difference: I use f32, the tests use f64. maybe it would be nice to be able to explicitly create f32-fixtures or test all fixtures with f32 and f64...
Ah good that you could make sense of it.
I'm still undecided if it is a good idea to make the library generic in the underlying numeric type. If you look at the (most robust?) polygon clipping library Clipper, they have "solved" the problem by allowing only integers in a fixed range -- not even float support. Since I'm not aware of a library that is generic and achieves a decent level of robustness I initially thought the problem may be too tough for generics and restricting to a single numeric type may be necessary. Currently I'm a bit more optimistic: I think I understand the problem in the issue above and I have an idea how to solve it. If that's the case, the question whether the test cases are in f32 or f64 shouldn't matter much, because I should have equivalent f64 test cases.
I thought the bigger numerical instabilities arising from smaller floats call for more robust code in the library - therefore it should be beneficial.
e.g. don't calculate intersection points, but reference the lines whose intersection is meant to be the point, plus geometric predicates; or maybe one could use symbolic perturbation à la Magalhães (10.1145/3139958.3140001) to decide. Or maybe throw memory at the problem and remember the originally calculated angle with the subdivided segment (so there's no need to recalculate a new, slightly off, angle).
You made me curious - how do you plan to solve this?
> I thought the bigger numerical instabilities arising from smaller floats call for more robust code in the library - therefore it should be beneficial.
Yes true, the issues will show up earlier, but they still fall into the same categories. Currently it simplifies the test code quite a bit by not having to write everything generically there as well.
> e.g. don't calculate intersection points, but reference the lines whose intersection is meant to be the point, plus geometric predicates; or maybe one could use symbolic perturbation à la Magalhães (10.1145/3139958.3140001) to decide. Or maybe throw memory at the problem and remember the originally calculated angle with the subdivided segment (so there's no need to recalculate a new, slightly off, angle).
I already went down that rabbit hole of "higher precision intersections" ;). As a baseline I was using an arbitrary precision library: this makes intersection computation roughly a factor 100 slower, and since there aren't any pure Rust libraries, it is maybe not ideal. The best solution I came up with is this. It basically takes Shewchuk's idea behind his "adaptive" non-overlapping representations (used in "robust predicates" implementations) and limits them to just the second order, i.e., whereas the original adaptive representations can take much higher orders (=> slow like the full arbitrary precision library), the implementation I went for always reduces the results back to second order (=> stays relatively fast). In my experiments, having two `f64` values to represent the intersection was always sufficient to get exactly the same result as arbitrary precision, but it is probably possible to find edge cases where even that isn't sufficient anymore. Performance-wise it was pretty okay: only a factor of ~10 slower than the pure `f64` algorithm. If it turns out higher precision for intersections is needed, I'd revive this idea.
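For reference, the basic building block of such a two-`f64` representation is an error-free transformation like the classic TwoSum; a minimal sketch (not the exact code from the linked experiment):

```rust
/// Error-free transformation (Knuth's TwoSum): returns (s, e) with s = fl(a + b)
/// and a + b == s + e exactly. Pairs like (s, e) are the "two f64" representation;
/// always truncating back to such a pair is the "second order only" idea above.
fn two_sum(a: f64, b: f64) -> (f64, f64) {
    let s = a + b;
    let bb = s - a;
    let err = (a - (s - bb)) + (b - bb);
    (s, err)
}

fn main() {
    // The rounding error of a single f64 addition is recovered exactly in the second word.
    let (s, e) = two_sum(1.0, 1e-17);
    assert_eq!(s, 1.0);   // the leading f64 alone rounds the small term away
    assert_eq!(e, 1e-17); // the trailing f64 keeps it exactly: a + b == s + e
}
```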
> You made me curious - how do you plan to solve this?
The "sweep line misses event to be removed" issue only has to do with the fact that the elements in the tree get mutated in a way that messes up the ordering. The obvious fix is to "re-sort" the elements if a mutation happened. Note that not a full sort is needed -- the fluctuations from mutations are tiny, so typically we'd only have to swap like two elements to repair the order again. As far as I can see, this is just a bit costly/ugly to do with trees though, because next/prev operations are O(log N), and bubble sorting them is a bit of a pain. I have created this special data structure for that purpose: It has O(1) next/prev and should be fairly straightforward to bubble sort, while still scaling reasonably well to the sizes of common sweep lines.
Are there any updates on this? We're currently in the final stretch of finishing a release of our software, but we're blocked by this crash.
Is there anything we could do to help fix this bug?
Yeah, good question. Basically this issue is just a manifestation of a fundamental issue of the approach, and resolving it probably requires quite some effort. My plan was:

1. Move the sweep line from the splay tree to `rust-array-stump`.
2. Extend `rust-array-stump` to support "rank fixing", e.g., bubble sorting after elements in the set have been modified in a way that violates the ordering. From here on things are still todo.
3. Adapt `compare_segments` to properly account for the rank changes. (I think I had initially started with this in this branch, but it is outdated now and only led to the realization that it can't be solved without fixing the tree mutation issue first.)

Unfortunately, the steps are all tangled up to some degree, e.g. the breaking test blocking step 2 probably needs the adjustments to `compare_segments`. The combined fix would probably turn into a fairly big rewrite, requiring some days of really deep debugging. Since my project depending on `rust-geo-booleanop` is low priority at the moment, I couldn't really motivate myself for such a deep dive ;).
Is there some kind of quick fix that can be applied until the real fix is implemented?
If you are in a debug build and just want to get rid of the panic, you can comment out the `debug_assert`. In a release build there should be no crash, but most likely the algorithm result is broken.
I've implemented rank fixing for the new data structure and added the respective calls in `subdivide_segments` after possible mutations of the events. This all happens in a pretty naive (= unconditional) way, so I expect the performance to be suboptimal at the moment. I guess the call could at least be skipped if `possible_intersection` returns 0. (What would be a better return type for `possible_intersection` instead of `u8`? Maybe an enum?)
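Regarding the enum idea, a possible shape could be something like the following sketch (the variant names here are made up and would have to reflect what the current numeric return codes actually mean):

```rust
/// Hypothetical replacement for the numeric return code of `possible_intersection`.
/// The actual variants would have to mirror the meaning of the current u8 codes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DivisionResult {
    /// Nothing was subdivided, no events were mutated: rank fixing can be skipped.
    Untouched,
    /// One or both segments were subdivided and events may have been mutated.
    Divided,
}

fn needs_rank_fix(result: DivisionResult) -> bool {
    result == DivisionResult::Divided
}

fn main() {
    assert!(!needs_rank_fix(DivisionResult::Untouched));
    assert!(needs_rank_fix(DivisionResult::Divided));
}
```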
It could possibly be even faster to entangle the code even more and put that rank_fixing burden into possible_intersection somehow.
Right now I'm pretty happy that I've not broken any test-cases.
Maybe @anlumo could try to integrate this into his release last minute. jk scnr
On a more serious note I'd love to know how to benchmark the performance. I've just stumbled upon `cargo bench` and am playing with it a bit, but I get many warnings (too few samples in the given 3 secs) and I have no idea what I'm really doing :)
Edit: posted prematurely due to accidental CtrlReturn
> On a more serious note I'd love to know how to benchmark the performance. I've just stumbled upon `cargo bench` and am playing with it a bit, but I get many warnings (too few samples in the given 3 secs) and I have no idea what I'm really doing :)
You can first checkout the master branch and run

    cargo bench --bench benchmark -- --save-baseline master

there. Then switch to the feature branch and run

    cargo bench --bench benchmark -- --baseline master

Then you should see the relative impact on each benchmark test case. Typically you can ignore the benchmark outlier warnings. Criterion (the benchmark tool) is a bit verbose in that regard. When in doubt, re-run the benchmarks (on my machine I'm actually observing quite some fluctuations from run to run, and stopping background processes can help).
> Maybe @anlumo could try to integrate this into his release last minute. jk scnr
We've already fixed the issue by switching to geo-clipper, which is pretty much a drop-in replacement with the advantage of not crashing.
So, we're not in a hurry for this fix any more, although I'd prefer a pure Rust implementation if it's equivalent.
OT: I should probably include (geo-)clipper in my comparison. How does it compare performance-wise?
It feels faster than geo-booleanop, although that might be related to the C++ library being built with optimizations even for Rust debug builds. I don't have any objective numbers, though.
Has there been any progress on this issue in the last 3 months?
Not from my side (my use case is on hold).
My contribution should be pretty much ready to merge; at least it works for me :)
I'm using bluenote's array stump instead of the splay tree successfully, with better performance and fewer bugs. It's still not completely free of 'sweep line misses event' panics, but I didn't care to debug further as it's good enough for the moment (for me).
edit: clarified what I'm talking about
I tried to debug this issue. It seems like this line of `fn compare_segments` always returns false if the contour_ids are also the same (that is, the compared events are essentially identical). In this case, this ends up confusing the splay tree (or even a BST), because the `Ord` guarantee no longer holds: for two such events a, b, both `compare_segments(a, b)` and `compare_segments(b, a)` would return `Greater`. One possible fix is to, as a last resort, compare using pointer addresses. I tried adding this, and the panic goes away (and all existing tests still pass). Does this fix the issue? Please see the attached PR.
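For illustration, the last-resort tie-break could look roughly like this (a simplified stand-in, not the exact code from the PR):

```rust
use std::cmp::Ordering;
use std::rc::Rc;

// Toy stand-in for a sweep event; the real comparator looks at far more fields.
struct Event {
    point: (f64, f64),
}

fn compare_events(a: &Rc<Event>, b: &Rc<Event>) -> Ordering {
    // ...all the geometric tie-breaks would come first...
    match a.point.partial_cmp(&b.point) {
        Some(Ordering::Equal) | None => {
            // Last resort: two distinct but otherwise identical events are ordered by
            // address, so compare(a, b) and compare(b, a) can never both be Greater.
            (Rc::as_ptr(a) as usize).cmp(&(Rc::as_ptr(b) as usize))
        }
        Some(ord) => ord,
    }
}

fn main() {
    let a = Rc::new(Event { point: (1.0, 2.0) });
    let b = Rc::new(Event { point: (1.0, 2.0) });
    // Identical data, but the pointer tie-break still yields a consistent total order.
    assert_ne!(compare_events(&a, &b), compare_events(&b, &a));
}
```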
> Does this fix the issue?
I don't think so, because the order in the tree is fundamentally messed up, i.e., the tree no longer maintains its invariant (`a <= b` in agreement with the tree structure). What you suggest may indeed fix a particular test case, because it doesn't panic immediately, but as far as I can see, it shifts the trouble to a later point: subsequent operations on the tree assume that the invariant holds, but it is violated, so any follow-up search / insert / splay operation could fail. In my opinion the "rank fixing" logic as proposed above is needed to properly maintain the invariant of the data structure. And since this is very costly to do with a splay tree, I proposed to use these "array stumps", which can be re-sorted much more efficiently.
@bluenote10 In another local branch, I tried to fix the splay tree mutation issue. The mutation relevant to tree ordering seems to happen only in `divide_segment`, by this call to `set_other_event`, and I fixed it by passing a mut-ref of the splay tree itself to this function, and by removing the event from the tree, mutating it, and then re-inserting it into the tree. That should solve the key mutation issue (as the key is no longer mutated while inside the tree), and the complexity still remains O(n log n).
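The remove/mutate/re-insert pattern, sketched on an ordinary ordered set for illustration (the real change passes the sweep-line tree into `divide_segment` as described above):

```rust
use std::collections::BTreeSet;

fn main() {
    // Stand-in for the sweep line: keys must never change while they are in the set.
    let mut sweep_line: BTreeSet<i64> = [10, 20, 30].into_iter().collect();

    // Instead of mutating the key in place (which would corrupt the ordering):
    // take the event out, apply the mutation, and insert the updated value again.
    assert!(sweep_line.remove(&10));
    let subdivided = 25; // e.g. the key after the segment was subdivided
    sweep_line.insert(subdivided);

    // The ordering invariant stays intact, at the cost of an extra remove+insert
    // (O(log n)) on every subdivision.
    assert_eq!(sweep_line.iter().copied().collect::<Vec<_>>(), vec![20, 25, 30]);
}
```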
However, the above-mentioned bug remained, and it was still causing the panic on the test case mentioned here. I think the fix to `compare_segments` is important, as otherwise the comparison itself is flawed. I didn't add the tree-mutation fix in the PR, because it seemed unnecessary. Do you have other test cases that panic / fail? I could try testing locally and see if it helps.
fwiw, this branch fixes the mutations as explained above. It includes the fix to the comparator too; removing that brings back the panic. I've yet to run any comprehensive benchmarks, but I'm also not a big fan of the SplayTree; does it perform poorly compared to, say, a vanilla BTreeMap?
> Do you have other test cases that panic / fail?
I vaguely remember that I had a few more test cases prepared related to some open issues, and I could never get them all to work. Unfortunately, quite some time has passed, and I cannot remember the details anymore. I still have several test cases lying around as untracked files in my working dir -- some of them were left uncommitted because I couldn't get them to work. Here they are if it helps: uncommitted_test_cases.zip
> I fixed it by passing a mut-ref of the splay tree itself to this function, and by removing the event from the tree, mutating it, and then re-inserting it into the tree. That should solve the key mutation issue (as the key is no longer mutated while inside the tree), and the complexity still remains O(n log n).
Yes, I considered that as well, but benchmarks didn't look good, because it's quite an overhead to remove+insert on every subdivision. But perhaps for now that is the best option. Good performance isn't worth much if the algorithm isn't correct.
However, I was also never 100% sure whether this actually gives full correctness, because if `divide_segment` does a remove+insert, the neighbor relationships can change during the course of an iteration, i.e., the roles of `index_prev` and `index_next` in the main `subdivide` function can change as a side effect of the `possible_intersection` call. That always seemed dangerous. What I had in mind was a "rank fixing" functionality in the array stump to which you can pass some additional references/indices; if the rank fixing needs to mess with the order of elements, the passed-in references/indices get updated accordingly so that they maintain their neighbor semantics. But again, I'm largely guessing at the moment because of vague memory.
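Spelled out very roughly, such an index-preserving rank fix might look like the following hypothetical sketch (the signature and names are made up for illustration):

```rust
/// Hypothetical sketch: repair the order around `idx` after a mutation while keeping
/// externally held indices (think index_prev / index_next) pointing at the same
/// elements, even if the repair has to swap them around.
fn fix_rank_tracking(items: &mut [f64], mut idx: usize, tracked: &mut [usize]) {
    while idx > 0 && items[idx] < items[idx - 1] {
        items.swap(idx, idx - 1);
        for t in tracked.iter_mut() {
            // Whatever pointed at the swapped positions follows its element.
            if *t == idx - 1 {
                *t = idx;
            } else if *t == idx {
                *t = idx - 1;
            }
        }
        idx -= 1;
    }
    // (a symmetric loop for bubbling to the right is omitted for brevity)
}

fn main() {
    // The element at index 2 was mutated and now belongs further to the left.
    let mut keys = [2.0, 3.0, 2.5];
    let mut neighbors = [1usize]; // an external index currently pointing at 3.0
    fix_rank_tracking(&mut keys, 2, &mut neighbors);
    assert_eq!(keys, [2.0, 2.5, 3.0]);
    assert_eq!(neighbors[0], 2); // still points at 3.0 after the repair
}
```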
> I've yet to run any comprehensive benchmarks, but I'm also not a big fan of the SplayTree; does it perform poorly compared to, say, a vanilla BTreeMap?
Yeah, unfortunately the SplayTree is relatively slow, see my benchmarks here (except for the "find recent" case which they are optimized for, but that doesn't seem to be very relevant in the Martinez context).
I added all the tests from the zip file. They all seem to pass now (I had to tweak the equality check in the tests to allow for permuted but equivalent components of multi-polygons and polygon interiors, and to allow shifts in polygon rings). Interestingly, this is still without the mutation fix. I suppose with arbitrary precision, `divide_segments` does indeed not mutate the ordering of the tree, but I'm not sure why it works with `f64`. It is possible that we haven't found the right counter-example for the mutation issue; I will try playing around with `f64::nextafter` and see.
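The ring-shift part of such an equality check could look roughly like this simplified sketch (not the actual test helper):

```rust
/// Two closed rings are treated as equal if one is a rotation of the other,
/// i.e. the same vertex cycle starting at a different vertex.
/// (A real check would also have to handle the duplicated closing vertex of
/// GeoJSON rings and possibly reversed orientation.)
fn rings_equal(a: &[(f64, f64)], b: &[(f64, f64)]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    if a.is_empty() {
        return true;
    }
    (0..a.len()).any(|shift| a.iter().cycle().skip(shift).take(a.len()).eq(b.iter()))
}

fn main() {
    let a = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)];
    let b = [(1.0, 0.0), (1.0, 1.0), (0.0, 0.0)]; // same ring, shifted start vertex
    assert!(rings_equal(&a, &b));
}
```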
I suggest we merge the #28 PR as it fixes an independent bug.
To me it seemed that using f32 instead of f64 brought up various issues that f64 also had - just much earlier.
When aggregating some simple (convex, <6 vertices) polygons via repeated unions I get this panic:
I managed to create a .geojson fixture that provokes the panic:
The black polygon is my aggregate, the orange one is the one I want to union with. Nudging the marked vertex unpanics the situation.
EDIT: added image, added fixture, more information