Closed dyollb closed 5 months ago
Wow, thanks for catching this! Gotta love those sanitizers. Your guess about the cause makes perfect sense to me. There are several other places in that function that use the same pattern too.
I'm a little stumped as to how to fix this. Is it possible that the e
and f
arrays actually need to be one element longer than what they were allocated with? If that's the case, fixing this bug would involve possibly changing a huge number of sites within the code.
If that code is loading values from memory that it shouldn't be touching, then I'd imagine there would be serious correctness bugs depending on how the stack frame is arranged. If I had to guess, I'd say that by that point in the loop, the code will follow a path such that the bad value is never actually used. Does that seem right to you? If that's true, then the bug could be fixed by changing the control flow.
i am wondering if
a) is the indexing shifted by one? the first element of e
is never used? I guess this would be a bug in the algorithm and incorrect results could occur.
b) should the prefix increment be a postfix increment (related to a)?
c) the array should be longer, or the length (elen
) reduced by 1
?
d) should the while loop (and check eindex+1 < elen
)?
I think a) is unlikely, since so many people have used this code for such a long time because results seem reliable (and I guess comparable to arbitrary precision arithmetic libraries). I guess the bad value is rarely used, and therefore has not been identified as a problem.
One way to debug/catch the situation could be to change only fast_expansion_sum_zeroelim
to check if eindex==elen
.
I think your option (d) would be quick to test. Do you want to give it a try? If it works then great, if not then we can explore other options.
@dyollb could you share the particular input values that trigger this error? I'd like to try reproducing the bug.
@danshapero @dyollb I reached out to Jonathan Shewchuk about this issue and he responded with the following message:
Hi Hans,
The overflow is deliberate. The only way I could see to remove it adds a conditional inside the inner loop, making the loop run much more slowly, and I decided that was an unacceptable trade-off.
In the event where the code reads one word past the end of the array, the code does not actually use the spurious nonsense word. The solution is simply to make sure that that word of memory is readable so that there is no system trap.
@dyollb could you share the particular input values that trigger this error? I'd like to try reproducing the bug.
Hi Hans
You can have a look at my fork (and travis CI) to see how i built with sanitizers: https://github.com/dyollb/predicates/blob/5bfb99cb98dfaad4ea938de0f0aeb5c613194632/.travis.yml#L18
Good luck
@dyollb thanks! Although I probably won't bother with it now that I know the overflow is "working as intended".
Thanks for looking into this @HansBrende ! I'll close the issue now that we know this. I'm sure that at the time it made sense to avoid conditionals if there's an unacceptable performance cost, but I wonder if this is a non-issue today because of branch prediction.
For what it's worth, I made another library here that does sign-exact geometric predicates in a way that I actually understand -- use interval arithmetic to compute the determinants and fall back to bignum rationals if the result interval contains 0. It probably isn't the fastest but I at least understand how it works, template hackery aside.
I found this issue in the original predicates code, and can reproduce it also in your code (or in my fork of your repo):
https://travis-ci.com/github/dyollb/predicates/jobs/353312903
1: ==5947==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffc91634a20 at pc 0x7f3a82a4e021 bp 0x7ffc91634540 sp 0x7ffc91634538 1: READ of size 8 at 0x7ffc91634a20 thread T0 1: #0 0x7f3a82a4e020 in fast_expansion_sum_zeroelim /home/travis/build/dyollb/predicates/src/predicates.c:705:16 1: #1 0x7f3a82a5db77 in orient2dadapt /home/travis/build/dyollb/predicates/src/predicates.c:1210:14 1: #2 0x7f3a82a5f560 in orient2d /home/travis/build/dyollb/predicates/src/predicates.c:1257:10 1: #3 0x771146 in C_A_T_C_HT_E_S_T____0()::$_23::operator()(double const*) const /home/travis/build/dyollb/predicates/test/orient2d.cpp:37:52
I am wondering how to fix this (if at all). Not sure if this is a false positive, but I guess it is coming from following lines (702-714):
while ((eindex < elen)... ){ ... enow = e[++eindex]; ... }
The prefix increment could lead to access at e[elen], which would explain the 'stack-buffer-overflow' in incircle and orient2d.