Closed mjwillson closed 2 weeks ago
Indeed, that looks like a faulty assumption that integer overflow wraps. One obvious fix would be to use unsigned arithmetic here, though that requires some changes that propagate across a few functions. I'll see if I can get a fix done in the next few days.
Out of curiosity, what compiler and hardware are you using?
Thanks! Hardware nothing unusual (amd64), the compiler is clang, but in google's build environment which is passing in a lot of flags by default and from my experience tends to be quite cautious around undefined behaviour (i.e. err on the side of crashing).
It seems we're about to open up Pandora's box. These UB issues are not confined just to signed integer overflow in the reversible zfp algorithm, but the same can evidently happen in the far more important non-reversible algorithm. For example, a 1D float block { 7.f, 7.f, 7.f, 7.f }
encoded with maxprec = 2
results in decoded input x = 230, y = z = w = 0 to inv_lift()
:
https://github.com/LLNL/zfp/blob/5c976d8da013988174f931845862b6f94119cade/src/template/decode.c#L8-L34
On line 26, x <<= 1
results in x = 231 (regardless of how we implement this doubling of x; see below), which overflows this 32-bit signed integer. Of course, this is temporary since x -= z
would restore x to 230, but by then it's too late.
This is a larger challenge as the non-reversible algorithm relies on left and right shifts of signed quantities, with the assumption that right shifts are arithmetic (which is implementation-defined rather than undefined behavior). Worse yet, there are left shifts by one (to invert right shifts by one), and those are UB on signed integers (who knew?!). We can address those by replacing x <<= 1
with x *= 2
or x += x
, but we still need to avoid integer overflow.
One potential solution might be to consolidate x += x; x -= z;
as x += x - z
, though additional work would be needed to determine whether that is safe. Clearly, if neither z += x
nor x -= z
overflows, then this solution ought to be valid since x - z
is simply -z
before z += x
was executed. I believe we should never have z = -231 as input, which (using two's complement) would overflow when negated, but if that were a concern, then x -= z - x
might do the trick. One drawback of this approach is that we have now left the realm of lifting, but so be it.
We could also implement this with additional variables:
Int x2 = x1 - z1; Int z2 = z1 + x1;
Or using a temporary:
Int t = x; x -= z; z += t;
If the above still does not prevent overflow, then the only solution I can think of is to perform all arithmetic using unsigned integers, but that introduces complications with implementing arithmetic right shifts. Instead of x >>= 1
, we'd have to use some contrived code like x = x & 0x80000000u ? ~(~x >> 1) : x >> 1
, or perhaps x = (x & 0x80000000u) | (x >> 1)
. The cleanest would be to use sal()
and sar()
macros or functions for arithmetic shifting, but I am concerned about how this would impact performance and code readability. Of course, that is to be preferred over UB, though in practice we've gone 10 years without anyone noticing this in C and C++ implementations (and more recently in CUDA, HIP, and SYCL) using many different compilers and platforms. I guess it's hard to find an architecture that does not use two's complement and arithmetic right shifts with signed integers these days.
Let me think about how to best address this. It unfortunately seems to be a larger issue than what can be addressed in a couple of days.
Thank you and sorry about the can of worms!
For now we are able to work around this by compiling with -fwrapv -fno-sanitize=signed-integer-overflow
, I believe these are supported by clang and gcc but have only tried with clang.
-fwrapv
is "Treat signed integer overflow as two’s complement":
https://clang.llvm.org/docs/ClangCommandLineReference.html#cmdoption-clang-fwrapv
https://gcc.gnu.org/onlinedocs/gcc/Code-Gen-Options.html#:~:text=fwrapv
Ideally I think it would be better to implement in a portable way, but if these flags are needed then good to document this and add to your build config here, ideally only for a minimal required set of files.
I believe this should be fixed in zfp, as valid input data should never invoke UB. This is somewhat different from #242, where the user violates zfp's assumptions on all-numerical inputs. I'm, however, unsure about the best path forward. My sense is that we need to do some work to prove that any one of the suggested changes above is guaranteed to avoid UB, but it will take some time. Fortunately we have very smart people at LLNL working on exactly such correctness problems, and I'm hoping we can solicit their help.
@mjwillson: #241 is being addressed on the bugfix/integer-overflow
branch. Please check if the fix works for you.
We're currently struggling with tests passing in general because of outdated scripts and some LLNL internal issues with GitLab. We hope to resolve these issues in the coming weeks.
@mjwillson I would like to merge this fix into develop
, but I would prefer to hear from you first to make sure it addresses all of your concerns.
I have merged this fix and am closing the issue. Please reopen if it does not address your needs.
Hello,
This bug seems to consistently cause SIGILL crashes for me when compressing float data using reversible encoding. When running under UBSAN (https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html ) the issue is revealed to be a signed integer overflow:
Is this a bug, or were you relying on wraparound overflow happening silently here? Unfortunately it's undefined behaviour and at least some compilers are going to generate crashing code.