Closed Quuxplusone closed 11 years ago
Bugzilla Link | PR16642 |
Status | RESOLVED INVALID |
Importance | P normal |
Reported by | Juliusz Sompolski (julsomp@gmail.com) |
Reported on | 2013-07-17 07:35:14 -0700 |
Last modified on | 2013-07-19 11:14:24 -0700 |
Version | 3.3 |
Hardware | PC Linux |
CC | baldrick@free.fr, dblaikie@gmail.com, llvm-bugs@lists.llvm.org, sepavloff@gmail.com |
Fixed by commit(s) | |
Attachments | |
Blocks | |
Blocked by | |
See also |
Overflow when multiplying two signed numbers results in undefined behaviour
according to the C standard. Thus the compiler is free to introduce code that
fires a nuclear missile, wipes your hard drive, etc., if the multiplication does
in fact overflow. Fortunately the compiler only inserted code that sets err to
zero in this case, phew!
As overflow when multiplying unsigned numbers does not result in undefined
behaviour, you can fix your code by replacing
r = x * y;
with
r = (long long)((unsigned long long)x * (unsigned long long)y);
Thanks.
-fwrapv
also helps.
Yes, -fwrapv turns off this kind of optimization. Unfortunately turning it off makes life much harder for the loop optimizers, so it can have a bad performance impact on some programs.
Undefined behavior refers only to the statement 'r = x * y', so 'r' may have an
arbitrary value. However, the rest of the code must be deterministic. In
particular, the generated code must contain the check and the corresponding
branch. Both GCC and the Intel compiler generate the check and branch. Clang
doesn't, and it looks like an error.
No, undefined behaviour does not mean that r may get a strange result, it means
*that anything can happen* when this expression is evaluated. Yes, anything.
Try reading
http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html
(In reply to comment #6)
> No, undefined behaviour does not mean that r may get a strange result, it
> means *that anything can happen* when this expression is evaluated.
I'll see your technical correctness & raise you some, just for fun :)
Undefined behavior does not mean that anything can happen /when the expression
is evaluated/ - it means that any program execution that includes an execution
of that statement has no guarantees whatsoever. Behavior of the program even
before the execution of that statement is still completely unconstrained.
Yes, this is true, and I'm pretty sure that LLVM does this kind of back
propagation of undefined behaviour sometimes. Fun fact: CompCert has been
proved *not* to do this (undefined behaviour having an impact before reaching
the problematic statement).
If the hardware trapped integer overflow, this optimization would be perfectly
correct, as the code following the multiplication wouldn't execute. On x86,
integer overflow isn't trapped, so the statement 'err = y != 0 && r / y != x;'
is executed and correctly checks whether overflow takes place. I think you are
right that the compiler behaves correctly from a legal viewpoint, but such an
interpretation is very user-unfriendly.
The bad thing is that the code in the PR represents a common pattern used to
check for integer overflow. In an ideal world where developers keep in mind the
1500+ pages of the standard and cover all functionality with unit tests, that
wouldn't be a problem.
In the real world the compiler behaves very nastily: it removes the check
against the limiting value, which may indeed result in a lost spacecraft, a
fired missile, and damage to my cat.
I would highly recommend reopening this PR. It represents an *important*
real-life case, in which the compiler must do its best for the user and stay
on the safe side.
GCC also does this kind of optimization. For example, gcc-4.8 transforms this
int f(int x) {
return (x + 1) < x;
}
into
int f(int x) {
return 0;
}
So why doesn't it transform your example? In my opinion it's just that the GCC
developers didn't implement it yet. Given that they already implemented many
other optimizations of this kind (and turned them on by default), I think it's
just a matter of time before they do this one too. The fact that you and
others got away with this wrong code with GCC up to now doesn't mean you will
get away with it with GCC in the future.
I'm afraid the assumption behind your argument, that C code behaves the way
the x86 instructions it superficially resembles would behave, hasn't been true
for a long long time. Try reading http://blog.regehr.org/archives/213
As you found out, if you don't want this kind of optimization, you can turn it
off with -fwrapv.
Why isn't -fwrapv the default? I think it's not the default in clang simply
because it's not the default in GCC. Clang tries to be GCC compatible,
including in the general kinds of optimizations that get turned on by default.
So why isn't -fwrapv the default in GCC? I don't know, but I suspect it's
simply that GCC tries to give maximal performance by default, not maximal
safety. (If you want a safe language, you really shouldn't be using C).
If you are worried that your code might suffer from this and other kinds of
undefined behaviour, clang provides ways to check for this: compile with
-fsanitize=undefined and such errors will be caught at runtime (-fsanitize has
many other possible values, see
http://clang.llvm.org/docs/UsersManual.html#controlling-code-generation). Some
of this sanitizer stuff has recently been ported from LLVM to GCC, so you may
be able to do this with recent versions of GCC too.
Thank you for the references. This is a surprisingly complex topic, and the
information you shared helped substantially.
You convinced me. It would be nice to have compile-time warnings, but that is
not about code generation. The only point I am still worried about is the
absence of this particular optimization in most compilers. I compiled the
original example with several compilers:
GCC 4.7.2
ICC 13.1
MSVC (Visual Studio 2012)
Keil C 5.03.0.76
IAR 6.50.3.4757
None optimized out the check.
This case is a bit harder to optimize than (x + 1) < x, because when the
compiler sees the expression
r / y
it is easy for the compiler to see that
r = x * y
but it is not so easy to see that y != 0 because knowing that requires
analysing control flow to see that you could only get to the division if y !=
0. Let's suppose this is too hard, so the compiler doesn't know that y != 0
when it analyses the division r / y. What can it do?
Here is what LLVM does: according to the C standard, dividing by zero results
in undefined behaviour (not a trap), so it can assume that y != 0 and turn
(x*y)/y into x, because this transform is only wrong if x*y overflows or if
y==0 but in both cases it is undefined behaviour.
However traditionally most compilers have been less aggressive about division
by zero, and have tried to ensure it always generates a trap, even though they
aren't required to. In particular, they try not to remove instances of
division by zero. So what I suspect is that the reason these other compilers
aren't doing the transform is that they are being extra nice about division
by zero, and aren't smart enough to see that it isn't an issue here. And not
because they are unwilling to exploit the undefined behaviour of x*y
overflowing. I could be wrong of course, and it might be interesting to
discuss this with the GCC developers to see what their thinking is for this
case.
As an aside:
if we don't already diagnose this with Clang's -fsanitize=undefined, you could file a bug so we do
if we don't already warn about this at compile time, you could file a bug for that as well (though, looking at the code, I'm not sure there's anything sufficiently narrow/idiomatic for us to detect)