Closed masinter closed 3 years ago
Does this work on SPARC? When I look at the SPARC code, it feels to me like like it is an error:
.inline _sub32,8
subcc %o0,%o1,%o0 ! result = arg0 - arg1
bvs diff_err
nop
.end
.inline _mpy32,8
ba mpy_err
nop
.end
The sub32
routine does the operation and then branches if the carry condition code is set. The mpy32
routine just does an unconditional branch though.
Which 64 bit platform were you on? macOS and Linux are LP64, so the int
result
should still have been a 32 bit value and resulted in going to doufn2
... or what am I missing so that I can understand this better?
Okay, so this is readily reproducible with a simple C example on macOS or Linux with "modern" clang or gcc. I am not at home, so I'm on macOS rather than Linux, so examples will be from macOS and clang.
Modern clang and gcc have optimizations that take advantage of overflow of signed integers being undefined behavior, so they allow themselves to eliminate code that assumes that undefined behavior happened. This has been pretty upsetting to many and surprising to even more over the last decade or more. This is a good paper from 2012 that covers this: https://www.cs.utah.edu/~regehr/papers/overflow12.pdf
If we take this bit of code and compile it with clang -O2 -m64 -std=gnu89 -funsigned-char -fno-strict-aliasing
:
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char **argv)
{
register int arg1, arg2;
register int result;
arg1 = atoi(argv[1]);
arg2 = atoi(argv[2]);
result = arg1 * arg2;
if ((arg2 != 0) && ((result / arg2) != arg1)) {
printf("Overflowed.\n");
} else {
printf("Good: %d\n", result);
}
return 0;
}
It won't work as expected. If we look at the generated assembly, we'll see that the whole check and print of "Overflowed."
has been removed by the compiler, because for it to have succeeded, it means that undefined behavior must have occurred:
movq %rsi, %rbx
movq 8(%rsi), %rdi
callq _atoi
movl %eax, %r14d
movq 16(%rbx), %rdi
callq _atoi
imull %r14d, %eax
leaq L_.str.1(%rip), %rdi
movl %eax, %esi
xorl %eax, %eax
callq _printf
xorl %eax, %eax
popq %rbx
popq %r14
popq %rbp
retq
Now, we can add -fwrapv
to the compiler command line, which instructs both gcc and clang to not perform that sort of optimization, and that fixes things:
movq %rsi, %rbx
movq 8(%rsi), %rdi
callq _atoi
movl %eax, %r14d
movq 16(%rbx), %rdi
callq _atoi
movl %eax, %esi
imull %r14d, %esi
testl %eax, %eax
je LBB0_3
## %bb.1:
movl %eax, %ecx
movl %esi, %eax
cltd
idivl %ecx
cmpl %r14d, %eax
jne LBB0_2
LBB0_3:
leaq L_.str.1(%rip), %rdi
xorl %eax, %eax
callq _printf
jmp LBB0_4
LBB0_2:
leaq L_str(%rip), %rdi
callq _puts
LBB0_4:
xorl %eax, %eax
popq %rbx
popq %r14
popq %rbp
retq
That's a trivial fix and it has been experimented with before as evidenced by some commented out compiler command lines in the repository.
However, with clang and gcc, there's another alternative as well, which looks like this in C:
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char **argv)
{
register int arg1, arg2;
int result;
arg1 = atoi(argv[1]);
arg2 = atoi(argv[2]);
if (__builtin_smul_overflow(arg1, arg2, &result)) {
printf("Overflowed.\n");
} else {
printf("Good: %d.\n", result);
}
return 0;
}
Here, we're using a __builtin_smul_overflow
primitive, which generates this code, without the extra idivl
. It is worth noting that we had to remove register
from the result variable since you can't take the address of a register variable. This doesn't matter though and doesn't actually change the generated code at all. (I tested in the original code sample with and without the register
modifier.)
movq %rsi, %rbx
movq 8(%rsi), %rdi
callq _atoi
movl %eax, %r14d
movq 16(%rbx), %rdi
callq _atoi
imull %r14d, %eax
jno LBB0_2
## %bb.1:
leaq L_str(%rip), %rdi
callq _puts
jmp LBB0_3
LBB0_2:
leaq L_.str.1(%rip), %rdi
movl %eax, %esi
xorl %eax, %eax
callq _printf
LBB0_3:
xorl %eax, %eax
popq %rbx
popq %r14
popq %rbp
retq
Just like before, it is performing an imull
instruction, but now, instead of doing the extra math, we're just doing a jo
or "jump when overflow flag is set".
This is the same thing that the SPARC assembly code was doing above (for addition), but here with gcc and clang, it is able to do this for each supported platform, without custom assembly code.
I think there's at least 4 things that we should do here:
-fwrapv
to the clang and gcc compile flags for now.#ifdef
for clang and gcc to use the overflow-checking builtins, and then remove -fwrapv
once that is in place.neither the Oracle Developer Studio compiler for SPARC nor the corresponding one for x86/x86_64
$ cc -V
cc: Studio 12.6 Sun C 5.15 SunOS_sparc 2017/05/30
do that particular optimization.
@waywardmonkeys --
-std=gnu89 -funsigned-char -fno-strict-aliasing -fwrapv
for the gcc and clang compiles in the makefile fragments we distribute. I don't think the standard distribution needs to have compiler warnings turned way up, that's only of interest to people poking around more deeply and they can edit the options if they want.the overflow test doesn't have to be exact -- as long as it is conservative. I've been working on the lisp-based tests in Medley/internal test -- one of the files wouldn't compile because (constant (itimes 3.14159e10 1)) wouldn't fasl dump. d\
@masinter If you try maiko from current master
, this should be working now. I want to leave this open for now until we do at least step 3 above and get started towards step 4.
works for me now. I did get a message in the startup window "Failed to find UNIXCOMM file handles; no processes" and IL:ShellCommand("ls -l") now errors but that is the other update
We've made the code support using the overflow builtins. I think we could remove -fwrapv
from the build at this point ...
result is a 64-bit number. So (CL:* 300000 300000) will succeed and N_ARITH_SWITCH will cons up a FIXNUM rather than calling the UFN or allocating a bignum. mppy32 is just inline SPARC assembly code.