tahina-pro opened this issue 4 years ago
Do you know if your version of mingw supports the __int128_t type?
A Windows Docker image cannot run on Linux, so I cannot repro your exact issue, but I also get a segfault when running the tests without HAVE_INT128, so maybe that's related?
Yes, in a container running from the Docker image, I can successfully compile and run a __int128_t function that computes 2^(2^n) - 1:
docker run -i -t tahina/20200318ocamlstdint
ContainerAdministrator@52e7a61f5bdc /cygdrive/c/test/ocaml-stdint
$ cat <<EOF >a.c
#include <stdint.h>
#include <assert.h>
__int128_t two_pow_two_pow_minus_one(int count) {
  __int128_t res = 2;
  for (int i = 0; i < count; ++i) {
    res *= res;
  }
  assert(res > 0);
  return res - 1;
}
EOF
$ x86_64-w64-mingw32-gcc.exe -Wall -c a.c
$ cat <<EOF >test.c
#include <inttypes.h>
#include <stdio.h>
__int128_t two_pow_two_pow_minus_one(int count);
int main(void) {
uint64_t res = (uint64_t) (two_pow_two_pow_minus_one(6));
printf("%" PRIu64 "\n", res);
return 0;
}
EOF
$ x86_64-w64-mingw32-gcc.exe -Wall -c test.c
$ x86_64-w64-mingw32-gcc.exe -Wall -o test.exe a.o test.o
$ ./test.exe
18446744073709551615
$ echo $?
0
If it helps - using an older 4.08.0 installation, also using fdopen's opam distribution, I don't get a segfault. On 4.10.0 I do get it.
I tried compiling the above C test programs with -O3 and they compiled and ran without error. My prior test with an #ifdef check also worked.
I should mention that the error started happening after we upgraded GCC (specifically: mingw) to version 9. We were previously on version 7 and there was no trouble.
Thread 1 received signal SIGSEGV, Segmentation fault.
0x000000000044e87d in copy_int128 (i=<optimized out>) at int128_stubs.c:146
146 Int128_val(res) = i;
This gdb output seems to be saying that GCC optimized out the variable i in the copy_int128() function. I tried throwing a volatile next to it in the args to the function, but it didn't seem to stop it. Feels like gcc at -O2 shouldn't optimize out the right-hand side of an assignment, but who knows.
I found that the perl folks wrap code with a lot of guards to protect it from newer GCC versions... https://github.com/perl11/cperl/commit/c0b59c82b053a3434cf3c20d36d1a3406421bc3d (this was a failed(?) commit to add even more -O0-style guards to the existing ones)
@rseymour do you think we should add such guards as well?
We didn't end up going with them as the other fix worked, but I feel like it's the "right" way to do it until GCC fixes itself.
Just coming back to this issue after a long while...
Are we positive that this is a GCC bug? I can't tell from the previous discussion what the "misbehavior" might be. The fact that we observe the issue at -O2 and above doesn't necessarily reflect a GCC issue; it could be that the bindings code does something illegal per the C standard, and that GCC only becomes "smart" at -O2.
Maybe the issue is this: when HAVE_INT128 is absent, some unspecified issue observable only at -O2 and above causes a segfault.
Then there are two fixes:
Is this an accurate summary?
Would you be open to the first fix?
FYI, we're still running our entire infrastructure with the workaround outlined in this bug: we compile the test program, see if it segfaults, and if it does, recompile ocaml-stdint with -O0 to prevent the error from happening.
Thanks!
Jonathan
I just discovered I was being bitten by this too - making HAVE_INT128 is definitely not helping with gcc 9.2.0 on Cygwin 3.1.7 in a Docker container on Windows 10 1910 (10.0.18363.1139).
A brief glance suggests this should work (a very brief glance suggests mingw-w64 doesn't provide the macro at all):
diff --git a/lib/int128.h b/lib/int128.h
index 194e1ae..1db352d 100644
--- a/lib/int128.h
+++ b/lib/int128.h
@@ -1,7 +1,7 @@
#ifndef OCAML_INT128_H
#define OCAML_INT128_H
-#if defined(__SIZEOF_INT128__)
+#if defined(__SIZEOF_INT128__) || defined(__MINGW32__)
#define HAVE_INT128
typedef __int128_t int128;
#else
diff --git a/lib/uint128.h b/lib/uint128.h
index 3f97dab..864ea40 100644
--- a/lib/uint128.h
+++ b/lib/uint128.h
@@ -1,7 +1,7 @@
#ifndef OCAML_UINT128_H
#define OCAML_UINT128_H
-#if defined(__SIZEOF_INT128__)
+#if defined(__SIZEOF_INT128__) || defined(__MINGW32__)
#define HAVE_UINT128
typedef __uint128_t uint128;
#else
This and forcing -O0 is working fine, for an unoptimized value of "fine"!
Your solution seems fine to me @dra27 -- do I understand you correctly that we need both HAVE_INT128 and -O0? Definitely something fishy going on here but I agree that as long as the issue is mitigated it's "fine" for us too.
Thanks for chiming in! Delightful to see you jump in on a Windows-only obscure optimization-related bug ;-).
I haven't tested if -O0 on its own is sufficient, no (it looks like it probably will be). I was going on the assumption that intrinsic support for __int128_t is better, so it seemed better to have that by default, but then I also discovered that it was still segfaulting...
A glorious coincidence to have jumped in within hours of you - I just happen to be hacking cap'n proto on mingw-w64 at the moment! 🙂
May I suggest adding a reference to this issue in a comment in the patch itself? I think actual comments are better than relying on the commit message in instances like these.
I should have had a more in-depth glance: __SIZEOF_INT128__ is provided.
OK, I've dug a bit further. The problem is alignment, and it's not Windows-specific; I expect it's just not been seen on Linux or elsewhere yet. Here's the relevant assembly snippet for copy_int128:
leaq int128_ops(%rip), %rcx
movq 280(%rax), %rsi
call caml_alloc_custom
movaps %xmm6, 8(%rax)
however, this assumes that 8(%rax) is 16-byte aligned, and there's absolutely no guarantee of that, as OCaml only guarantees 8-byte alignment on a 64-bit platform. If you "do something" with the 128-bit int, then gcc stops using xmm6 to hold it (I found that by coincidence, since a printf of the two 64-bit components of it "fixes" it!).
Unfortunately, this cast is invalid:
#define Int128_val(v) (*((int128 *)Data_custom_val(v)))
For whatever reason, in 4.07.1 it appears that the allocator much more readily allocated 16-byte aligned custom values (I think this is simply an image layout coincidence).
With my colleague @msprotz, we observed segfaults related to 128-bit integers in some Windows OCaml programs using ocaml-stdint.
It turns out that even the test executable produced by
make test/stdint_test
segfaults, before any test runs, as we show below.
We are working on Windows 10 version 1909 (10.0.18363.720) with Cygwin 3.1.4-1, mingw64-x86_64-gcc 9.2.0, and OCaml 4.09.0 installed via OCaml and OPAM for Windows. If you want to quickly reproduce our config, you can pull a complete Docker image that pulls the latest ocaml-stdint master (0785788f425bd06635608fe111a851ec86e095bf) and builds ocaml-stdint and its test suite. You can also rebuild the Docker image yourself using this Dockerfile.txt.
Then, below is what we are observing:
As a workaround, my colleague @msprotz proposed to recompile ocaml-stdint with gcc optimizations disabled, which we did in tahina-pro/ocaml-stdint@99ac1fa234db8d57c4a43837ae71fda51d8d1470, and there we no longer observe these segfaults.