andrenth / ocaml-stdint

Various signed and unsigned integers for OCaml
Other
84 stars 15 forks source link

Segfaults with 128-bit integers on Windows #45

Open tahina-pro opened 4 years ago

tahina-pro commented 4 years ago

With my colleague @msprotz, we observed segfaults related to 128-bit integers in some Windows OCaml programs using ocaml-stdint.

It turns out that even the test executable produced by make test/stdint_test segfaults, with no test running, as we show below.

We are working on Windows 10 version 1909 (10.0.18363.720) with Cygwin 3.1.4-1, mingw64-x86_64-gcc 9.2.0, and OCaml 4.09.0 installed via OCaml and OPaM for Windows. If you want to quickly reproduce our config, you can pull a complete Docker image pulling the latest ocaml-stdint master (0785788f425bd06635608fe111a851ec86e095bf) and building ocaml-stdint and its test suite. You can also rebuild the Docker image yourself using this Dockerfile.txt.

Then, below is what we are observing:

ContainerAdministrator@cff3016f7f7e /cygdrive/c/test/ocaml-stdint
$ gdb tests/stdint_test.exe
GNU gdb (GDB) (Cygwin 8.2.1-1) 8.2.1

(gdb) run
Starting program: /cygdrive/c/test/ocaml-stdint/tests/stdint_test.exe
[New Thread 5000.0x33c4]
[New Thread 5000.0x32b0]
[New Thread 5000.0x5110]
[New Thread 5000.0x5710]

Thread 1 received signal SIGSEGV, Segmentation fault.
copy_uint128 (i=<optimized out>) at uint128_stubs.c:188
188       Uint128_val(res) = i;
(gdb) backtrace
#0  copy_uint128 (i=<optimized out>) at uint128_stubs.c:188
#1  0x0000000000494e65 in uint128_of_int (v=<optimized out>) at uint128_conv.c:34
#2  0x000000000042f2e6 in camlStdint__entry ()
#3  0x0000000000000001 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

As a workaround, my colleague @msprotz proposed to recompile ocaml-stdint with gcc optimizations disabled, which we did in tahina-pro/ocaml-stdint@99ac1fa234db8d57c4a43837ae71fda51d8d1470, and there we no longer observe these segfaults.

rixed commented 4 years ago

Do you know if your version of mingw supports the __int128_t type? Windows docker image cannot run on Linux so I cannot repro your exact issue, but I also have a segfault when running the tests without HAVE_INT128, so maybe that's related?

tahina-pro commented 4 years ago

Yes, in a container running from the Docker image, I can successfully compile and run a __int128_t function that computes 2^(2^n):

docker run -i -t tahina/20200318ocamlstdint

ContainerAdministrator@52e7a61f5bdc /cygdrive/c/test/ocaml-stdint
$ cat <<EOF >a.c
#include <stdint.h>
#include <assert.h>
__int128_t two_pow_two_pow_minus_one(int count) {
  __int128_t res = 2;
  for (int i = 0; i < count; ++i) {
    res *= res;
  }
  assert (res > 0);
  return res - 1;
}
EOF

$ x86_64-w64-mingw32-gcc.exe -Wall -c a.c

$ cat <<EOF >test.c
#include <inttypes.h>
#include <stdio.h>
__int128_t two_pow_two_pow_minus_one(int count);
int main(void) {
  uint64_t res = (uint64_t) (two_pow_two_pow_minus_one(6));
  printf("%" PRIu64 "\n", res);
  return 0;
}
EOF

$ x86_64-w64-mingw32-gcc.exe -Wall -c test.c

$ x86_64-w64-mingw32-gcc.exe -Wall -o test.exe a.o test.o

$ ./test.exe
18446744073709551615

$ echo $?
0
hcarty commented 4 years ago

If it helps - using an older 4.08.0 installation, also using fdopen's opam distribution, I don't get a segfault. On 4.10.0 I do get it.

rseymour commented 4 years ago

Tried compiling the above test C progs with -O3 and they compiled/ran without error. My prior test w/ an ifdef check also worked.

msprotz commented 4 years ago

I should mention that the error started happening after we upgraded GCC (specifically: mingw) to version 9. We were previously on version 7 and there was no trouble.

rseymour commented 4 years ago
Thread 1 received signal SIGSEGV, Segmentation fault.
0x000000000044e87d in copy_int128 (i=<optimized out>) at int128_stubs.c:146
146       Int128_val(res) = i;

This gdb output seems to be saying that GCC optimized out the variable i in the copy_int128() function. I tried throwing a volatile next to it in the args to the function, but it didn't seem to stop it. Feels like gcc at -O2 shouldn't optimize out the right hand side of an assignment, but who knows.

rseymour commented 4 years ago

I found perl folks wrap code with a lot of guards to protect it from new GCC... https://github.com/perl11/cperl/commit/c0b59c82b053a3434cf3c20d36d1a3406421bc3d (this was a failed(?) commit to add even more -O0 style guards to the existing ones)

rgrinberg commented 3 years ago

@rseymour do you think we should such guards as well?

rseymour commented 3 years ago

We didn't end up going with them as the other fix worked, but I feel like it's the "right" way to do it until GCC fixes itself.

msprotz commented 3 years ago

Just coming back to this issue after a long while...

Are we positive that this is a GCC bug? I can't tell from the previous discussion what the "misbehavior" might be. The fact that we observe the issue at O2 and above doesn't necessarily reflect a GCC issue; it could be that the bindings code does something illegal per the C standard, and that GCC only becomes "smart" at O2.

Maybe the issue is this: when HAVE_INT128 is absent, some unspecified issue observable only at -O2 and above causes a segfault.

Then there are two fixes:

Is this an accurate summary?

Would you be open to the first fix?

FYI, we're still running our entire infrastructure with the workaround outlined in this bug: we compile the test program, see if it segfaults, and if it does, recompile ocaml-stdint with -O0 to prevent the error from happening.

Thanks!

Jonathan

dra27 commented 3 years ago

I just discovered I was being bitten by this too - making HAVE_INT128 is definitely not helping with gcc 9.2.0 on Cygwin 3.1.7 in a Docker container on Windows 10 1910 (10.0.18363.1139).

A brief glance suggests this should work (a very brief glance suggests mingw-w64 doesn't provide the macro at all):

diff --git a/lib/int128.h b/lib/int128.h
index 194e1ae..1db352d 100644
--- a/lib/int128.h
+++ b/lib/int128.h
@@ -1,7 +1,7 @@
 #ifndef OCAML_INT128_H
 #define OCAML_INT128_H

-#if defined(__SIZEOF_INT128__)
+#if defined(__SIZEOF_INT128__) || defined(__MINGW32__)
 #define HAVE_INT128
 typedef __int128_t int128;
 #else
diff --git a/lib/uint128.h b/lib/uint128.h
index 3f97dab..864ea40 100644
--- a/lib/uint128.h
+++ b/lib/uint128.h
@@ -1,7 +1,7 @@
 #ifndef OCAML_UINT128_H
 #define OCAML_UINT128_H

-#if defined(__SIZEOF_INT128__)
+#if defined(__SIZEOF_INT128__) || defined(__MINGW32__)
 #define HAVE_UINT128
 typedef __uint128_t uint128;
 #else
dra27 commented 3 years ago

This and forcing -O0 is working fine, for an unoptimized value of "fine"!

msprotz commented 3 years ago

Your solution seems fine to me @dra27 -- do I understand you correctly that we need both HAVE_INT128 and -O0? Definitely something fishy going on here but I agree that as long as the issue is mitigated it's "fine" for us too.

Thanks for chiming in! Delightful to see you jump in on a Windows-only obscure optimization-related bug ;-).

dra27 commented 3 years ago

I haven't tested if -O0 on its own is sufficient, no (it looks like it probably will be). I was going on the assumption that intrinsic support for __int128_t is better, so it seemed better to have that by default, but then I also discovered that it was still segfaulting...

A glorious coincidence to have jumped in within hours of you - I just happen to be hacking cap'n proto on mingw-w64 at the moment! 🙂

rixed commented 3 years ago

May I suggest adding a reference to this issue in a comment in the patch itself? I think actual comments are better than relying on the commit message in instances like these.

dra27 commented 3 years ago

I should have had a more in-depth glance, __SIZEOF_INT128__, is provided.

dra27 commented 3 years ago

OK, I've dug a bit further. The problem is alignment, and it's not Windows-specific, I expect it's just not been seen on Linux or elsewhere yet. Here's the relevant assembly snippet for copy_int128:

       leaq    int128_ops(%rip), %rcx
       movq    280(%rax), %rsi
       call    caml_alloc_custom
       movaps  %xmm6, 8(%rax)

however, this assumes that 8(%rax) is 16-byte aligned and there's absolutely no guarantee of that, as OCaml only guarantees 8-byte alignment on a 64-bit platform. If you "do something" with the 128 bit int, then gcc stops using xmm6 to hold it (I found that by coincidence, since a printf of the two 64-bit components of it "fixes" it!).

Unfortunately, this cast is invalid:

#define Int128_val(v) (*((int128 *)Data_custom_val(v)))

For whatever reason, in 4.07.1 it appears that the allocator much more readily allocated 16-byte aligned custom values (I think this is simply an image layout coincidence).