cloudius-systems / osv

OSv, a new operating system for the cloud.
osv.io
Other
4.12k stars 605 forks source link

Broken memmove() with gcc 12 #1212

Closed wkozaczuk closed 1 year ago

wkozaczuk commented 1 year ago

As I have been investigating the failing java tests on Fedora 37, I have discovered weird failures in realpath() when called from the dynamic linker:

#0  __lxstat (ver=ver@entry=1, pathname=pathname@entry=0x500000d2c000 "/usr/lib/jvmjavaa", st=st@entry=0x2000000fd680) at fs/vfs/main.cc:1425
#1  0x000000004039f150 in lstat (pathname=pathname@entry=0x500000d2c000 "/usr/lib/jvmjavaa", st=st@entry=0x2000000fd680) at fs/vfs/main.cc:1440
#2  0x00000000403bb700 in realpath (path=<optimized out>, resolved=0x500000d2c000 "/usr/lib/jvmjavaa") at libc/misc/realpath.c:183
#3  0x00000000402bc883 in elf::canonicalize (p="/usr/bin/java") at core/elf.cc:1372
#4  0x00000000402c5908 in elf::program::load_object (this=0x600000099d30, name=..., extra_path=..., loaded_objects=std::vector of length 0, capacity 0) at core/elf.cc:1416
#5  0x00000000402c6b6a in elf::program::get_library (this=this@entry=0x600000099d30, name="/usr/bin/java", extra_path=std::vector of length 0, capacity 0, 
    delay_init=delay_init@entry=true) at core/elf.cc:1478

After more debugging I realized the call to memmove in our realpath implementation yields weird results and more specifically this test added to tst-memmove.cc fails on Fedora 37 with gcc 12:

diff --git a/tests/tst-memmove.cc b/tests/tst-memmove.cc
index 36b75a41..4520a781 100644
--- a/tests/tst-memmove.cc
+++ b/tests/tst-memmove.cc
@@ -202,6 +202,10 @@ int main()
         memmove_test(destOffset, srcOffset, n);
     }

+    char buf10[128] = "/usr/lib/jvm/java/bin/java";
+    memmove(buf10, buf10 + 1, 27);
+    pass_if(buf10, "usr/lib/jvm/java/bin/java", 25);
+
     std::cerr << "PASSED\n";
     return 0;
 }

./scripts/run.py -e '/tests/tst-memmove.so' -c 1
OSv v0.57.0-7-g07a8b81e
eth0: 192.168.122.15
Booted up in 172.32 ms
Cmdline: /tests/tst-memmove.so
ERROR: got usr/lib/jvmjavaa/bin/java but expected usr/lib/jvm/java/bin/java

The same test compiled and executed on Fedora 35 (gcc 11) works just fine.

My suspicion is that gcc 12 compiles relevant code in arch/x64/string.cc probably incorrectly or we have a bug that comes to light right now. More specifically this function called indirectly by memmove probably yields incorrect results:

105 template <size_t N>
106 __attribute__((optimize("omit-frame-pointer")))
107 __attribute__((optimize("unroll-loops")))
108 void* do_small_memcpy(void *dest, const void *src)
109 {
110     struct [[gnu::packed]] data {
111         char x[N];
112     };
113     *static_cast<data*>(dest) = *static_cast<const data*>(src);
114     return dest;
115 }
nyh commented 1 year ago

Wow, good catch. I can confirm that this test also fails on my Fedora 36, with gcc 12.2.1.

nyh commented 1 year ago

I confirmed that if memcpy_repmov_ssse3() is modified t not call small_memcpy(), the bug indeed doesn't happen - so that function is indeed to blame. I'll try to think how it may misscompile.

nyh commented 1 year ago

I was worried that maybe the fancy casting does something wrong (some "strict aliasing rule violation", as usual), but bizarrely, it seems to me now that the compiler actually recognizes it as a case of copying, calls __builtin_memcpy(), and it is that __builtin_memcpy() which is broken! The following trivial C++ code returns the wrong result:

#include <cstdio>
#include <cstring>
int main(){
    char bufa[128] = "0123456789abcdefghijklmnopqrstuvwxyz";
    char bufb[128] = "0123456789abcdefghijklmnopqrstuvwxyz";
    memcpy(bufa, bufa+1, 27);
    printf("          memcpy: %s\n", bufa);
    __builtin_memcpy(bufb, bufb+1, 27);
    printf("__builtin_memcpy: %s\n", bufb);
}

This should print the same value twice, but it's different in gcc 12.2.1!

I'll open a bug report for gcc, but I'll also try to think how we can avoid work around this bug, if we can at all. At worst case we can detect if the bug exists and if so have small_memcpy() just call sse_memcpy() (or have its callers don't call small_memcpy().

nyh commented 1 year ago

I opened https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108296

nyh commented 1 year ago

The resolution of https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108296 as a duplicate led me to two very interesting insights:

  1. The miscompilation bug itself is not caused by the implementation of _builtin_memcpy but by the optimizer using it in the wrong place. _builtin_memcpy or even memcpy cannot be used for overlapping ranges as its documentation says.
  2. BUT, in our case our caller code called memcpy() with an overlapping range. This is a mistake - it should have used memmove(), not memcpy(). I'll try to find thisp lace and produce a patch.
nyh commented 1 year ago

Oops, I just realized - our caller DOES use memmove(). This memmove() just calls our memcpy() in cases when it thinks it is safe. So the mis-compilation is a problem :-(

nyh commented 1 year ago

The following hacky patch fixed the new memmove test and the Java test: (only the repmove_ssse3 version is actually necessary on a modern CPU)

diff --git a/arch/x64/string.cc b/arch/x64/string.cc
index b3782fe7..571049f9 100644
--- a/arch/x64/string.cc
+++ b/arch/x64/string.cc
@@ -220,9 +220,16 @@ extern "C"
 [[gnu::optimize("omit-frame-pointer")]]
 void *memcpy_repmov_old(void *__restrict dest, const void *__restrict src, size_t n)
 {
+// Gcc 12 miscompiles our small_memcpy to use __builtin_memcpy() instead
+// of memmove, so fails for overlapping regions that this function is supposed
+// to correctly support. See https://github.com/cloudius-systems/osv/issues/1212
+#if __GNUC__ == 12
+    if (n < 1024) {
+#else
     if (n < small_memcpy_lim) {
         return small_memcpy(dest, src, n);
     } else if (n < 1024) {
+#endif
         return sse_memcpy(dest, src, n);
     } else {
         auto ret = dest;
@ -240,9 +247,16 @@ extern "C"
 [[gnu::optimize("omit-frame-pointer")]]
 void *memcpy_repmov(void *__restrict dest, const void *__restrict src, size_t n)
 {
+// Gcc 12 miscompiles our small_memcpy to use __builtin_memcpy() instead
+// of memmove, so fails for overlapping regions that this function is supposed
+// to correctly support. See https://github.com/cloudius-systems/osv/issues/1212
+#if __GNUC__ == 12
+    if (n < 1024) {
+#else
     if (n < small_memcpy_lim) {
         return small_memcpy(dest, src, n);
     } else if (n < 1024) {
+#endif
         return sse_memcpy(dest, src, n);
     } else {
         auto ret = dest;
@@ -255,9 +269,16 @@ extern "C"
 [[gnu::optimize("omit-frame-pointer")]]
 void *memcpy_repmov_old_ssse3(void *__restrict dest, const void *__restrict src, size_t n)
 {
+// Gcc 12 miscompiles our small_memcpy to use __builtin_memcpy() instead
+// of memmove, so fails for overlapping regions that this function is supposed
+// to correctly support. See https://github.com/cloudius-systems/osv/issues/1212
+#if __GNUC__ == 12
+    if (n < 1024) {
+#else
     if (n < small_memcpy_lim) {
         return small_memcpy(dest, src, n);
     } else if (n < 1024) {
+#endif
         return sse_memcpy(dest, src, n);
     } else if (n < 65536 && !both_aligned(dest, src, 16)) {
         ssse3_unaligned_copy(dest, src, n);
@@ -278,9 +299,16 @@ extern "C"
 [[gnu::optimize("omit-frame-pointer")]]
 void *memcpy_repmov_ssse3(void *__restrict dest, const void *__restrict src, size_t n)
 {
+// Gcc 12 miscompiles our small_memcpy to use __builtin_memcpy() instead
+// of memmove, so fails for overlapping regions that this function is supposed
+// to correctly support. See https://github.com/cloudius-systems/osv/issues/1212
+#if __GNUC__ == 12
+    if (n < 1024) {
+#else
     if (n < small_memcpy_lim) {
         return small_memcpy(dest, src, n);
     } else if (n < 1024) {
+#endif
         return sse_memcpy(dest, src, n);
     } else if (n < 65536 && !both_aligned(dest, src, 16)) {
         ssse3_unaligned_copy(dest, src, n);
diff --git a/tests/tst-memmove.cc b/tests/tst-memmove.cc
index 36b75a41..4520a781 100644
--- a/tests/tst-memmove.cc
+++ b/tests/tst-memmove.cc
@@ -202,6 +202,10 @@ int main()
         memmove_test(destOffset, srcOffset, n);
     }

+    char buf10[128] = "/usr/lib/jvm/java/bin/java";
+    memmove(buf10, buf10 + 1, 27);
+    pass_if(buf10, "usr/lib/jvm/java/bin/java", 25);
+
     std::cerr << "PASSED\n";
     return 0;
 }

Unfortunately, scripts/run.py -e 'tests/tst-commands.so' -c 1 still fails, I need to check why.

wkozaczuk commented 1 year ago

One more test would fail in my case:

  TEST tst-feexcept.so                    OSv v0.57.0-7-g07a8b81e
eth0: 192.168.122.15
Booted up in 153.32 ms
Cmdline: /tests/tst-feexcept.so
0
FAIL: /git-repos/osv/tests/tst-feexcept.cc:82: For sig_check([] { printf("%d\n", 1 / zero_i()); }, SIGFPE), expected true, saw 0.
SUMMARY: 14 tests, 1 failures
Test tst-feexcept.so FAILED
Traceback (most recent call last):
  File "/git-repos/osv/./scripts/test.py", line 191, in <module>
    main()
  File "/git-repos/osv/./scripts/test.py", line 147, in main
    run_tests()
  File "/git-repos/osv/./scripts/test.py", line 136, in run_tests
    run(tests_to_run)
  File "/git-repos/osv/./scripts/test.py", line 109, in run
    run_test(test)
  File "/git-repos/osv/./scripts/test.py", line 84, in run_test
    test.run()
  File "/git-repos/osv/scripts/tests/testing.py", line 38, in run
    run_command_in_guest(self.command, hypervisor=self.hypervisor, run_py_args=self.run_py_args).join()
  File "/git-repos/osv/scripts/tests/testing.py", line 186, in join
    raise Exception('Guest failed')
Exception: Guest failed

But the other ones seem to pass:

./scripts/test.py -d tst-commands.so -d tst-feexcept.so
...
OK (140 tests run, 186.646 s)
wkozaczuk commented 1 year ago

The test_loader_parse_cmdline() is calling memset(). Wild guess: maybe we have a similar issue like with memmove()?