Endless loop when compile with -O2 and above?

xbcnn commented 2 weeks ago

Version of emscripten/emsdk: emcc (Emscripten gcc/clang-like replacement + linker emulating GNU ld) 3.1.61 (67fa4c16496b157a7fc3377afd69ee0445e8a6e3) Copyright (C) 2014 the Emscripten authors (see AUTHORS.txt) This is free and open source software under the MIT license. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

#include <iostream>
#include <string>
#include <unordered_map>

std::unordered_map<uintptr_t, std::unordered_map<int, std::string>> g_map;
std::unordered_map<int, std::string>* get_map(uintptr_t k) {
    auto it = g_map.find(k);
    if (it != g_map.end()) {
        return &it->second;
    }
    return nullptr;
}

int test(int k) {
    auto* pm = get_map((uintptr_t)k);
    // std::cout << "hello" << std::endl;
    for (auto& [i, s] : *pm) {
        std::cout << i << " " << s << std::endl;
    }
    return 0;
}

int main() {
    return test(1);
}

emcc -O2 a.cpp -o a.html

node a.js

There is a problem that it doesn't check the |pm| is null or not, and use it.

Run node.js will loop endless and never return.
If I compile it with lower optimization level, like emcc -O1 a.cpp -o a.html, and run it again, it returns immediately.
If I uncomment the line in function test, which prints "hello", compile still with -O2, then run node a.js again, it returns immediately and prints out "hello".

Why builds with -O2 and above work abnormally? And, Why adds a line which std::cout something there, stops the 'optimization' and makes it work normally?

I am not sure if there are some issues in optimizer or it is a reasonable/explicable behavior here. @sbc100 Any thoughts about this?

sbc100 commented 2 weeks ago

When I try to run that program using either em++ or just clang++ it seem to crash/segfault:

$ clang++ test.cc
$ ./a.out
Segmentation fault
$ em++ test.cc
$ node ./a.out.js 
/usr/local/google/home/sbc/dev/wasm/emscripten/a.out.js:121
      throw ex;
      ^

RuntimeError: memory access out of bounds
    at wasm://wasm/000b2fde:wasm-function[30]:0x190a
    at wasm://wasm/000b2fde:wasm-function[56]:0x23ee
    at wasm://wasm/000b2fde:wasm-function[229]:0x55f6
    at /usr/local/google/home/sbc/dev/wasm/emscripten/a.out.js:625:12
    at callMain (/usr/local/google/home/sbc/dev/wasm/emscripten/a.out.js:4331:15)
    at doRun (/usr/local/google/home/sbc/dev/wasm/emscripten/a.out.js:4381:23)
    at run (/usr/local/google/home/sbc/dev/wasm/emscripten/a.out.js:4396:5)
    at runCaller (/usr/local/google/home/sbc/dev/wasm/emscripten/a.out.js:4316:19)
    at removeRunDependency (/usr/local/google/home/sbc/dev/wasm/emscripten/a.out.js:561:7)
    at receiveInstance (/usr/local/google/home/sbc/dev/wasm/emscripten/a.out.js:745:5)

Node.js v18.20.3

sbc100 commented 2 weeks ago

I think the problem is that your program doesn't check the value of pm and just assumes its not null. You probably need some kind of explicit null check on the result of get_map.

xbcnn commented 2 weeks ago

Yes, it's a real problem that no null check on pm before accessing it. But that is not the point.

You need build it with -O2 or -O3 option to reproduce. Default compile option, which I suppose is -g, can not reproduce it.

em++ -O3 test.cc
node a.out.js  # hang

It seems not hang in the for-loop iteration on *pm, but hang in get_map when calling find(k).

What surprised me more is when add any line before accessing *pm in test func, such as std::cout << "hello" << std::endl, the program doesn't hang any more.

int test(int k) {
    auto* pm = get_map((uintptr_t)k);
    std::cout << "hello" << std::endl;    // this line stops hanging
    for (auto& [i, s] : *pm) {
        std::cout << i << " " << s << std::endl;
    }
    return 0;
}

$ em++ -O3 test.cc
$ node a.out.js      # output "hello" and no more hang
hello
$

sbc100 commented 2 weeks ago

With emscripten you don't necessarily get an immediate crash on null pointer dereferencing like you might do on other platforms.

Some build settings such as -sSAFE_HEAP will detect these, but in general the program could continue to run after the null pointer dereference (the program is just reading and writing from locations around zero, which is valid in WebAssembly), but that program will be in an undefined state after that point, since null pointer dereferencing is undefined behaviour.

xbcnn commented 1 week ago

Did you mean that this could be compiler optimization based on undefined behavior? Such as, I guess, a pointer dereferenced without null check, is 'telling' compiler that it won't be null and those code paths which explicitly lead to null can be optimized out.

If this is the way behind, how can I confirm the compiler does and when does such optimization? I am wondering whether there are some compiler verbose logs to see these, and if we can disable these specific optimizations.

xbcnn commented 1 week ago

I investigated a little the wasm codes difference between the 2 versions(with line std::cout << "hello" << std::endl; or not, and I used emcc -O3 test.cc -o test.html) .

The one which has the "hello" line will check value at address 15252, which is zero as i32 , and then break out there so won't hang inside. I think it is checking the bucket count of the unordered_map g_map, since the map is actual empty, find(k) will directly return.

I don't get the opt rules applied here but they should be related to null pointer dereferencing UB. What do you think? Or somebody else can ask help explain this?

#  without the "hello" line
...
    block $label0 (result i32)
      i32.const 15252
      i32.load
      local.tee $var0
      local.get $var0
      i32.const 1
      i32.sub
      local.tee $var1
      i32.and
      if
        i32.const 15248
        i32.load
        local.get $var0
        i32.const 1
        i32.le_u
        if (result i32)
          i32.const 1
          local.get $var0
          i32.rem_u
        else
          i32.const 1
        end
        i32.const 2
        i32.shl
        i32.add
        i32.load
        br $label0
      end
      i32.const 15248
      i32.load
      local.get $var1
      i32.const 1
      i32.and
      i32.const 2
      i32.shl
      i32.add
      i32.load
    end $label0
...

;; with the "hello" line
...
    block $label0
      i32.const 15252
      i32.load
      local.tee $var1
      i32.eqz                ;;  $var1 == 0
      br_if $label0          ;;  Here break out
      i32.const 15248
      i32.load
      block $label1 (result i32)
        local.get $var1
        i32.const 1
        i32.sub
        i32.const 1
        i32.and
        local.get $var1
        i32.popcnt
        local.tee $var3
        i32.const 1
        i32.le_u
        br_if $label1
        drop
        i32.const 1
        local.get $var1
        i32.const 1
        i32.gt_u
        br_if $label1
        drop
        i32.const 1
        local.get $var1
        i32.rem_u
      end $label1
      local.tee $var4
      i32.const 2
      i32.shl
      i32.add
      i32.load
      local.tee $var0
      i32.eqz
      br_if $label0
      local.get $var0
      i32.load
      local.tee $var0
      i32.eqz
      br_if $label0
...

sbc100 commented 1 week ago

Did you mean that this could be compiler optimization based on undefined behavior?

No, I'm saying that reads and writes to address zero (or addresses near zero) do not cause a wasm program to crash as it might on other platforms.

In order to get the program to crash on such reads/writes it would need to be augmented in some way such as by the -sSAFE_HEAP setting that emscripten has.

xbcnn commented 1 week ago

I got what you mean.

xbcnn commented 1 week ago

I tried clang++ with -g and run it under a debugger, then I saw it stops and reports EXC_BAD_ACCESS in __hash_table::find().

$ clang++ test.cc -O3 -g
$ lldb a.out

Process 88224 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x8)
    frame #0: 0x0000000100003e34 a.out`main at __hash_table:2316:31 [opt]
   2313     if (__bc != 0)
   2314     {
   2315         size_t __chash = std::__constrain_hash(__hash, __bc);
-> 2316         __next_pointer __nd = __bucket_list_[__chash];
   2317         if (__nd != nullptr)

xbcnn commented 1 week ago

Seems the if (__bc != 0) condition check doesn't work. Or just been optimized out, but why? __bc is bucket count, which is definitely zero since the g_map is empty and no elements ever inserted. The program should not go into the if block.

sbc100 commented 1 week ago

I imagine you are looking the a native apple build? So the source code in question is probably from apple's build of libc++ which is likely not quite the same as emscripten's version. But the same principles still apply.

Here your get_map function returns a null pointer and then you try to iterate through that null pointer as an unordered_map. Clearly that is undefined behaviour and the attempt to iterate will likely cause some kind of trap or crash. I can't tell you exactly why the trap doesn't occur earlier when __bc is initialized by calling the bucket_count method on a null pointer, but I agree that optimizations could occur which re-order things here.

In any case, calling the __emplace_unique_key_args method on the null hash table (or indeed any method) is undefined behaviour, and your are not guaranteed to get a well defined crash when doing that.

xbcnn commented 1 week ago

I imagine you are looking the a native apple build? So the source code in question is probably from apple's build of libc++ which is likely not quite the same as emscripten's version.

Yes, I used clang_16++ on macOs for the native builds. And I think the source is same Emscripten's, at least in the __hash_table::find() part. Following is the complete source of __hash_table::find().

   2306 template <class _Tp, class _Hash, class _Equal, class _Alloc>
   2307 template <class _Key>
   2308 typename __hash_table<_Tp, _Hash, _Equal, _Alloc>::iterator
   2309 __hash_table<_Tp, _Hash, _Equal, _Alloc>::find(const _Key& __k)
   2310 {
   2311     size_t __hash = hash_function()(__k);
   2312     size_type __bc = bucket_count();
   2313     if (__bc != 0)
   2314     {
(lldb) l
   2315         size_t __chash = std::__constrain_hash(__hash, __bc);
   2316         __next_pointer __nd = __bucket_list_[__chash];
   2317         if (__nd != nullptr)
   2318         {
   2319             for (__nd = __nd->__next_; __nd != nullptr &&
   2320                 (__nd->__hash() == __hash
   2321                   || std::__constrain_hash(__nd->__hash(), __bc) == __chash);
   2322                                                            __nd = __nd->__next_)
   2323             {
   2324                 if ((__nd->__hash() == __hash)
(lldb)
   2325                     && key_eq()(__nd->__upcast()->__value_, __k))
   2326                     return iterator(__nd, this);
   2327             }
   2328         }
   2329     }
   2330     return end();
   2331 }

Here your get_map function returns a null pointer and then you try to iterate through that null pointer as an unordered_map. Clearly that is undefined behaviour and the attempt to iterate will likely cause some kind of trap or crash.

Definitely using *pm is undefined behaviour when pm is null pointer. It crashes on native build. But the crash happens in get_map function which is inlined into the main function on opt build(-O3).

If you inspect carefully on the generated codes for the inlined get_map, both on native build or Emscripten build, there are no codes matching the if (__bc != 0) check( line#2313), and even more, same to line#2317 if (__nd != nullptr). I'd like to believe this is the optimization based on undefined behaviour. I just don't know the exact rules behind. I believe someone who experts in code generation and optimization in LLVM will explain this reasonably.

To prove that it's not hanging in for..loop iteration on *pm, I changed it to print out its size and 'inlined' function get_map. Still reproduce.

int test(int k) {
    std::unordered_map<int, std::string>* pm = nullptr;
    auto it = g_map.find((uintptr_t)k);
    if (it != g_map.end()) {
        pm = &it->second;
    }

    // std::cout << "hello" << std::endl;
    printf("size: %zu\n", pm->size());
    return 0;
}

sbc100 commented 1 week ago

So now the undefined behaviour is in the pm->size() call I suppose? The way undefined behaviour works is that the compiler can assume it never happens.

In this case, for example, it is reasonable to assume that since you are dereferencig pm then the pm = &it->second; assignment above must have occurred and it can remove the check for .end() the compiler can also assume that g_map not empty, othersize find could never succeed.

I suppose the assumption that g_map is not ever empty means that the compiler can also assume at least one hash bucket and then eliminte the check of the bucket count against zero.

xbcnn commented 1 week ago

Great! Sounds reasonable if

The way undefined behaviour works is that the compiler can assume it never happens.

xbcnn commented 1 week ago

But another question:
Why the rule stop working if I open the line std::cout << "hello" << std::endl; ? Compile and run the program prints out "hello", which means it doesn't hang or crash in the find function.

int test(int k) {
    std::unordered_map<int, std::string>* pm = nullptr;
    auto it = g_map.find((uintptr_t)k);
    if (it != g_map.end()) {
        pm = &it->second;
    }

    std::cout << "hello" << std::endl;
    printf("size: %zu\n", pm->size());
    return 0;
}

% clang++ test.cc -O3 -g
% ./a.out
hello
zsh: segmentation fault  ./a.out
% emcc -o out/test.html test.cc -O3
% node out/test.js
hello
size: 0

sbc100 commented 1 week ago

Undefined behaviour optimizations can be fickle. Sometime adding or removing something from you source file can inhibit/enable optimizations in a ways that are not obvious.

xbcnn commented 1 week ago

It's interesting and surprising.

sbc100 commented 1 week ago

I guess we can close this issue now?

xbcnn commented 1 week ago

Is there LLVM/Clang sources related to investigate such kind optimization behaviors?

sbc100 commented 1 week ago

Undefined behaviour optimizations a spread all over llvm. Its pretty complex subject I think.

xbcnn commented 1 week ago

I guess so. I'd love to know more about this but seems llvm has a high bar. Maybe search some keywords like "undefined" in llvm sources is good start point. :) Btw, I see a lot of your contributions to wasm-ld. That's a great job.

sbc100 commented 1 week ago

Google has lots of good starting points for "undefined behaviour + clang" see https://blog.llvm.org/2011/05/what-every-c-programmer-should-know_14.html for example

emscripten-core / emscripten

Endless loop when compile with -O2 and above? #22193