Closed xbcnn closed 1 week ago
When I try to run that program using either em++
or just clang++
it seem to crash/segfault:
$ clang++ test.cc
$ ./a.out
Segmentation fault
$ em++ test.cc
$ node ./a.out.js
/usr/local/google/home/sbc/dev/wasm/emscripten/a.out.js:121
throw ex;
^
RuntimeError: memory access out of bounds
at wasm://wasm/000b2fde:wasm-function[30]:0x190a
at wasm://wasm/000b2fde:wasm-function[56]:0x23ee
at wasm://wasm/000b2fde:wasm-function[229]:0x55f6
at /usr/local/google/home/sbc/dev/wasm/emscripten/a.out.js:625:12
at callMain (/usr/local/google/home/sbc/dev/wasm/emscripten/a.out.js:4331:15)
at doRun (/usr/local/google/home/sbc/dev/wasm/emscripten/a.out.js:4381:23)
at run (/usr/local/google/home/sbc/dev/wasm/emscripten/a.out.js:4396:5)
at runCaller (/usr/local/google/home/sbc/dev/wasm/emscripten/a.out.js:4316:19)
at removeRunDependency (/usr/local/google/home/sbc/dev/wasm/emscripten/a.out.js:561:7)
at receiveInstance (/usr/local/google/home/sbc/dev/wasm/emscripten/a.out.js:745:5)
Node.js v18.20.3
I think the problem is that your program doesn't check the value of pm
and just assumes its not null. You probably need some kind of explicit null check on the result of get_map
.
Yes, it's a real problem that no null check on pm
before accessing it. But that is not the point.
You need build it with -O2
or -O3
option to reproduce. Default compile option, which I suppose is -g
, can not reproduce it.
em++ -O3 test.cc
node a.out.js # hang
It seems not hang in the for-loop iteration on *pm
, but hang in get_map
when calling find(k)
.
What surprised me more is when add any line before accessing *pm
in test
func, such as std::cout << "hello" << std::endl
, the program doesn't hang any more.
int test(int k) {
auto* pm = get_map((uintptr_t)k);
std::cout << "hello" << std::endl; // this line stops hanging
for (auto& [i, s] : *pm) {
std::cout << i << " " << s << std::endl;
}
return 0;
}
$ em++ -O3 test.cc
$ node a.out.js # output "hello" and no more hang
hello
$
With emscripten you don't necessarily get an immediate crash on null pointer dereferencing like you might do on other platforms.
Some build settings such as -sSAFE_HEAP
will detect these, but in general the program could continue to run after the null pointer dereference (the program is just reading and writing from locations around zero, which is valid in WebAssembly), but that program will be in an undefined state after that point, since null pointer dereferencing is undefined behaviour.
Did you mean that this could be compiler optimization based on undefined behavior? Such as, I guess, a pointer dereferenced without null check, is 'telling' compiler that it won't be null and those code paths which explicitly lead to null can be optimized out.
If this is the way behind, how can I confirm the compiler does and when does such optimization? I am wondering whether there are some compiler verbose logs to see these, and if we can disable these specific optimizations.
I investigated a little the wasm codes difference between the 2 versions(with line std::cout << "hello" << std::endl;
or not, and I used emcc -O3 test.cc -o test.html
) .
The one which has the "hello" line will check value at address 15252, which is zero as i32
, and then break out there so won't hang inside. I think it is checking the bucket count of the unordered_map g_map
, since the map is actual empty, find(k) will directly return.
I don't get the opt rules applied here but they should be related to null pointer dereferencing UB. What do you think? Or somebody else can ask help explain this?
# without the "hello" line
...
block $label0 (result i32)
i32.const 15252
i32.load
local.tee $var0
local.get $var0
i32.const 1
i32.sub
local.tee $var1
i32.and
if
i32.const 15248
i32.load
local.get $var0
i32.const 1
i32.le_u
if (result i32)
i32.const 1
local.get $var0
i32.rem_u
else
i32.const 1
end
i32.const 2
i32.shl
i32.add
i32.load
br $label0
end
i32.const 15248
i32.load
local.get $var1
i32.const 1
i32.and
i32.const 2
i32.shl
i32.add
i32.load
end $label0
...
;; with the "hello" line
...
block $label0
i32.const 15252
i32.load
local.tee $var1
i32.eqz ;; $var1 == 0
br_if $label0 ;; Here break out
i32.const 15248
i32.load
block $label1 (result i32)
local.get $var1
i32.const 1
i32.sub
i32.const 1
i32.and
local.get $var1
i32.popcnt
local.tee $var3
i32.const 1
i32.le_u
br_if $label1
drop
i32.const 1
local.get $var1
i32.const 1
i32.gt_u
br_if $label1
drop
i32.const 1
local.get $var1
i32.rem_u
end $label1
local.tee $var4
i32.const 2
i32.shl
i32.add
i32.load
local.tee $var0
i32.eqz
br_if $label0
local.get $var0
i32.load
local.tee $var0
i32.eqz
br_if $label0
...
Did you mean that this could be compiler optimization based on undefined behavior?
No, I'm saying that reads and writes to address zero (or addresses near zero) do not cause a wasm program to crash as it might on other platforms.
In order to get the program to crash on such reads/writes it would need to be augmented in some way such as by the -sSAFE_HEAP
setting that emscripten has.
I got what you mean.
I tried clang++ with -g
and run it under a debugger, then I saw it stops and reports EXC_BAD_ACCESS
in __hash_table::find()
.
$ clang++ test.cc -O3 -g
$ lldb a.out
Process 88224 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x8)
frame #0: 0x0000000100003e34 a.out`main at __hash_table:2316:31 [opt]
2313 if (__bc != 0)
2314 {
2315 size_t __chash = std::__constrain_hash(__hash, __bc);
-> 2316 __next_pointer __nd = __bucket_list_[__chash];
2317 if (__nd != nullptr)
Seems the if (__bc != 0)
condition check doesn't work. Or just been optimized out, but why?
__bc
is bucket count, which is definitely zero since the g_map
is empty and no elements ever inserted. The program should not go into the if block.
I imagine you are looking the a native apple build? So the source code in question is probably from apple's build of libc++ which is likely not quite the same as emscripten's version. But the same principles still apply.
Here your get_map
function returns a null pointer and then you try to iterate through that null pointer as an unordered_map
. Clearly that is undefined behaviour and the attempt to iterate will likely cause some kind of trap or crash. I can't tell you exactly why the trap doesn't occur earlier when __bc
is initialized by calling the bucket_count
method on a null pointer, but I agree that optimizations could occur which re-order things here.
In any case, calling the __emplace_unique_key_args
method on the null hash table (or indeed any method) is undefined behaviour, and your are not guaranteed to get a well defined crash when doing that.
I imagine you are looking the a native apple build? So the source code in question is probably from apple's build of libc++ which is likely not quite the same as emscripten's version.
Yes, I used clang_16++ on macOs for the native builds. And I think the source is same Emscripten's, at least in the __hash_table::find()
part. Following is the complete source of __hash_table::find()
.
2306 template <class _Tp, class _Hash, class _Equal, class _Alloc>
2307 template <class _Key>
2308 typename __hash_table<_Tp, _Hash, _Equal, _Alloc>::iterator
2309 __hash_table<_Tp, _Hash, _Equal, _Alloc>::find(const _Key& __k)
2310 {
2311 size_t __hash = hash_function()(__k);
2312 size_type __bc = bucket_count();
2313 if (__bc != 0)
2314 {
(lldb) l
2315 size_t __chash = std::__constrain_hash(__hash, __bc);
2316 __next_pointer __nd = __bucket_list_[__chash];
2317 if (__nd != nullptr)
2318 {
2319 for (__nd = __nd->__next_; __nd != nullptr &&
2320 (__nd->__hash() == __hash
2321 || std::__constrain_hash(__nd->__hash(), __bc) == __chash);
2322 __nd = __nd->__next_)
2323 {
2324 if ((__nd->__hash() == __hash)
(lldb)
2325 && key_eq()(__nd->__upcast()->__value_, __k))
2326 return iterator(__nd, this);
2327 }
2328 }
2329 }
2330 return end();
2331 }
Here your
get_map
function returns a null pointer and then you try to iterate through that null pointer as anunordered_map
. Clearly that is undefined behaviour and the attempt to iterate will likely cause some kind of trap or crash.
Definitely using *pm
is undefined behaviour when pm
is null pointer. It crashes on native build. But the crash happens in get_map
function which is inlined into the main
function on opt build(-O3
).
If you inspect carefully on the generated codes for the inlined get_map
, both on native build or Emscripten build, there are no codes matching the if (__bc != 0)
check( line#2313), and even more, same to line#2317 if (__nd != nullptr)
. I'd like to believe this is the optimization based on undefined behaviour. I just don't know the exact rules behind. I believe someone who experts in code generation and optimization in LLVM will explain this reasonably.
To prove that it's not hanging in for..loop iteration on *pm
, I changed it to print out its size and 'inlined' function get_map
. Still reproduce.
int test(int k) {
std::unordered_map<int, std::string>* pm = nullptr;
auto it = g_map.find((uintptr_t)k);
if (it != g_map.end()) {
pm = &it->second;
}
// std::cout << "hello" << std::endl;
printf("size: %zu\n", pm->size());
return 0;
}
So now the undefined behaviour is in the pm->size()
call I suppose? The way undefined behaviour works is that the compiler can assume it never happens.
In this case, for example, it is reasonable to assume that since you are dereferencig pm
then the pm = &it->second;
assignment above must have occurred and it can remove the check for .end()
the compiler can also assume that g_map not empty, othersize find
could never succeed.
I suppose the assumption that g_map
is not ever empty means that the compiler can also assume at least one hash bucket and then eliminte the check of the bucket count against zero.
Great! Sounds reasonable if
The way undefined behaviour works is that the compiler can assume it never happens.
But another question:
Why the rule stop working if I open the line std::cout << "hello" << std::endl;
?
Compile and run the program prints out "hello", which means it doesn't hang or crash in the find
function.
int test(int k) {
std::unordered_map<int, std::string>* pm = nullptr;
auto it = g_map.find((uintptr_t)k);
if (it != g_map.end()) {
pm = &it->second;
}
std::cout << "hello" << std::endl;
printf("size: %zu\n", pm->size());
return 0;
}
% clang++ test.cc -O3 -g
% ./a.out
hello
zsh: segmentation fault ./a.out
% emcc -o out/test.html test.cc -O3
% node out/test.js
hello
size: 0
Undefined behaviour optimizations can be fickle. Sometime adding or removing something from you source file can inhibit/enable optimizations in a ways that are not obvious.
It's interesting and surprising.
I guess we can close this issue now?
Is there LLVM/Clang sources related to investigate such kind optimization behaviors?
Undefined behaviour optimizations a spread all over llvm. Its pretty complex subject I think.
I guess so. I'd love to know more about this but seems llvm has a high bar. Maybe search some keywords like "undefined" in llvm sources is good start point. :) Btw, I see a lot of your contributions to wasm-ld. That's a great job.
Google has lots of good starting points for "undefined behaviour + clang" see https://blog.llvm.org/2011/05/what-every-c-programmer-should-know_14.html for example
Version of emscripten/emsdk: emcc (Emscripten gcc/clang-like replacement + linker emulating GNU ld) 3.1.61 (67fa4c16496b157a7fc3377afd69ee0445e8a6e3) Copyright (C) 2014 the Emscripten authors (see AUTHORS.txt) This is free and open source software under the MIT license. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
There is a problem that it doesn't check the |pm| is null or not, and use it.
node.js
will loop endless and never return.emcc -O1 a.cpp -o a.html
, and run it again, it returns immediately.test
, which prints "hello", compile still with-O2
, then runnode a.js
again, it returns immediately and prints out "hello".Why builds with
-O2
and above work abnormally? And, Why adds a line whichstd::cout
something there, stops the 'optimization' and makes it work normally?I am not sure if there are some issues in optimizer or it is a reasonable/explicable behavior here. @sbc100 Any thoughts about this?