bytecodealliance / wasm-micro-runtime

WebAssembly Micro Runtime (WAMR)
Apache License 2.0

Signal 11 AOT ARMv7 #2548

Open g0djan opened 11 months ago

g0djan commented 11 months ago
TERMINATING ON A CRASH
E  Signal 11
(os_mutex_unlock+8) 
(aot_call_function+336) 
(aot_instantiate+3748)
(wasm_runtime_instantiate+36)
(wasm_instance_new_with_args+960)

It happens in different tests for me, but I got this stack trace while running https://github.com/bytecodealliance/libc-test/blob/master/src/functional/strtof.c, and it is the most descriptive stack trace I have got for this problem so far.

A SIGSEGV (signal 11) also occurred in https://github.com/bytecodealliance/libc-test/blob/master/src/functional/clock_gettime.c, but I didn't get a stack trace for it.

I compiled the test to wasm as part of the wasi-libc tests, following the manual, and then to AOT by running:

$WAMRC \
    --enable-multi-thread \
    --disable-simd \
    --target=armv7 \
    --target-abi=eabi \
    --cpu-features=-neon \
    --cpu=generic \
    --size-level=1 \
    -o test.aot test.wasm

I got this on an Nvidia Shield TV P2897, although it is hard to reproduce there. I got a similar SIGSEGV on a Roku 4200X device, where it reproduces reliably, but I haven't figured out whether it is the same problem.

@wenyongh @yamt @abrown has anyone ever seen anything similar?

wenyongh commented 11 months ago

@g0djan I haven't seen a similar issue before. It seems that something is wrong with the lock operations: maybe a mutex is unlocked without having been locked first, or without having been initialized? Could you help dump the stack trace? If that is not easy, maybe you can change the code to print which file/line calls os_mutex_unlock:

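// platform_api_vmcore.h: replace the os_mutex_unlock declaration with
int
os_mutex_unlock_internal(korp_mutex *mutex, const char *file, int line);

// Route every os_mutex_unlock() call through the tracing variant,
// passing the caller's file and line.
#define os_mutex_unlock(mutex) \
    os_mutex_unlock_internal(mutex, __FILE__, __LINE__)

// C file, e.g. posix_thread.c: replace the os_mutex_unlock definition with
int
os_mutex_unlock_internal(korp_mutex *mutex, const char *file, int line)
{
    int ret;

    // Print the call site before unlocking, so the last line printed
    // before a crash identifies the caller.
    os_printf("##%s, line: %d\n", file, line);

    assert(mutex);
    ret = pthread_mutex_unlock(mutex);

    return ret == 0 ? BHT_OK : BHT_ERROR;
}
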
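A complementary check for the "unlocked without locking it first" hypothesis is sketched below, assuming a plain POSIX pthreads build rather than WAMR's own wrappers; init_errorcheck_mutex, debug_mutex_unlock and DEBUG_MUTEX_UNLOCK are hypothetical names. A mutex created with the PTHREAD_MUTEX_ERRORCHECK type makes pthread_mutex_unlock return EPERM when the calling thread does not hold it, instead of leaving the behavior undefined, so the bad call site gets reported. It does not help with a mutex that was never initialized, which stays undefined behavior.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdio.h>

/* Hypothetical helper: create m as an error-checking mutex, so that
 * unlocking it while the calling thread does not hold it returns EPERM
 * instead of being undefined behavior. */
static int
init_errorcheck_mutex(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int ret = pthread_mutexattr_init(&attr);

    if (ret != 0)
        return ret;
    ret = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (ret == 0)
        ret = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return ret;
}

/* Hypothetical helper: unlock m and print the call site if the unlock
 * is rejected (e.g. EPERM for an unlock without a matching lock). */
static int
debug_mutex_unlock(pthread_mutex_t *m, const char *file, int line)
{
    int ret = pthread_mutex_unlock(m);

    if (ret != 0)
        fprintf(stderr, "##bad unlock (error %d) at %s, line: %d\n", ret,
                file, line);
    return ret;
}

/* Pass the caller's file and line to the checking unlock. */
#define DEBUG_MUTEX_UNLOCK(m) debug_mutex_unlock((m), __FILE__, __LINE__)
```

For example, calling DEBUG_MUTEX_UNLOCK on a freshly initialized error-checking mutex returns EPERM and prints the file and line, while the same call after pthread_mutex_lock returns 0.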
g0djan commented 11 months ago

Hey, that stack trace is the best I have got so far.

For example, I managed to reproduce the SIGSEGV on Friday, but the only stack trace I got was:

com.android.runtime/lib/bionic/libc.so ((unknown)+0)

and this time it was the wasi-libc/swprintf test compiled to AOT.

Today I will try running it in a loop in a debug build, and maybe I will be able to dump more.
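Running a test over and over until it crashes can be automated with a small fork/exec harness like the sketch below. It assumes the test can be launched as a standalone command-line process (for example from adb shell), which may not hold for an app that embeds WAMR through the wasm C API; run_until_signal is a hypothetical name. It reruns the command until one run is terminated by a signal, then reports the run number and the signal.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stress helper: run argv up to max_runs times, stopping at
 * the first run that is terminated by a signal. Returns that signal
 * number, 0 if every run exited normally, or -1 if fork/wait failed. */
static int
run_until_signal(char *const argv[], long max_runs)
{
    for (long i = 1; i <= max_runs; i++) {
        pid_t pid = fork();

        if (pid < 0)
            return -1;
        if (pid == 0) {
            execvp(argv[0], argv);
            _exit(127); /* exec failed; report it as a normal exit */
        }

        int status;
        if (waitpid(pid, &status, 0) < 0)
            return -1;
        if (WIFSIGNALED(status)) {
            fprintf(stderr, "run %ld: %s killed by signal %d\n", i, argv[0],
                    WTERMSIG(status));
            return WTERMSIG(status);
        }
    }
    return 0;
}
```

With argv set to the command that runs the test, a call such as run_until_signal(argv, 100000) returns 11 (SIGSEGV on Linux and Android) as soon as a run segfaults, or 0 if every run exits normally.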

g0djan commented 11 months ago

I managed to reproduce it on the Nvidia Shield TV again with lldb attached, but couldn't save a core dump; the backtrace below is the best I managed to save. The backtraces for other threads are not WAMR-related. It was a different test this time, but also a SIGSEGV: https://github.com/bytecodealliance/wasm-micro-runtime/blame/main/core/iwasm/libraries/lib-wasi-threads/test/update_shared_data_and_alloc_heap.c#L61

It could come from a call to one of the following functions (the last log line shows that it tried to initialize a WAMR module):

* thread #53, name = 'Thread-21', stop reason = breakpoint 15.1
  * frame #0: 0x45617208 libart.so`art_sigsegv_fault
    frame #1: 0x45617552 libart.so`art::FaultManager::HandleFault(int, siginfo*, void*) + 218
    frame #2: 0x0e50feb6 app_process32`art::SignalChain::Handler(int, siginfo*, void*) + 378
    frame #3: 0x438b420c libc.so`__restore_rt
    frame #4: 0x60e6f060
    frame #5: 0x914eb584 <..>.so`aot_call_function(exec_env=<unavailable>, function=<unavailable>, argc=<unavailable>, argv=<unavailable>) at aot_runtime.c:1549:15
    frame #6: 0x914d25b4 <..>.so`wasm_runtime_call_wasm(exec_env=0xa9b01030, function=0xa21b6bb0, argc=0, argv=0x929f0834) at wasm_runtime_common.c:1977:15
    frame #7: 0x914cad28 <..>.so`wasm_func_call(func=0xa6cadb00, params=0x929f0ac0, results=0x929f0aa4) at wasm_c_api.c:3349:10

[Attached screenshots: sig11_android3_bottom_variables, sig11_android3_top_variables]
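One possible reading of this backtrace, not confirmed in the thread, is that frame #4 (0x60e6f060), which has no symbol and is called directly from aot_call_function (frame #5), lies inside the module's AOT-compiled code. If the start address and size of the loaded AOT code were known (for example by logging them when the module is loaded, which is an assumption here), the crashing address could be converted into an offset within that code, as in this sketch. code_region and pc_to_region_offset are hypothetical names, not WAMR API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of a loaded AOT code region. How the real
 * base address and size are obtained from the runtime is assumed here. */
typedef struct {
    uintptr_t base;
    size_t size;
} code_region;

/* If pc lies in [base, base + size), store its offset from base in
 * *offset and return true; otherwise return false and leave *offset. */
static bool
pc_to_region_offset(const code_region *r, uintptr_t pc, uintptr_t *offset)
{
    /* Compare pc - base against size to avoid overflow in base + size. */
    if (pc < r->base || pc - r->base >= r->size)
        return false;
    *offset = pc - r->base;
    return true;
}
```

The resulting offset could then be matched against a disassembly of the compiled module to locate the faulting instruction.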

g0djan commented 10 months ago

@wenyongh I've seen Android in the WAMR CI. Does it run any tests on Android, especially on ARM? I was able to reproduce this on different platforms, but only with AOT on ARM, so I guess the problem might be there.

g0djan commented 10 months ago

Okay, it seems the Android build targets x86_32 anyway: https://github.com/bytecodealliance/wasm-micro-runtime/blob/613c7ca48f7dcc6e6ede6a3a0826d065036fa542/product-mini/platforms/android/CMakeLists.txt#L23

wenyongh commented 10 months ago

@g0djan Currently the CI only builds iwasm for the Android platform, and yes, WAMR_BUILD_TARGET is set to X86_32 there. Maybe we should change the target and enhance the CI.
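As a sketch of what such a CI check could print, the function below derives a WAMR_BUILD_TARGET-style name from the compiler's predefined macros, so a job can confirm whether the Android runtime was actually built for ARMv7 rather than X86_32. detected_build_target is a hypothetical helper. X86_32 comes from this thread; the other spellings (X86_64, AARCH64, ARMV7, THUMBV7 and the _VFP suffix for passing floating-point arguments in VFP registers) are assumed from WAMR's build documentation and should be checked against it.

```c
/* Hypothetical helper: map compiler-predefined macros to a
 * WAMR_BUILD_TARGET-style name for the architecture being compiled. */
static const char *
detected_build_target(void)
{
#if defined(__x86_64__)
    return "X86_64";
#elif defined(__i386__)
    return "X86_32";
#elif defined(__aarch64__)
    return "AARCH64";
#elif defined(__arm__) && defined(__ARM_ARCH) && __ARM_ARCH == 7
    /* __thumb__: compiled as Thumb code.
     * __ARM_PCS_VFP: hard-float calling convention (FP args in VFP regs). */
#if defined(__thumb__) && defined(__ARM_PCS_VFP)
    return "THUMBV7_VFP";
#elif defined(__thumb__)
    return "THUMBV7";
#elif defined(__ARM_PCS_VFP)
    return "ARMV7_VFP";
#else
    return "ARMV7";
#endif
#elif defined(__arm__)
    return "ARM";
#else
    return "UNKNOWN";
#endif
}
```

A CI step could print this value from the Android build and fail when an ARM job reports X86_32.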