Closed yzhang71 closed 3 months ago
Let's talk tomorrow and see what we can figure out about this
I think the first step is to see if our libc is calling some sort of tls_init function at all on startup. It would be super helpful to be able to see what functions glibc calls before main(). If there is some sort of function being called that probably would be the best place to try to integrate what wasi-libc is doing.
I'm not exactly sure how to profile the functions that are called, though something like wasm-nm may be useful to see which functions are even included in our libc executable.
Nick and Professor Cappos raised a great point that init_static_tls should be called before the main function, possibly in the crt1.o file.
In WASI-libc, crt1.o calls __libc_start_main, which calls __init_libc, which in turn calls __init_tls(aux).
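The call chain described above can be pictured with a toy model. This is only a sketch: every function name here (toy_start, toy_libc_start_main, and so on) is a hypothetical stand-in, not the real libc code. It just records the order in which the stages run, to show that TLS setup happens before main.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical trace buffer, used only to record the call order. */
enum { TRACE_MAX = 8 };
static const char *trace[TRACE_MAX];
static size_t trace_len;

static void record(const char *name) {
    assert(trace_len < TRACE_MAX);
    trace[trace_len++] = name;
}

/* Returns the position of a stage in the trace, or -1 if it never ran. */
static int trace_index(const char *name) {
    for (size_t i = 0; i < trace_len; i++)
        if (strcmp(trace[i], name) == 0) return (int)i;
    return -1;
}

/* Toy stand-ins; the real functions live in the libc. */
static void toy_init_tls(void) { record("init_tls"); }
static void toy_init_libc(void) { record("init_libc"); toy_init_tls(); }
static int toy_main(void) { record("main"); return 0; }

static int toy_libc_start_main(int (*main_fn)(void)) {
    record("libc_start_main");
    toy_init_libc();   /* TLS is set up here ... */
    return main_fn();  /* ... before main runs.  */
}

/* Plays the role of _start in crt1.o. */
static int toy_start(void) {
    trace_len = 0;
    record("start");
    return toy_libc_start_main(toy_main);
}
```

If our crt1 skips the middle stages and jumps straight to main, the TLS step never appears in the trace, which matches the symptom of uninitialized TLS values.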
Here is the crt1.c we are currently using, which produces the link errors that follow:
#include <stdlib.h>

// Declaration of the external main function
extern int main(int argc, char **argv, char **envp);

// Declaration of the __libc_start_main function
extern int __libc_start_main(int (*main) (int, char **, char **),
                             int argc,
                             char **ubp_av,
                             void (*init) (void),
                             void (*fini) (void),
                             void (*rtld_fini) (void),
                             void (*stack_end));

// Weak definitions for init, fini, and rtld_fini
void __attribute__((weak)) _init(void) {}
void __attribute__((weak)) _fini(void) {}
void __attribute__((weak)) _rtld_fini(void) {}

void _start() {
    // Placeholders: this draft does not yet obtain the real arguments.
    int argc = 0;
    char **argv = NULL;
    char **envp = NULL;
    (void) envp;
    __libc_start_main(main, argc, argv, _init, _fini, _rtld_fini, 0);
    // main();
}

// Empty stubs so that the link succeeds
void __wasm_call_dtors() {
}

void __wasi_proc_exit(unsigned int exit_code) {
}
wasm-ld: error: ../../glibc/sysroot/lib/wasm32-wasi/libc.a(libc-start.o): undefined symbol: __fini_array_end
wasm-ld: error: ../../glibc/sysroot/lib/wasm32-wasi/libc.a(libc-start.o): undefined symbol: __fini_array_start
wasm-ld: error: ../../glibc/sysroot/lib/wasm32-wasi/libc.a(libc-start.o): undefined symbol: __fini_array_end
wasm-ld: error: ../../glibc/sysroot/lib/wasm32-wasi/libc.a(libc-start.o): undefined symbol: __fini_array_start
wasm-ld: error: ../../glibc/sysroot/lib/wasm32-wasi/libc.a(libc-start.o): undefined symbol: __fini_array_start
wasm-ld: error: ../../glibc/sysroot/lib/wasm32-wasi/libc.a(libc-start.o): undefined symbol: __fini_array_end
wasm-ld: error: ../../glibc/sysroot/lib/wasm32-wasi/libc.a(libc-start.o): undefined symbol: __fini_array_start
wasm-ld: error: ../../glibc/sysroot/lib/wasm32-wasi/libc.a(libc-start.o): undefined symbol: __fini_array_start
wasm-ld: error: ../../glibc/sysroot/lib/wasm32-wasi/libc.a(libc-start.o): undefined symbol: __preinit_array_end
wasm-ld: error: ../../glibc/sysroot/lib/wasm32-wasi/libc.a(libc-start.o): undefined symbol: __preinit_array_start
wasm-ld: error: ../../glibc/sysroot/lib/wasm32-wasi/libc.a(libc-start.o): undefined symbol: __preinit_array_start
wasm-ld: error: ../../glibc/sysroot/lib/wasm32-wasi/libc.a(libc-start.o): undefined symbol: __preinit_array_start
wasm-ld: error: ../../glibc/sysroot/lib/wasm32-wasi/libc.a(libc-start.o): undefined symbol: __init_array_end
wasm-ld: error: ../../glibc/sysroot/lib/wasm32-wasi/libc.a(libc-start.o): undefined symbol: __init_array_start
wasm-ld: error: ../../glibc/sysroot/lib/wasm32-wasi/libc.a(libc-start.o): undefined symbol: __init_array_start
wasm-ld: error: ../../glibc/sysroot/lib/wasm32-wasi/libc.a(libc-start.o): undefined symbol: __init_array_start
wasm-ld: error: ../../glibc/sysroot/lib/wasm32-wasi/libc.a(dl-support.o): undefined symbol: _dl_sysinfo_int80
wasm-ld: error: ../../glibc/sysroot/lib/wasm32-wasi/libc.a(dl-support.o): undefined symbol: __ehdr_start
wasm-ld: error: ../../glibc/sysroot/lib/wasm32-wasi/libc.a(dl-support.o): undefined symbol: __ehdr_start
wasm-ld: error: ../../glibc/sysroot/lib/wasm32-wasi/libc.a(dl-support.o): undefined symbol: __ehdr_start
wasm-ld: error: ../../glibc/sysroot/lib/wasm32-wasi/libc.a(libc-start.o): undefined symbol: __fini_array_end
wasm-ld: error: ../../glibc/sysroot/lib/wasm32-wasi/libc.a(libc-start.o): undefined symbol: __fini_array_start
Following wasi-libc, the errors above can be resolved by initializing both extern void (*__fini_array_start []) (void) and extern void (*__fini_array_end []) (void) to 0 in /glibc/csu/libc-start.c::160.
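A minimal sketch of why an empty range is harmless, assuming the usual pattern in which a libc walks the handlers between a start and an end pointer in reverse order. The helper run_fini_range and the handlers fini_a and fini_b are hypothetical names for illustration, not glibc symbols.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*fini_fn)(void);

/* Calls the handlers in [start, end) in reverse order, the way a libc
   walks .fini_array at exit. An empty range (start == end) is a no-op. */
static size_t run_fini_range(fini_fn *start, fini_fn *end) {
    assert(start <= end);
    size_t n = (size_t)(end - start);
    size_t called = 0;
    while (n-- > 0) {
        start[n]();
        called++;
    }
    return called;
}

/* Two sample handlers that log the order in which they run. */
static int fini_log[4];
static int fini_log_len;
static void fini_a(void) { fini_log[fini_log_len++] = 1; }
static void fini_b(void) { fini_log[fini_log_len++] = 2; }
```

With start equal to end the loop body never executes, so a target without ELF sections can provide the two symbols without having any handlers to run.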
This sounds like a good strategy overall. Let's look at both wasi-libc and glibc and really understand what they are doing and why. Then we can model our design on whatever is most appropriate.
Previously, we hit an error because dl_tls_static_size and dl_tls_static_align were never initialized. After discussing with Professor Cappos and Nick, the solution was to call TLS initialization before the main function (in crt1.c). After checking the source code and testing, I now know that we should call __libc_setup_tls, which in turn calls init_static_tls. Since __libc_setup_tls takes no parameters, we don't need to provide any arguments.
Using GDB to follow the execution path, I confirmed that the previous error has been resolved:
int a = GLRO(dl_tls_static_size);
int b = GLRO(dl_tls_static_align);
(gdb) p a
$1 = 2048
(gdb) p b
$2 = 64
Now I will move on to address the following error:
Caused by:
0: failed to invoke command default
1: error while executing at wasm backtrace:
0: 0x26e04 - <unknown>!__libc_message_impl
1: 0xd3af - <unknown>!__libc_assert_fail
2: 0x34b9d - <unknown>!allocate_stack
3: 0x33cff - <unknown>!__pthread_create_2_1
4: 0x599 - <unknown>!__original_main
5: 0x4ea - <unknown>!_start
6: 0x41d6f - <unknown>!_start.command_export
note: using the `WASMTIME_BACKTRACE_DETAILS=1` environment variable may show more debugging information
2: memory fault at wasm address 0x8dfb4000 in linear memory of size 0x30000
3: wasm trap: out of bounds memory access
Now, the error is due to an assertion failure:
/* Adjust the stack size for alignment. */
size &= ~tls_static_align_m1;
assert (size != 0);
in /glibc/nptl/allocatestack.c
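To see when this assertion can fire, here is a small sketch of the same masking step in isolation. The helper name align_down is hypothetical. It does not claim to identify the actual cause in our build; it only shows that the mask rounds the size down to a multiple of the alignment, so any size smaller than the alignment becomes 0.

```c
#include <assert.h>
#include <stddef.h>

/* Rounds size down to a multiple of align (align must be a power of two),
   using the same masking as the allocatestack.c snippet. */
static size_t align_down(size_t size, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    size_t align_m1 = align - 1;
    return size & ~align_m1;
}
```

For example, with an alignment of 64, a size of 2048 is unchanged, 100 becomes 64, and 32 becomes 0, which is the value the assertion rejects.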
After discussing with Professor Cappos and Nick, we decided to adopt WASI-libc's threading implementation in glibc. The reason is that simply calling __libc_start_main does not work: it also initializes the .fini_array, which is a section in ELF binaries, and WebAssembly doesn't support that.
I have recompiled WASI-libc with debug info and am now able to use GDB to trace the execution path:
_start(void)
in /wasi-libc/libc-bottom-half/crt/crt1-command.c
__wasi_init_tp()
in /wasi-libc/libc-bottom-half/crt/crt1-command.c
__init_tp((void *)__get_tp());
in /wasi-libc/libc-top-half/musl/src/env/__init_tls.c
__init_tp(void *p)
in /wasi-libc/libc-top-half/musl/src/env/__init_tls.c
I am now integrating the WASI-libc threading implementation into glibc. I have migrated __init_tls.c into the csu directory and updated the Makefile accordingly. After fixing many define and initialization issues, we are now facing these errors:
__init_tls.c:287:21: error: '__builtin_wasm_tls_align' needs target feature bulk-memory
size_t tls_align = __builtin_wasm_tls_align();
^
__init_tls.c:288:28: error: '__builtin_wasm_tls_base' needs target feature bulk-memory
volatile void* tls_base = __builtin_wasm_tls_base();
It seems like these two functions need to be enabled by the WebAssembly target feature in clang/llvm.
The functions __builtin_wasm_tls_align and __builtin_wasm_tls_base are WebAssembly-specific built-in functions provided by LLVM and Clang to handle thread-local storage (TLS). After checking the Makefile of the WASI-libc implementation, I found that adding -mbulk-memory to the CFLAGS enables that feature, which fixed the issue mentioned above.
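As a sketch of how code could guard against the missing feature instead of failing to compile, the builtin can be wrapped behind a preprocessor check. This assumes Clang predefines __wasm_bulk_memory__ when -mbulk-memory is in effect; the wrapper name wasm_tls_align_or_zero is hypothetical, and the fallback value 0 is only a placeholder for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the TLS alignment reported by the toolchain, or 0 when the
   builtin is unavailable (non-wasm target, or bulk-memory disabled). */
static size_t wasm_tls_align_or_zero(void) {
#if defined(__wasm__) && defined(__wasm_bulk_memory__)
    return __builtin_wasm_tls_align();
#else
    return 0;
#endif
}
```

In our case the real fix is still the compiler flag; a guard like this only turns the hard compile error into a value the caller has to handle.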
To create a thread stack, glibc uses mmap to allocate the memory, and mmap is not fully supported by our version of glibc, so I am working on that now.
WASI-libc does not support mmap in the same way that traditional operating systems like Linux do because the WebAssembly System Interface (WASI) is designed to be a lightweight, secure, and portable interface for WebAssembly (WASM) modules. Here are the key reasons:
WASI aims to provide a secure and portable interface for running WebAssembly modules across different environments. Traditional mmap functionality, which allows direct memory mapping, can introduce security risks and complexity. WASI focuses on maintaining a simple, consistent interface that can be safely implemented across various platforms.
WebAssembly is designed to run in a sandboxed environment, where direct access to low-level memory management features like mmap is restricted. This sandboxing is crucial for ensuring that WebAssembly modules cannot perform unsafe operations that could compromise the host environment.
WebAssembly is designed to be platform-independent, meaning it does not rely on specific operating system features. mmap is an OS-specific system call, and supporting it directly would require WASI to provide abstractions over many different operating systems, which goes against its design principles of simplicity and portability.
To provide similar functionality where necessary, wasi-libc ships an emulation library that offers limited mmap-like capabilities. This emulation provides basic compatibility for applications that rely on mmap, but it is not a full implementation of the mmap system call.
As the wasi-libc error message indicates, WASI provides minimal mmap emulation that can be enabled by defining _WASI_EMULATED_MMAN and linking against the wasi-emulated-mman library. This allows for basic memory mapping operations within the constraints of the WASI environment.
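To make the idea of mmap emulation concrete, here is a minimal sketch that backs an anonymous mapping with the heap. It is not wasi-libc's actual code; emu_mmap_anon and emu_munmap are hypothetical helpers, and the sketch ignores file-backed mappings, protection flags, and page alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: emulates an anonymous, private mapping with the heap.
   Returns zero-filled memory like MAP_ANONYMOUS would, or NULL on failure. */
static void *emu_mmap_anon(size_t length) {
    if (length == 0) return NULL;
    return calloc(1, length);
}

/* Hypothetical helper: releases a region obtained from emu_mmap_anon.
   Unlike real munmap, partial unmapping is not possible here. */
static void emu_munmap(void *addr) {
    free(addr);
}
```

An approach like this is enough for callers that only need a zeroed block of memory, such as a thread stack, but it cannot provide shared mappings or partial unmapping.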
WASI-libc's lack of native mmap support is a design choice aligned with WebAssembly's goals of portability, security, and simplicity. By using emulation libraries and adhering to WASI's constraints, developers can still achieve necessary memory management functionality in a way that is consistent with WebAssembly's architecture.
After discussing with Professor Cappos and Nick, we are now attempting to use __wasi_thread_spawn from WASI-libc instead of __clone_internal. However, we are encountering the following issue:
Error: failed to run main module `thread.wasm`
Caused by:
0: failed to instantiate "thread.wasm"
1: unknown import: `wasi::thread-spawn` has not been defined
After migrating the WASI-libc threading code, we encountered the following error:
Error: failed to run main module `thread.wasm`
Caused by:
0: failed to instantiate "thread.wasm"
1: unknown import: `wasi::thread-spawn` has not been defined
This issue was due to the threading configuration not being enabled at runtime. After adding the --wasi threads=y flag, we encountered another error:
Error: unknown import: `wasi_snapshot_preview1::lind_syscall` has not been defined
After examining the Wasmtime source code, we discovered that, since threading support is experimental, the developers have separated thread and preview into two independent modules. Therefore, we need to enable both --wasi threads=y and --wasi preview2=y at runtime.
After fixing the above issue, we are now encountering the following error:
2024-07-26T16:52:15.657304Z ERROR wasmtime_wasi_threads: failed to find a wasi-threads entry point function; expected an export with name: wasi_thread_start
thread 'main' panicked at /home/dennis/Documents/lind-wasm/wasmtime/crates/wasi-threads/src/lib.rs:138:21:
thread_id = -1
I have now integrated the WebAssembly assembly file wasi_thread_start.s into glibc, compiled it into an object file (.o), and linked it into the sysroot. However, at runtime, we encounter the following error:
Invalid input WebAssembly code at offset 230547: global is immutable: cannot modify it with global.set
It seems to be due to the global.set __stack_pointer and global.set __tls_base instructions in the assembly file.
Hmm, I believe the globals are defined in the asm file, so it's weird this is happening there. Are there some compilation options we're missing?
Yes, I have the same feeling. I'm checking the options that WASI-libc was using.
Previously, our implementation of lind_syscall was located in the preview2 module in Wasmtime. However, since the thread module is experimental, the developers made the thread implementation a separate module, which causes conflicts between preview2 and the thread module. The command we were using was:
/wasmtime/target/debug/wasmtime run --wasi threads=y --wasi preview2=y thread.was
After discussing with Coulson and Qianxi, we decided to port the lind_syscall implementation into wasi_common as well. Qianxi did a great job on this, and here is the commit: https://github.com/Lind-Project/wasmtime/commit/4663b69cd1f0fc99d86d217854c05e7be6fbc8e8. Now, we only need to enable --wasi threads=y to activate the thread module.
Great update! Seems like this is really close.
Now we have fully implemented pthread_create, allowing us to print 'hello world' from another thread. The issues we encountered have been resolved by adding -matomics -mbulk-memory to wasm-config.sh and adding -Wl,--shared-memory when compiling user programs.
This approach avoids modifying or making ad-hoc changes to any previous implementations in glibc or wasi-libc. Next, Qianxi and I will continue working on the pthread_exit issue:
Error: error while executing wasm backtrace:
0: 0x552 - <unknown>!undefined_weak:__pthread_unwind
1: 0x3c671 - __do_cancel
at /home/dennis/Documents/lind-wasm/glibc/nptl/../sysdeps/nptl/pthreadP.h:271:3
2: 0x3c671 - __pthread_exit
at /home/dennis/Documents/lind-wasm/glibc/nptl/pthread_exit.c:36:3
3: 0x38436 - __wasi_thread_start_C
at /home/dennis/Documents/lind-wasm/glibc/nptl/pthread_create.c:276:2
4: 0x381ec - <unknown>!wasi_thread_start
5: 0x47e59 - <unknown>!wasi_thread_start.command_export
Caused by:
wasm trap: wasm `unreachable` instruction executed
We will also be working on the pthread_join issue.
An update on the pthread_exit issue: wasm reached the "unreachable" instruction because there is no definition of the __pthread_unwind function, which is called by pthread_exit. Reviewing the decompiled wasm file (the wat file), we noticed that this function is marked as "undefined_weak". After some searching, it turns out this means the function is declared weak but has no associated definition. However, __pthread_unwind does have a definition in unwind.c, although it is not marked weak there. So I removed the "weak" keyword from the __pthread_unwind declaration, and it then linked successfully.
A new error then reported that "_Unwind_ForcedUnwind", which is called by __pthread_unwind, was not found. It turns out that this function is compiled into an '.os' file, not an '.o' file, and we only include '.o' files when running gen_sysroot. I added the .os file to the sysroot, and now _Unwind_ForcedUnwind is also linked in.
The next issue is "__libc_unwind_link_get function is not found". This function has two versions, depending on whether the "SHARED" macro is defined globally. I first recompiled unwind-link.c (which contains the function) with the SHARED flag set, and it compiles. However, when running the thread code, __libc_unwind_link_get calls __libc_malloc (instead of the malloc used by threading, which is defined under the malloc directory). __libc_malloc internally uses mmap, so it fails with "out of bound memory access". I also tried the non-shared version of __libc_unwind_link_get, which just returns a constant. In that case the code compiles and runs without any error. However, when I looked into the wat file, I found that the thread still does not exit successfully: it enters an infinite loop that never returns when calling pthread_exit.
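The "undefined_weak" behavior can be illustrated with a small sketch. It assumes a GCC or Clang toolchain on an ELF target, where the address of a weak function with no definition resolves to NULL; the name optional_unwind_hook is hypothetical and is deliberately never defined.

```c
#include <assert.h>
#include <stddef.h>

/* Declared weak and deliberately never defined in this sketch
   (hypothetical name), so its address resolves to NULL at link time. */
extern void optional_unwind_hook(void) __attribute__((weak));

/* Returns 1 if the hook was present and called, 0 if it was absent. */
static int call_unwind_hook_if_present(void) {
    if (optional_unwind_hook != NULL) {
        optional_unwind_hook();
        return 1;
    }
    return 0;
}
```

Code that calls a weak function without such a check jumps to a missing definition, which is consistent with the trap we saw in the wasm build.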
So it looks like we currently have two options: fix __libc_unwind_link_get to use the correct malloc implementation, or try what wasi-libc does for __pthread_exit (their pthread_exit implementation looks significantly different from ours).
We're now facing a concurrency issue: when the main thread exits first, it finalizes the cage, leaving the child thread without a cage to use. This leads to the following issue:
spawned thread id = 1; calling start function `wasi_thread_start` with: 197024
thread '<unnamed>' panicked at src/safeposix/dispatcher.rs:239:22:
called `Option::unwrap()` on a `None` value
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
fatal runtime error: Rust cannot catch foreign exceptions
[2] 977993 IOT instruction (core dumped) ../../wasmtime/target/debug/wasmtime run --wasi threads=y thread.wasm
@rennergade Any thoughts on this issue?
It seems we finally need to patch in futex(). You should experiment with my branch here: https://github.com/Lind-Project/lind-wasm/issues/1
We are now able to make pthread_join work with futex. We faced two issues along the way. First, the pthread struct in the main thread was not shared with the pthread struct in the child thread. pthread_join requires pthread->tid to work, so it fails if the struct is not shared. We fixed this by passing the pthread struct in start_args, so the child thread uses the same pthread struct as the parent. The second, minor issue was in the futex implementation in rustposix. The first argument of futex_syscall is supposed to be an address, but we were using a u32 to hold the value, which cannot hold a full address on a 64-bit machine. We fixed it by changing the argument type to u64.
So pthread_create, pthread_join and pthread_exit are all working right now, and I think we are basically done with pthread for now. One caveat: pthread_exit cannot be exposed to user space, since exiting a thread halfway does not appear to be possible in wasmtime currently (at least without modifying the wasmtime implementation); the only way for a thread to exit legitimately is to let the wasi_thread_start function return.
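The futex argument problem comes down to integer width. The sketch below shows it in isolation, assuming a made-up 64-bit host address; the helper names are hypothetical and this is not the rustposix code, which is written in Rust.

```c
#include <assert.h>
#include <stdint.h>

/* Mimics passing an address through a 32-bit parameter: high bits are lost. */
static uint64_t roundtrip_via_u32(uint64_t addr) {
    uint32_t narrowed = (uint32_t)addr;
    return (uint64_t)narrowed;
}

/* Passing the same address through a 64-bit parameter preserves it. */
static uint64_t roundtrip_via_u64(uint64_t addr) {
    uint64_t kept = addr;
    return kept;
}
```

A small address such as 0x1000 survives the 32-bit round trip, which is why a bug like this can go unnoticed until an address above 4 GiB is passed in.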
WASI-libc initializes TLS with this function: WASI-libc TLS Initialization. Currently, we are failing because dl_tls_static_size and dl_tls_static_align in glibc-nptl are never initialized. Other functions that depend on this TLS initialization will presumably fail as well. We need to determine whether our libc currently calls any function to initialize TLS and, if not, why. We then need to find a way to integrate the WASI-libc version of TLS initialization with our libc.