Open loganek opened 1 month ago
@wenyongh please feel free to jump in. I hope I've written down everything we've discussed.
From what we know, we are hoping this feature will cover:
implement of fully-runtime dynamic linking between core modules
the linked doc seems to be talking about components, not core modules. do you mean an equivalent for core modules?
do you mean https://github.com/WebAssembly/tool-conventions/blob/main/DynamicLinking.md ? or something else?
keep current loading-phase module level linking capability. Might mimic that behavior
a question is what should be kept in the base library and what should be moved into separate layer(s).
handle linking between
do you mean every possible combination? or have you left out some combinations from the matrix? terms used here like module/aot/wasm/embedded etc are a bit unclear to me.
do you mean an equivalent for core modules
It's more like what "component-ld" is doing, e.g. creating an extra core instance to provide the memory (heap).
After linking, core instances A, B and C will each have their own table and globals, and share the stack and heap. Something like:
It might require `--shared` when compiling to Wasm modules.
a question is what should be kept in the base library and what should be moved into separate layer(s).
Move into separate layers. Keep base library minimum.
do you mean every possible combinations
Yes. We need to handle all five combinations below. Sorry for the terms. Let me try to explain:
module and aot mean .wasm files and .aot files. This part is clear, right?
"host and wasm" means integrating wamr as a library in a bigger project. Wasm imports are provided by core/iwasm/libraries and by the host itself. Sometimes the host will even provide its own support for wasi-libc. Some APIs in both wasm_export.h and wasm_c_api.h can be used for that purpose.
"--native-lib" means the command line option of iwasm. It requires users to provide a .so with pre-defined symbols: `init_native_lib()`, `get_native_lib()` and `deinit_native_lib()`.
Built-in libraries are runtime-provided host functions. They are stored in core/iwasm/libraries.
are you ( @loganek and @lum1n0us ) talking about the same specific design? or different possible designs for the similar goal?
a question is what should be kept in the base library and what should be moved into separate layer(s). Move into separate layers. Keep base library minimum.
a question is how the minimum base library would look like. i guess it should contain:
plus, some extra functionality necessary to support load-time linking
But also instantiation-time linking. Actually, we are looking for a way to mimic the load-time linking with instantiation-time linking.
plus, some extra functionality necessary to support load-time linking
But also instantiation-time linking. Actually, we are looking for a way to mimic the load-time linking with instantiation-time linking.
can you explain a bit? i was considering instantiate-time linking just a variation of ordinary import/export as per wasm spec.
Yes. The main goal is to implement instantiate-time linking per the wasm spec, which might include modifications like (not all of them, more to be added later):
The core algorithm will be instantiate-time linking instead of load-time linking. But, for compatibility, we need to keep the loading-time linking APIs. In the new architecture that means all instances of a module can share the same import set. The loading-time check (for import modules) might still be necessary in some scenarios.
@wenyongh please feel free to jump in. I hope I've written down everything we've discussed.
From what we know, we are hoping this feature will cover:
- a separate layer above wamr libraries to handle the various linking mechanisms
- implementation of fully-runtime dynamic linking between core modules
- refactoring the current multi-module feature. Move code to a separate layer.
- may even leave space for shared-nothing linking
- might not consider `dlxxx()` linking at all
- keep current loading-phase module level linking capability. Might mimic that behavior
- handle linking between
  - module and module.
  - aot and aot.
  - host and wasm(aot) (embedded)
  - `--native-lib` and wasm(aot)
  - built-in libraries (core/iwasm/libraries/) and wasm(aot) (perhaps covered by the previous item)
@lum1n0us, @yamt sorry for the late response, from what we are discussing internally and what we discussed with @no1wudi and @loganek , I think we want to implement https://github.com/WebAssembly/tool-conventions/blob/main/DynamicLinking.md but not https://github.com/WebAssembly/component-model/blob/main/design/mvp/Linking.md#fully-runtime-dynamic-linking. Here are some of my understandings (just personal understanding, maybe incorrect):
1. There are two types of module: (1) one is called the Main module and is compiled normally, like what we do now; (2) the other is called a Side module, which can be built with a command like `clang -fPIC -shared -Wl,--export=<func1>;<func2>` or `emcc -s SIDE_MODULE=1 -s EXPORTED_FUNCTIONS="[func1;func2]"`. Note that the concept was first introduced by Emscripten, and is now supported by emsdk, clang and wasi-sdk.
2. A project should contain exactly one main module and can contain one or multiple side modules.
3. Only the main module has linear memory and a wasm table. A side module imports the main module's memory and table through `import env.memory` and `import env.__indirect_function_table`, and also imports two globals, `import env.__memory_base` and `import env.__table_base`: the runtime should assign a region inside the main module's linear memory for the first global, then put the side module's initial data in that region, and in all the load/store opcodes that access the global data, the address should have been added with the global before accessing. And similarly for the second global.
4. Maybe we can allocate a buffer in the libc heap or host managed heap for the side module.
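As a rough illustration of the region assignment described above, here is a minimal sketch in C. It assumes the runtime keeps two cursors (next free data address and next free table slot) and that each side module declares its memory size, memory alignment, table size and table alignment, as the `dylink.0` section in the tool-conventions document does. The struct and function names are hypothetical and are not WAMR APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of the sizes a side module declares in dylink.0;
   field names are illustrative only. Alignments are powers of two. */
typedef struct {
    uint32_t mem_size;         /* bytes of data the side module needs */
    uint32_t mem_align_log2;   /* data alignment, log2 */
    uint32_t table_size;       /* number of table slots needed */
    uint32_t table_align_log2; /* table alignment, log2 */
} side_mem_info;

/* Round value up to the next multiple of 2^align_log2. */
uint32_t align_up(uint32_t value, uint32_t align_log2)
{
    assert(align_log2 < 32);
    uint32_t mask = ((uint32_t)1 << align_log2) - 1;
    return (value + mask) & ~mask;
}

/* Pick __memory_base and __table_base for one side module and advance
   the cursors owned by the (hypothetical) runtime linker. */
void assign_bases(const side_mem_info *info, uint32_t *next_mem,
                  uint32_t *next_table, uint32_t *memory_base,
                  uint32_t *table_base)
{
    *memory_base = align_up(*next_mem, info->mem_align_log2);
    *next_mem = *memory_base + info->mem_size;
    *table_base = align_up(*next_table, info->table_align_log2);
    *next_table = *table_base + info->table_size;
}
```

Where the cursors start (malloc'ed buffer, aux stack, or freshly grown memory) is exactly the open question in this thread; the sketch only shows the bookkeeping once a starting address is known.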
I think we want to implement https://github.com/WebAssembly/tool-conventions/blob/main/DynamicLinking.md
i'm reasonably familiar with the convention as i have implemented it for other runtime. https://github.com/yamt/toywasm/tree/master/libdyld
i have even considered to port it to wamr. the main obstacle was wamr's lack of spec-conforming import/export semantics. eg. https://github.com/bytecodealliance/wasm-micro-runtime/issues/1353 i guess it will be a problem for component-model as well.
3. Only main module has linear memory and wasm table, and side module imports memory/table of main module through memory of `import env.memory` and table of `import env.__indirect_function_table`, and also side module imports two globals `import env.__memory_base` and `import env.__table_base`: runtime should assign a region inside main module's linear memory for the first global, then put the side module's initial data in the region,
non-pie executable works like that. but for `-pie` executable, which iirc emscripten produces, linear memory etc are provided by the runtime linker. https://github.com/yamt/garbage/tree/master/c/shlib has an example for both cases.
and in all the load/store opcodes that access the global data, the address should have been added with the global before accessing. And similar for the second global.
if you mean to make the runtime do something special when executing eg. i32.load, it doesn't work that way. it's something llvm takes care of. the runtime just needs to execute simple relocations.
i'm reasonably familiar with the convention as i have implemented it for other runtime. https://github.com/yamt/toywasm/tree/master/libdyld
i have even considered to port it to wamr. the main obstacle was wamr's lack of spec-conforming import/export semantics. eg. #1353 i guess it will be a problem for component-model as well.
Yes, your project is cool. Do you mean it is an obstacle if we use WAMR current multi-module's import/export semantics? I am not very sure what is the best way, maybe we can implement a new linking semantics and enable core module dynamic linking based on it, and also refactor multi-module to use it.
3. Only main module has linear memory and wasm table, and side module imports memory/table of main module through memory of `import env.memory` and table of `import env.__indirect_function_table`, and also side module imports two globals `import env.__memory_base` and `import env.__table_base`: runtime should assign a region inside main module's linear memory for the first global, then put the side module's initial data in the region,
non-pie executable works like that. but for `-pie` executable, which iirc emscripten produces, linear memory etc are provided by the runtime linker. https://github.com/yamt/garbage/tree/master/c/shlib has an example for both cases.
Thanks for the correction. I am not sure whether there is requirement for pie executable, since normally we can compile wasm to AOT, and AOT XIP is enabled already.
and in all the load/store opcodes that access the global data, the address should have been added with the global before accessing. And similar for the second global.
if you mean to make the runtime do something special when executing eg. i32.load, it doesn't work that way. it's something llvm takes care of. the runtime just needs to execute simple relocations.
Yes, maybe the runtime just needs to do a few things. I meant that the address of `i32.load` should have been relocated, e.g., there may be another instruction adding `env.__memory_base` to the address before `i32.load`; we don't need to add `env.__memory_base` manually when executing the instruction. The runtime just executes `i32.load` as normal.
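To make the point above concrete, here is a small simulation in C, assuming a byte array stands in for the shared linear memory. The function names are hypothetical. The load itself knows nothing about `__memory_base`; the addition is ordinary code emitted by the compiler in front of it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the shared linear memory; the size is arbitrary. */
enum { LINEAR_MEMORY_SIZE = 4096 };
uint8_t linear_memory[LINEAR_MEMORY_SIZE];

/* Plain i32.load: the runtime only ever sees an absolute address. */
int32_t i32_load(uint32_t addr)
{
    int32_t value;
    assert(addr <= LINEAR_MEMORY_SIZE - sizeof(value));
    memcpy(&value, &linear_memory[addr], sizeof(value));
    return value;
}

/* What position-independent side-module code effectively does: the
   compiler emits the equivalent of "global.get __memory_base;
   i32.const offset; i32.add" before the load, so the addition is
   regular bytecode rather than special runtime behavior. */
int32_t side_module_read(uint32_t memory_base, uint32_t offset)
{
    return i32_load(memory_base + offset);
}
```

With a side module placed at base 1024, reading its variable at offset 8 is the same as a normal load from address 1032.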
Do you mean it is an obstacle if we use WAMR current multi-module's import/export semantics?
yes. the dynamic-linking convention heavily relies on import/export of instance resources. (memory, table, globals, functions) while it might be possible to tweak it to our multi-module semantics (i dunno), i guess it's more productive to work on supporting spec-conforming import/export semantics.
I am not very sure what is the best way, maybe we can implement a new linking semantics and enable core module dynamic linking based on it, and also refactor multi-module to use it.
yes, it makes sense to design a base machinery (maybe spec-conforming import/export + (hopefully) small extra functionality) which can be used as a base of both semantics.
Are there any users relying on the multi-module support today that cannot easily port to dynamic linking (i.e. recompile their code while upgrading WAMR)? Given that multi-module is very WAMR specific, and there's already a documented approach in https://github.com/WebAssembly/tool-conventions/blob/main/DynamicLinking.md , I wonder if we could drop (or at least deprecate) multi-module support and advise users to use dynamic linking instead?
Maybe we can allocate a buffer in the libc heap of host managed heap for the side module.
Not sure what you mean by the `host managed heap`? I was initially thinking we could just export `malloc` from the module and call it from WAMR when loading modules, but there are a few issues:
- we'd need to search for `malloc` in all the modules, as it likely won't be included in the main module (I presume in many cases libc will be side-loaded) - that's not a big deal though, but:
- we can't call malloc until the module is instantiated...
Alternatively, could we perhaps rely on the `__heap_base` / `__stack_high` (or `__data_end`) global and update it when the modules are being loaded?
Also, I don't think there's a per-module aux stack; the aux stack IIUC should be shared among all the modules, and the `__stack_pointer` global should be imported from the main module, and be used for all the modules. That poses a small challenge because `__stack_pointer` might not be available in the module by name; we could use a heuristic that WAMR already implements for checking aux_stack boundaries (assume it's the first non-imported global) but I don't know how reliable it is. I think it's reasonable to assume the global needs to be exported from the main module (although the doc https://github.com/WebAssembly/tool-conventions/blob/main/DynamicLinking.md doesn't say anything about it).
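The lookup order suggested above could be sketched as follows. This is only an illustration over a hypothetical, simplified description of a module's globals; it is not how WAMR represents them. The extra "mutable i32" check in the fallback is an assumption added here as a sanity check and is not part of the heuristic as stated.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified view of one global of the main module. */
typedef struct {
    const char *export_name; /* NULL when the global is not exported */
    int is_imported;
    int is_mutable_i32;
} global_desc;

/* Return the index of the stack pointer global, or -1 if none qualifies. */
int find_stack_pointer(const global_desc *globals, size_t count)
{
    assert(globals != NULL || count == 0);

    /* Preferred: an explicit export named __stack_pointer. */
    for (size_t i = 0; i < count; i++) {
        if (globals[i].export_name
            && strcmp(globals[i].export_name, "__stack_pointer") == 0)
            return (int)i;
    }

    /* Fallback heuristic: the first non-imported global, accepted here
       only if it is a mutable i32 (assumed extra check). */
    for (size_t i = 0; i < count; i++) {
        if (!globals[i].is_imported && globals[i].is_mutable_i32)
            return (int)i;
    }
    return -1;
}
```

Requiring the export makes the first loop sufficient; the fallback exists only for main modules that were built without exporting the global.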
Here's a diagram based on @wenyongh 's one with slight modifications described above.
Are there any users relying on the multi-module support today that cannot easily port to dynamic linking (i.e. recompile their code while upgrading WAMR)? Given that multi-module is very WAMR specific, and there's already a documented approach in https://github.com/WebAssembly/tool-conventions/blob/main/DynamicLinking.md , I wonder if we could drop (or at least deprecate) multi-module support and advise users to use dynamic linking instead?
I guess there are some developers using multi-module feature since some related PRs were submitted, we had better keep it:
Not sure what you mean by the `host managed heap`? I was initially thinking we could just export `malloc` from the module and call it from WAMR when loading modules, but there are a few issues:
The host managed heap means the heap inserted by the runtime: when the wasm module is compiled with `-nostdlib` and libc isn't linked, the developer can insert the host managed heap into the linear memory by `wasm_runtime_instantiate(.., heap_size = n)`: https://bytecodealliance.github.io/wamr.dev/blog/understand-the-wamr-heaps/ Note that the host native can also call the `module_malloc` function to allocate memory from this heap, just like from the libc heap when libc is linked into the wasm module and the malloc function is exported.
I think it may be difficult if we don't call the malloc function exposed by the wasm module to assign a region for the side module: per my understanding, in the main module the global data, the aux stack and the libc heap (or host-managed heap) are contiguous, so you would have to allocate a region from the global data or the aux stack - the global data is in use and unavailable, and for the aux stack, will you decrease the stack pointer each time to yield/leave a region? And as for increasing the `__heap_base` global, per my understanding, after the libc heap is initialized, it should not be changed again. And it shouldn't even be changed at the beginning since, in our investigation, its const value is hardcoded in the wasm bytecode: normally we only change it when the libc heap isn't enabled.
- we'd need to search for `malloc` in all the modules, as it likely won't be included in the main module (I presume in many cases libc will be side-loaded) - that's not a big deal though, but:
- we can't call malloc until the module is instantiated...
IIUC, the linking happens at instantiation time, or at least assigning a region and setting the import globals happen at instantiation time, so we can call the malloc function then. We can specify that, for the core module dynamic linking feature, the developer must export the malloc/free functions when compiling the wasm module.
Alternatively, could we perhaps rely on the `__heap_base` / `__stack_high` (or `__data_end`) global and update it when the modules are being loaded?
As mentioned above, it may be difficult to update them. It is also not flexible: how do we manage the assigned regions, or should there be a region manager? Suppose that region1 and region2 are assigned; how do we handle it if region1 is freed before region2?
Also, I don't think there's a per-module aux stack; the aux stack IIUC should be shared among all the modules, and the `__stack_pointer` global should be imported from the main module, and be used for all the modules. That poses a small challenge because `__stack_pointer` might not be available in the module by name; we could use a heuristic that WAMR already implements for checking aux_stack boundaries (assume it's the first non-imported global) but I don't know how reliable it is. I think it's reasonable to assume the global needs to be exported from the main module (although the doc https://github.com/WebAssembly/tool-conventions/blob/main/DynamicLinking.md doesn't say anything about it).
Thanks, I didn't think about it carefully, maybe you are right. And there is a more complex situation - both multi-thread (or lib pthread) and core module dynamic linking are enabled.
It seems feasible to manage the entire linking process through multiple layers.
The foundational layer would be responsible for implementing the import and export specifications. This includes the fundamental mechanisms for importing and exporting functions, globals, memories, and tables. It also encompasses the linking that occurs during the instantiation phase. Each instance should have a distinct set of imports, which could be sourced from other instances, the host environment, or runtime libraries in core/iwasm/libraries.
The upper layer would build upon the APIs provided by the lower layer, along with additional runtime APIs found in wasm_export.h. This layer would facilitate features like multi-module support and DynamicLinking. It would also have the capability to access the contents of the dylink.0 section.
Given that dynamic linking requires recompilation with specific flags (such as -fPIC --shared), we need to maintain a method for linking standard WebAssembly core modules that do not have the dylink.0 section. Users of WAMR libraries might choose between APIs for the instantiation phase linking and the loading phase linking (the multi-module feature). If all WebAssembly modules are guaranteed to be recompiled, users can opt for dynamic linking.
The term "fully-runtime-dynamic-linking" likely originates from the tool wasm-component-ld. Although this tool outputs a component module, it addresses a significant portion of the linking requirements. It processes multiple core modules that contain the dylink.0 section, determines the correct order for instantiation, and connects one instance's imports to another instance's exports. It can even instantiate an instance without a related module to ensure all dependencies are met.
I strongly suggest looking into the approach that wasm-component-ld uses to achieve what is known as "shared-everything-linking." By following this method, we can adhere to a "standard" process that is recognized and accepted by both the WebAssembly community and the official specification. This approach should prevent any unexpected behavior.
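Whichever tool or layer ends up doing it, "determining the correct order for instantiation" amounts to a topological sort over the dependency lists (the "needed" entries of each `dylink.0` section). A minimal sketch in C, assuming modules are identified by index and dependencies are already resolved to indices; all names and limits are hypothetical. It rejects cycles for simplicity, which a real dynamic linker may need to tolerate.

```c
#include <assert.h>
#include <stddef.h>

enum { MAX_MODULES = 16, MAX_DEPS = 8 };

/* Hypothetical module record: deps holds the indices of the modules
   this one needs. */
typedef struct {
    size_t dep_count;
    size_t deps[MAX_DEPS];
} module_deps;

/* Depth-first visit; returns 0 on success, -1 on a dependency cycle.
   state: 0 = unvisited, 1 = in progress, 2 = already ordered. */
static int visit(const module_deps *mods, size_t idx, int *state,
                 size_t *order, size_t *count)
{
    if (state[idx] == 2)
        return 0;
    if (state[idx] == 1)
        return -1;
    state[idx] = 1;
    for (size_t i = 0; i < mods[idx].dep_count; i++) {
        if (visit(mods, mods[idx].deps[i], state, order, count) != 0)
            return -1;
    }
    state[idx] = 2;
    order[(*count)++] = idx;
    return 0;
}

/* Fill order[] so that every module appears after its dependencies. */
int instantiation_order(const module_deps *mods, size_t n, size_t *order)
{
    int state[MAX_MODULES] = { 0 };
    size_t count = 0;
    assert(n <= MAX_MODULES);
    for (size_t i = 0; i < n; i++) {
        if (visit(mods, i, state, order, &count) != 0)
            return -1;
    }
    return 0;
}
```

For a main module (0) that needs libfoo (1) and libc (2), with libfoo also needing libc, the resulting order is libc, libfoo, main.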
The host managed heap means the heap inserted by runtime, when the wasm module is compiled with -nostdlib and libc isn't linked
Ah ok, yes I'm familiar with the app heap concept, just didn't know that you're talking about the same thing - thanks for clarifying. However, I think using app heap is not a viable approach for various usecases (especially, it's not for my team).
IIUC, the linking happens in instantiation time, or at least assigning a region and setting the import globals happen in the instantiation time, so we can call malloc function then.
My concern is that we'll call a function from a module (`malloc`) that's not fully instantiated at that stage (because the data is not loaded), which might lead to unexpected behavior. Whereas this might work with malloc implemented in wasi libc (although I didn't check), I can think of examples where this will fail.
And if to increase global __heap_base, per my understanding, after libc heap is initialized, it should not be changed again. And it even shouldn't be changed at the beginning since in our investigation, its const value is hardcoded in wasm bytecode: normally we only changes it when libc heap isn't enabled.
Yes, I totally agree that `__heap_base` shouldn't be modified after libc heap is initialized. And you're right that the `__heap_base` is a constant value hardcoded in the WASM file itself, but I was thinking that perhaps WAMR could modify its value before instantiation as soon as it loads all the modules and knows the size of the data for each module. What do you think?
Ah ok, yes I'm familiar with the app heap concept, just didn't know that you're talking about the same thing - thanks for clarifying. However, I think using app heap is not a viable approach for various usecases (especially, it's not for my team).
App heap is only enabled when the libc heap's malloc/free functions are not exported and the developer passes a heap_size larger than 0 to wasm_runtime_instantiate; it is an alternative to the libc heap. I think if your cases don't need it, you can just disable it by exporting the malloc/free functions of the wasm module.
My concern is that we'll call a function from a module (`malloc`) that's not fully instantiated at that stage (because the data is not loaded), which might lead to unexpected behavior. Whereas this might work with malloc implemented in wasi libc (although I didn't check), I can think of examples where this will fail.
Do you mean that at this time the side module which is to be loaded or linked hasn't been instantiated yet? But here the memory is allocated from the main module, so maybe we can instantiate the main module first to ensure that we can call its malloc function, and lazily link its sub modules, e.g., only link a module when the runtime calls an unlinked function?
Yes, I totally agree that `__heap_base` shouldn't be modified after libc heap is initialized. And you're right that the `__heap_base` is a constant value hardcoded in the WASM file itself, but I was thinking that perhaps WAMR could modify its value before instantiation as soon as it loads all the modules and knows the size of the data for each module. What do you think?
If I remember correctly, modifying the `__heap_base` global will make the libc heap malloc throw an exception, so we had better not do that. Another approach that I can think of is to assign a lot of regions from the aux stack, but this had better be eager linking: while instantiating the main module, let the runtime recursively retrieve all the side modules that it (and its children modules) depends on, then allocate regions one by one for these side modules from the aux stack, and then reduce the `__stack_pointer` global just one time. And the developer can set a larger aux stack by using `-z stack-size=n` for wasi-sdk when compiling the wasm module. Do you think this is a better way than calling the malloc function and modifying `__heap_base`?
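The eager variant described above (collect all side modules first, carve their regions out of the aux stack, lower `__stack_pointer` once) could look roughly like this in C. It assumes the aux stack grows downwards from the current stack pointer towards a known lower bound, and that the stack pointer should stay 16-byte aligned; both are assumptions of this sketch, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-side-module request; memory_base is the output. */
typedef struct {
    uint32_t data_size;   /* bytes of data the side module needs */
    uint32_t align_log2;  /* required alignment, log2 */
    uint32_t memory_base; /* filled in by reserve_from_aux_stack */
} side_region;

/* Carve all regions from the top of the aux stack downwards, then
   update the stack pointer once. Returns 0 on success, or -1 (leaving
   *stack_pointer untouched) if the aux stack is too small. */
int reserve_from_aux_stack(side_region *regions, size_t n,
                           uint32_t stack_low, uint32_t *stack_pointer)
{
    uint32_t cursor = *stack_pointer;
    assert(stack_low <= cursor);
    for (size_t i = 0; i < n; i++) {
        assert(regions[i].align_log2 < 32);
        uint32_t mask = ((uint32_t)1 << regions[i].align_log2) - 1;
        if (cursor - stack_low < regions[i].data_size)
            return -1;
        cursor = (cursor - regions[i].data_size) & ~mask; /* align down */
        if (cursor < stack_low)
            return -1;
        regions[i].memory_base = cursor;
    }
    cursor &= ~(uint32_t)15; /* assumed 16-byte stack alignment */
    if (cursor < stack_low)
        return -1;
    *stack_pointer = cursor; /* the single __stack_pointer update */
    return 0;
}
```

The failure path is where the concern about choosing `-z stack-size=n` shows up: if a side module's data grows, the reservation can start failing without any change to the main module.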
Ah ok, yes I'm familiar with the app heap concept, just didn't know that you're talking about the same thing - thanks for clarifying. However, I think using app heap is not a viable approach for various usecases (especially, it's not for my team).
App heap is only enabled when libc heap's malloc/free functions are not exported and developer passes heap_size larger than 0 to wasm_runtime_instantiate, it is an alternative to lib heap. I think if your cases don't need it, you can just disable it by exporting malloc/free functions of wasm module.
Yes, my point was that relying on the app heap as a solution might not work for some of the usecases, so we probably should find another solution.
My concern is that we'll call a function from a module (`malloc`) that's not fully instantiated at that stage (because the data is not loaded), which might lead to unexpected behavior. Whereas this might work with malloc implemented in wasi libc (although I didn't check), I can think of examples where this will fail.
Do you mean at this time the side module which is to be loaded or linked hasn't been instantiated yet? But here it is to malloc the memory from the main module, maybe we can instantiate the main module first to ensure that we can call its malloc function, and lazily link its sub modules, e.g., only link the module when runtime calls a unlinked function?
So first of all, I don't think that `malloc` will always be placed in the main module. For example, if users compile with wasi libc, the `malloc` likely will be in the `libc.so`, not in their main module. However, I'm not worried about that - we could recursively load all the dependencies so the module with `malloc()` is also loaded at that time. My main concern is that some of the `malloc()` implementations might rely on the fact that the data is already available (for whatever reason), so the execution will crash. Here's an extreme version of the malloc implementation that would fail:
    #include <stddef.h>
    #include <stdio.h>

    void *malloc(size_t size)
    {
        /* The string literal lives in the data segment (rodata), so this
           call reads memory that may not have been initialized yet. */
        puts("allocating...");
        return NULL;
    }
I don't expect any malloc implementation to do that, but this is just to illustrate the risk. We could make the assumption that the `malloc` function must not access the rodata, but that's probably too restrictive.
Yes, I totally agree that `__heap_base` shouldn't be modified after libc heap is initialized. And you're right that the `__heap_base` is a constant value hardcoded in the WASM file itself, but I was thinking that perhaps WAMR could modify its value before instantiation as soon as it loads all the modules and knows the size of the data for each module. What do you think?
In my memory, modifying `__heap_base` global will make lib heap malloc throw exception, had better not do that. Another approach that I can think is to assign a lot of regions from aux stack, but this had better be an eager linking: during instantiating the main module, let runtime recursively retrieve all the side modules it (and its children modules) depends on, and then allocate regions one by one for these side modules from aux stack, and then reduce the `__stack_pointer` global just one time. And developer can set a larger aux stack by using `-z stack-size=n` for wasi-sdk when compiling wasm module. Do you think this is a better way than calling malloc function and modifying `__heap_base`?
Ok, I was doing a bit of experiments, and realized that the compiler actually never generates instructions for reading the `__heap_base` global; instead, it puts constant values in the code; for example this:
    #include <stdio.h>

    /* Symbol provided by wasm-ld; only its address is meaningful. */
    extern unsigned char __heap_base;

    int main() {
        printf("Hello %u\n", (unsigned)&__heap_base);
        return 0;
    }
would not result in `global.get __heap_base`; instead, it will generate an `i32.const XXX` instruction (where XXX is the heap base value). So the idea of updating the global value won't work in case the `malloc` is implemented in the main module (as it likely will use the `__heap_base` value).
Regarding using the stack size - I think that could work, but one challenge I see for the user is figuring out the right `stack-size` value. I think it's easy to estimate it when we compile all the modules at once; however, we have a use case where we want to dynamically update some of the modules without updating the main module. So if the data size in the submodule significantly changes, it might no longer fit into the stack size, which means the main module would have to be updated too. Having said that, I think it's more likely that the stack requirements for the submodules will change more often than the data requirements, which means the main module will have to be updated anyway. So that might not be such a big deal.
I think both `malloc` and using the stack are working ideas, but they both have their limitations (for stack size, it's the need for updating the main module when dependent modules change, and for `malloc` it's the possibility of using rodata). I know that the stack size limitation is a bit of a blocker for my team; I'm not sure though how valid my concern about the `malloc` is - if I'm overly worried about this, please let me know, but I have a feeling that it might hit us at some point.
I'm going to think a bit more about this problem and share an update tomorrow, but feel free to share your thoughts too.
I've also started a branch with some initial experiments here so feel free to subscribe for updates: https://github.com/bytecodealliance/wasm-micro-runtime/compare/main...loganek:wasm-micro-runtime:loganek/dynamic-linking?expand=1 This also includes a (for now living) document where I keep track on the discussions and some of my thoughts, and I hope to get that reviewed at some point of time with the community once the most important details are fleshed out: https://github.com/loganek/wasm-micro-runtime/blob/loganek/dynamic-linking/doc/dynamic_linking_design.md
@loganek Thanks for the explanation and the experiment. Agree to investigate more and make a good decision. For the malloc function, if it isn't exported by the wasm module and the libc heap isn't linked, the host managed heap can be inserted into the linear memory, and the host native can also call `wasm_runtime_module_malloc` to allocate memory from it. But it really increases risks: (1) the main module should have been instantiated before the runtime calls module_malloc; if the main module relies on some other modules, then the runtime may need to lazily load these modules, (2) calling malloc of the libc heap may trigger the `memory.grow` opcode to enlarge the memory, which may require updating other modules' import memory info.
Another idea I got is to reduce the `__stack_pointer` global to reserve a space from the aux stack, or increase the `__heap_base` global to reserve a space from the libc heap, and then initialize this space as another host managed heap, so the runtime can allocate a region from it for the side module if needed: (1) the allocation just calls the APIs of the runtime's memory allocator, it doesn't call into wasm bytecode, (2) it is more flexible than allocating regions one by one by changing `__stack_pointer`/`__heap_base` many times, (3) maybe we can add an option for the developer to figure out the size of this space at the beginning.
If using the space of the aux stack is an issue, then we had better try to resolve the hardcoded const issue in wasi-sdk, and be able to update `__heap_base`.
However, I think using app heap is not a viable approach for various use-cases (especially, it's not for my team)
I'm concerned that we may have delved too deeply into specifics without first clarifying our requirements and understanding the broader context. 'Dynamic Linking' is merely one of many technologies that can be used to achieve our goal of core module linking. I'm not sure why we've become fixated on this particular method or why we believe it to be the ultimate solution for our needs. (Although I agree this is the best solution we have so far)
Upon review, I believe there are several preliminary questions that need to be addressed, and I suspect there may be more:
Please allow me to contribute to the detailed discussion.
Ok, I was doing a bit of experiments, and realized that the compiler actually never generates instructions for reading the __heap_base global; instead, it puts constant values in the code; for example this:
For a wasm32-wasi module, `__heap_base` and `__heap_end` are symbols provided by wasm-ld and used by dlmalloc() to define the boundaries of the heap. If I understand correctly, it's best not to interfere with these symbols and to let wasm-ld and wasi-libc handle their responsibilities.
Looking at the contents of the dylink.0 and import sections, a module compiled with `-shared` typically depends on libc.so and requires the `malloc()` and `realloc()` functions. I'm not certain if these can be omitted when unused. It seems that the original concept for memory management in wasi-libc was to link dlmalloc() within /opt/wasi-sdk-22.0/share/wasi-sysroot/lib/wasm32-wasi/libc.so.
The term "fully-runtime-dynamic-linking" likely originates from the tool wasm-component-ld. Although this tool outputs a component module, it addresses a significant portion of the linking requirements. It processes multiple core modules that contain the dylink.0 section, determines the correct order for instantiation, and connects one instance's imports to another instance's exports. It can even instantiate an instance without a related module to ensure all dependencies are met.
I strongly suggest looking into the approach that wasm-component-ld uses to achieve what is known as "shared-everything-linking." By following this method, we can adhere to a "standard" process that is recognized and accepted by both the WebAssembly community and the official specification. This approach should prevent any unexpected behavior.
are you sure wasm-component-ld processes dylink.0 section? don't you mean `wasm-tools component link`?
Please allow me to contribute to the detailed discussion.
Ok, I was doing a bit of experiments, and realized that the compiler actually never generates instructions for reading the __heap_base global; instead, it puts constant values in the code; for example this:
For a wasm32-wasi module, `__heap_base` and `__heap_end` are symbols provided by wasm-ld and used by dlmalloc() to define the boundaries of the heap. If I understand correctly, it's best not to interfere with these symbols and to let wasm-ld and wasi-libc handle their responsibilities.
Looking at the contents of the dylink.0 and import sections, a module compiled with `-shared` typically depends on libc.so and requires the `malloc()` and `realloc()` functions. I'm not certain if these can be omitted when unused. It seems that the original concept for memory management in wasi-libc was to link dlmalloc() within /opt/wasi-sdk-22.0/share/wasi-sysroot/lib/wasm32-wasi/libc.so.
dynamic-linking itself doesn't require wasi or malloc.
And there is a more complex situation: both multi-threading (or lib pthread) and core module dynamic linking are enabled.
basically dynamic-linking is incompatible with threads. i guess it's worth researching what emscripten does in that regard.
Maybe we can allocate a buffer in the libc heap of host managed heap for the side module.
what allocation are you talking about? in dynamic-linking, the runtime linker allocates memory regions for shared libraries by growing the linear memory.
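As a rough illustration of "allocating memory regions by growing the linear memory", here is a minimal sketch. It assumes the region for each shared library is placed at the current end of memory, aligned according to the `mem_p2align` value from its dylink.0 section. The `LinearMemory` class is a hypothetical stand-in that only tracks sizes; it is not WAMR, toywasm, or emscripten code.

```python
PAGE_SIZE = 65536  # WebAssembly linear memory page size in bytes


def align_up(value, p2align):
    """Round `value` up to a multiple of 2**p2align."""
    align = 1 << p2align
    return (value + align - 1) & ~(align - 1)


class LinearMemory:
    """Toy model of a linear memory: tracks page count and the used top."""

    def __init__(self, pages):
        self.pages = pages
        self.top = pages * PAGE_SIZE  # first byte not yet handed out

    def allocate_region(self, mem_size, mem_p2align):
        """Reserve a region for one library and return its __memory_base."""
        base = align_up(self.top, mem_p2align)
        end = base + mem_size
        needed_pages = -(-end // PAGE_SIZE)  # ceiling division
        if needed_pages > self.pages:
            self.pages = needed_pages  # stands in for memory.grow
        self.top = end
        return base


mem = LinearMemory(pages=1)
base1 = mem.allocate_region(mem_size=56, mem_p2align=2)
base2 = mem.allocate_region(mem_size=100, mem_p2align=4)
print(base1, base2, mem.pages)
```

The returned base is what the library would receive as its imported `env.__memory_base`.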
* Concerning the target: Should our focus be solely on wasm32-wasi modules, or do we need to consider 'wasm32-unknown' (from -nostdlib) modules as well? These represent two distinctly different challenges. If we concentrate on wasm32-wasi, wasi-libc and wasm-ld could offer additional support, and we might be able to simplify the memory allocation mechanism.
dynamic-linking itself doesn't require wasi as far as i know.
The term "fully-runtime-dynamic-linking" likely originates from the tool wasm-component-ld. Although this tool outputs a component module, it addresses a significant portion of the linking requirements. It processes multiple core modules that contain the dylink.0 section, determines the correct order for instantiation, and connects one instance's imports to another instance's exports. It can even instantiate an instance without a related module to ensure all dependencies are met. I strongly suggest looking into the approach that wasm-component-ld uses to achieve what is known as "shared-everything-linking." By following this method, we can adhere to a "standard" process that is recognized and accepted by both the WebAssembly community and the official specification. This approach should prevent any unexpected behavior.
are you sure wasm-component-ld processes dylink.0 section? don't you mean `wasm-tools component link`?
Given that the --target=wasm32-wasip2 option will merge multiple .a files to create a component model, it's possible that wasm-component-ld might share similar capabilities with `wasm-tools component link`. I should use both as examples.
Please allow me to contribute to the detailed discussion.
Ok, I did a few experiments and realized that the compiler never actually generates instructions for reading the __heap_base global; instead, it puts constant values in the code; for example this:
For a wasm32-wasi module, `__heap_base` and `__heap_end` are symbols provided by wasm-ld and used by dlmalloc() to define the boundaries of the heap. If I understand correctly, it's best not to interfere with these symbols and to let wasm-ld and wasi-libc handle their responsibilities.
Looking at the contents of the dylink.0 and import sections, a module compiled with `-shared` typically depends on libc.so and requires the `malloc()` and `realloc()` functions. I'm not certain if these can be omitted when unused. It seems that the original concept for memory management in wasi-libc was to link dlmalloc() within /opt/wasi-sdk-22.0/share/wasi-sysroot/lib/wasm32-wasi/libc.so.
dynamic-linking itself doesn't require wasi or malloc.
At first glance, no. However, it might be worth considering when determining how to manage the heap area
* Concerning the target: Should our focus be solely on wasm32-wasi modules, or do we need to consider 'wasm32-unknown' (from -nostdlib) modules as well? These represent two distinctly different challenges. If we concentrate on wasm32-wasi, wasi-libc and wasm-ld could offer additional support, and we might be able to simplify the memory allocation mechanism.
dynamic-linking itself doesn't require wasi as far as i know.
If we use `wasm-component-ld` and `wasm-tools component link` as points of reference, the code from wasi-libc becomes relevant. Additionally, with the assistance of wasi-libc code, the linking requirements for wasm32-wasi modules and wasm32-unknown modules emerge as two distinct challenges. Furthermore, when using wasi-sdk toolchains, there will always be a libc.so present in the name subsection of the dylink.0 section. We must either align it with the libc.so in the wasi-sdk or create a separate version while still maintaining the potential compatibility requirements.
* Concerning the target: Should our focus be solely on wasm32-wasi modules, or do we need to consider 'wasm32-unknown' (from -nostdlib) modules as well? These represent two distinctly different challenges. If we concentrate on wasm32-wasi, wasi-libc and wasm-ld could offer additional support, and we might be able to simplify the memory allocation mechanism.
dynamic-linking itself doesn't require wasi as far as i know.
If we use `wasm-component-ld` and `wasm-tools component link` as points of reference, the code from wasi-libc becomes relevant. Additionally, with the assistance of wasi-libc code, the linking requirements for wasm32-wasi modules and wasm32-unknown modules emerge as two distinct challenges. Furthermore, when using wasi-sdk toolchains, there will always be a libc.so present in the name subsection of the dylink.0 section. We must either align it with the libc.so in the wasi-sdk or create a separate version while still maintaining the potential compatibility requirements.
i'm not sure what's your point. if you use wasi-sdk, it requires wasi of course.
Thanks a lot for the discussion so far, let me answer some of the questions and concerns here
If using the space of aux stack is an issue, then we had better try to resolve the hardcoded const issue in wasi-sdk, and be able to update __heap_base.
I think "fixing" `__heap_base` might not be possible and can lead to various issues. `__heap_base` is an immutable global, which means that even though the linker produces the correct code, various optimizers (e.g. wasm-opt or others) might choose to optimize the `global.get` calls and replace them with constants. Probably the way to avoid this would be to make `__heap_base` a mutable global, but that doesn't seem to be right. I think there are two solutions that seem viable to me right now:
1. Placing the data at the beginning of the stack:
   * copy the `__stack_pointer` value to some runtime variable (let's call it `mb`)
   * set `env.__memory_base` to be `mb`, update `mb` by adding the size of the data of the module
   * set the `__stack_pointer` global to be the `mb` value
2. Placing the data at the end of the stack. If we do that, there's no need to make any changes for the stack pointer; we might need to update some variables internally in WAMR when the stack overflow detection is enabled, but that shouldn't be a problem.
I'd really like to just update `__heap_base`, but as mentioned above, that might not be possible at all without making it mutable. So we can probably stick to 1 or 2 (my preferred one is 2 as we don't need to manipulate the global, but happy to discuss this further).
I'm concerned that we may have delved too deeply into specifics without first clarifying our requirements and understanding the broader context. 'Dynamic Linking' is merely one of many technologies that can be used to achieve our goal of core module linking. I'm not sure why we've become fixated on this particular method or why we believe it to be the ultimate solution for our needs. (Although I agree this is the best solution we have so far)
I totally understand the concern. I think the selling points for this one are:
Also, the RFC itself is to implement the Dynamic Linking spec, and by doing that, solve all the problems that this spec solves. I do understand there might be other problems that Dynamic Linking doesn't solve, but that quite likely would require a toolchain support, so the discussion for that should likely happen with a broader community, not just among WAMR users/developers. The problem my team has is the lack of ability to split modules into smaller chunks, caching some of them, re-using some of the exports across different modules, and Dynamic Linking spec solves those problems for us. If there are more requirements, I'd be happy to discuss them too and see if that's something we should include in Dynamic Linking spec or can this be handled internally by WAMR.
Regarding the toolchain: Which guest language toolchains should we take into account? This is important because we need to ensure that all potential guest languages can support specific compilation options, such as --shared, which is a Clang option. Toolchains for C-like languages, including C, C++, and even Rust, can handle this. But what about Go/TinyGo and TypeScript? These languages are used by our customers who have expressed the need for linking this time.
From my perspective wasi-sdk-based toolchains (i.e. C/C++/Rust) are top priority, although I understand other teams might have different requirements, so I'd like to learn about that too. Overall, I'd love to make the feature toolchain-agnostic, but I think it'd be difficult to provide support for all possible toolchains without making some assumptions or putting some requirements in place. For example, for memory, we could assume that the main module should at least export two globals:
* `data_region_start`, which is a global defining where the runtime is expected to place the data from submodules
* `data_region_end`, which is a mutable global where the runtime should update the value to point to the end of the region (that might not be needed if we put the data at the top of the stack)
Concerning the target: Should our focus be solely on wasm32-wasi modules, or do we need to consider 'wasm32-unknown' (from -nostdlib) modules as well? These represent two distinctly different challenges. If we concentrate on wasm32-wasi, wasi-libc and wasm-ld could offer additional support, and we might be able to simplify the memory allocation mechanism.
I'm not sure if that matters a lot in this case, but even if it did, we should cover as many targets as possible (in this case though, I think the solutions discussed in this thread will work for both wasm32-wasi and wasm32-unknown)
If a linking solution, such as 'Dynamic Linking', is limited to recompiling wasm modules, do we continue to support the linking of multiple standard wasm modules? This might be affirmative due to the multi-module feature.
According to PR links posted by @wenyongh it looks like there are customers using it, so we can't simply drop it. However, my proposal is to "keep it but not touch it" and eventually deprecate / remove (we'd need to discuss the timeline) - but I'd like to hear from the existing users. I think making dynamic linking compatible with the existing multi-module might just be quite a bit of effort, so I suggest we build dynamic linking as a separate library, and only re-use some code when it's straightforward. Multi-module is a non-standard extension, whereas the https://github.com/WebAssembly/tool-conventions/blob/master/DynamicLinking.md is something that was somehow agreed by the community and even though is not a standard, some tools already are compatible with it. I don't mind keeping multi-module around, but if users can migrate to dynamic linking, deleting the multi-module support would reduce the maintenance overhead.
basically dynamic-linking is incompatible with threads. i guess it's worth researching what emscripten does in that regard.
yes agree, this requires a bit deeper investigation.
If using the space of aux stack is an issue, then we had better try to resolve the hardcoded const issue in wasi-sdk, and be able to update __heap_base.
I think "fixing" `__heap_base` might not be possible and can lead to various issues. `__heap_base` is an immutable global, which means that even though the linker produces the correct code, various optimizers (e.g. wasm-opt or others) might choose to optimize the `global.get` calls and replace them with constants. Probably the way to avoid this would be to make `__heap_base` a mutable global, but that doesn't seem to be right.
i'm not following this discussion about the heap.
it's trivial for a runtime linker to adjust `__heap_base` of pie executables and shared libraries. actually it's how dynamic-linking works.
i agree it's impossible for a runtime to adjust `__heap_base` of a statically linked module. cf. https://github.com/bytecodealliance/wasm-micro-runtime/issues/2275
but is it related to this dynamic-linking RFC?
it's trivial for a runtime linker to adjust __heap_base of pie executables and shared libraries. actually it's how dynamic-linking works.
Yes, if we assume the executable (main module) is built with `-Wl,-pie` flags (or `-Wl,-pie -fPIC` flags when the main module exports functions), then indeed the idea I've had with updating `__heap_base` by the runtime is going to work, because `__heap_base` in this case is an imported global even for the main module and submodules. I was confused though because I didn't think `-pie` is a requirement for WASM dynamic linking - if that's the case though, I'd be happy to move forward with the `__heap_base` approach.
To avoid adjusting __heap_base at runtime, I'm considering using wasm-tools component link as a reference. While it's not the definitive solution, it serves as a solid example.
wasm-tools accepts multiple core modules (compiled with `--shared`) as input, links them together, and produces a component module. Within the component module, wasm-tools generates `core instance` opcodes to ensure the correct order of instantiation, using exports from previous instances to satisfy imports for subsequent ones. It also creates an initial module instance that contains a custom linear memory instance and associated global values (such as memory_base, __table_base, stack_pointer, etc.).
Here's an example of how to create a component model from multiple wasm32-wasi modules.
Key observations include:
* The generation of wasm modules with `-fPIC --shared` options, resulting in all modules being side modules. This allows the linker or runtime to customize the linear memory layout and use exports/imports to position all module instances.
* The libc.so from wasi-sdk is fundamental and invariably required.
To avoid adjusting __heap_base at runtime, I'm considering using wasm-tools component link as a reference. While it's not the definitive solution, it serves as a solid example.
From what I see this is very similar to what was discussed above; the difference is, as you pointed out, that there's no main module, but instead, the linked module is constructed out of multiple shared modules. Because none of them is the final executable, there's indeed no `__heap_base`. So what happens there is that `__heap_base` is calculated based on the stack size and the size of all the data segments from all of the modules. So instead of updating `__heap_base` as we've discussed above, that linker just creates a new one (because there's none yet).
My use case is likely going to be a single main module and a number of submodules. However, I don't think the other scenario (where there is no main module, let's call it `lib-only`) is impossible to support in WAMR. I also think that a lot of the implementation that satisfies my use case can be re-used to implement the `lib-only` use case. I'll definitely keep that use case in mind during the design and make sure the implementation can easily be extended.
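The calculation described above (heap base derived from the stack size plus the data of all modules) can be sketched roughly as follows. This assumes the stack sits at the bottom of linear memory, each module's data is placed after it in order, and the heap starts at the next 16-byte boundary. All three are assumptions made for illustration, not a description of what wasm-tools actually emits, and the `compute_layout` helper and sizes are hypothetical.

```python
def compute_layout(stack_size, modules, heap_align=16):
    """Return (memory_bases, heap_base) for a lib-only link.

    `modules` is a list of (name, mem_size, mem_p2align) tuples, i.e. the
    memory requirements each module declares in its dylink.0 section.
    """
    cursor = stack_size  # assumed: stack occupies [0, stack_size)
    bases = {}
    for name, mem_size, mem_p2align in modules:
        align = 1 << mem_p2align
        cursor = (cursor + align - 1) // align * align  # align the region
        bases[name] = cursor  # value for this module's __memory_base
        cursor += mem_size
    # The heap begins after the last data region, rounded up.
    heap_base = (cursor + heap_align - 1) // heap_align * heap_align
    return bases, heap_base


bases, heap_base = compute_layout(
    stack_size=65536,
    modules=[("libc.so", 1000, 4), ("libdemo1.so", 56, 2)],
)
print(bases, heap_base)
```

With a main module present, the same loop would start from whatever address the main module's own data ends at instead of from the stack size.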
My usecase is likely going to be a single main module and a number of submodules.
I completely respect your decision. Please be aware that, in some respects, `a main module = a submodule + libc.so`. And given that modules compiled with `--nostdlib` can be considered as submodules, opting for a 'lib-only' approach could significantly reduce the effort involved.
it's trivial for a runtime linker to adjust __heap_base of pie executables and shared libraries. actually it's how dynamic-linking works.
Yes, if we assume the executable (main module) is built with `-Wl,-pie` flags (or `-Wl,-pie -fPIC` flags when the main module exports functions), then indeed the idea I've had with updating `__heap_base` by the runtime is going to work, because `__heap_base` in this case is an imported global even for the main module and submodules. I was confused though because I didn't think `-pie` is a requirement for WASM dynamic linking - if that's the case though, I'd be happy to move forward with the `__heap_base` approach.
as pie executable is what emscripten uses, i guess it's the de-facto. even with non-pie executables, the linker can trivially allocate extra memory regions. (for app heap or something)
To avoid adjusting __heap_base at runtime, I'm considering using wasm-tools component link as a reference. While it's not the definitive solution, it serves as a solid example.
wasm-tools accepts multiple core modules (compiled with `--shared`) as input, links them together, and produces a component module. Within the component module, wasm-tools generates `core instance` opcodes to ensure the correct order of instantiation, using exports from previous instances to satisfy imports for subsequent ones. It also creates an initial module instance that contains a custom linear memory instance and associated global values (such as memory_base, __table_base, stack_pointer, etc.).
Here's an example of how to create a component model from multiple wasm32-wasi modules.
Key observations include:
* The generation of wasm modules with `-fPIC --shared` options, resulting in all modules being side modules. This allows the linker or runtime to customize the linear memory layout and use exports/imports to position all module instances.
* The libc.so from wasi-sdk is fundamental and invariably required.
afaik, wasm-tools component link just emulates dynamic-linking with component-model for limited cases. i'm not sure why you want to make it a reference while full implementations are available. (eg. emscripten, toywasm)
My usecase is likely going to be a single main module and a number of submodules.
I completely respect your decision. Please be aware that, in some respects, `a main module = a submodule + libc.so`. And given that modules compiled with `--nostdlib` can be considered as submodules, opting for a 'lib-only' approach could significantly reduce the effort involved.
from my experience to implement toywasm libdyld, i disagree because a pie executable and shared libraries are mostly same. otoh, assuming a pie executable can save the effort a bit.
anyway, i guess it doesn't matter much because, for wamr, 90% of efforts would be taken for "fix import/export of instance resources", not dynamic linker itself.
IMU, The `main module` mentioned above is a normal module. it is not a pie executable.
one is called Main module and is compiled normally like what we do now;
But I guess you are right. `main module` should be a pie executable. both `main` and `sub` should import a memory.
May I ask how you satisfy the needed `libc.so` in toywasm when executing a pie executable?
Custom:
- name: "dylink.0"
- mem_size : 56
- mem_p2align : 2
- table_size : 0
- table_p2align: 0
- needed_dynlibs[2]:
- libdemo1.so
- libc.so <- ?
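For readers unfamiliar with where the dump above comes from, here is a minimal sketch of decoding a dylink.0 custom section payload, based on my reading of the tool-conventions DynamicLinking document: the payload is a sequence of subsections, where type 1 carries the memory and table requirements and type 2 carries the needed library names, with integers encoded as unsigned LEB128. The function names and the hand-built `sample` payload are hypothetical; this is not code from WAMR or toywasm.

```python
def read_uleb(buf, pos):
    """Decode an unsigned LEB128 integer; return (value, next_position)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def parse_dylink0(payload):
    """Parse the payload of a "dylink.0" custom section (after its name)."""
    info = {"needed": []}
    pos = 0
    while pos < len(payload):
        kind = payload[pos]
        pos += 1
        size, pos = read_uleb(payload, pos)
        end = pos + size
        if kind == 1:  # memory/table requirements subsection
            for key in ("mem_size", "mem_p2align", "table_size", "table_p2align"):
                info[key], pos = read_uleb(payload, pos)
        elif kind == 2:  # needed dynamic libraries subsection
            count, pos = read_uleb(payload, pos)
            for _ in range(count):
                length, pos = read_uleb(payload, pos)
                info["needed"].append(payload[pos:pos + length].decode("utf-8"))
                pos += length
        pos = end  # skip any subsection types this sketch does not handle
    return info


# Hand-built payload mirroring the dump above (illustration only).
needed = bytes([2]) + b"\x0blibdemo1.so" + b"\x07libc.so"
sample = bytes([1, 4, 56, 2, 0, 0]) + bytes([2, len(needed)]) + needed
print(parse_dylink0(sample))
```

The `needed` list is what a runtime linker would then have to resolve, which is the `libc.so` question raised above.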
IMU, The `main module` mentioned above is a normal module. it is not a pie executable.
one is called Main module and is compiled normally like what we do now;
i don't know from what @wenyongh got the idea. pie or not pie, the main module for dynamic-linking is not same as normal (statically linked) binary.
May I ask how you satisfy the needed `libc.so` in toywasm when executing a pie executable?
Custom:
- name: "dylink.0"
- mem_size : 56
- mem_p2align : 2
- table_size : 0
- table_p2align: 0
- needed_dynlibs[2]:
- libdemo1.so
- libc.so <- ?
currently it searches the file with the name (in this case "libc.so") in the user-specified host paths. (--dyld-path)
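A name-based lookup like the one described can be sketched as follows. This is only an illustration of the idea (first match wins across an ordered list of directories); the `find_library` helper is hypothetical and is not toywasm's actual implementation.

```python
import os


def find_library(name, search_paths):
    """Return the first existing file called `name` in `search_paths`.

    `search_paths` plays the role of the user-specified host directories;
    returns None when no directory contains the library.
    """
    for directory in search_paths:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None
```

A call such as `find_library("libc.so", ["./libs", "/opt/wasm-libs"])` (hypothetical directories) would return the first matching path or `None`.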
How about the import? I gladly noticed you are also using `--import-memory`.
Import[10]:
- memory[0] pages: initial=1 <- env.memory
- table[0] type=funcref initial=0 <- env.__indirect_function_table
- global[0] i32 mutable=1 <- env.__stack_pointer
- global[1] i32 mutable=0 <- env.__memory_base
- global[2] i32 mutable=0 <- env.__table_base
- func[0] sig=0 <__wasm_call_dtors> <- env.__wasm_call_dtors
- func[1] sig=1 <__wasi_proc_exit> <- env.__wasi_proc_exit
...
IMU, The `main module` mentioned above is a normal module. it is not a pie executable.
one is called Main module and is compiled normally like what we do now;
i don't know from what @wenyongh got the idea. pie or not pie, the main module for dynamic-linking is not same as normal (statically linked) binary.
I found that in Emscripten document: https://emscripten.org/docs/compiling/Dynamic-Linking.html#overview-of-dynamic-linking
And in the WebAssembly Dynamic Linking document, it mentions
This document describes the current WebAssembly dynamic linking ABI used by emscripten and by the llvm backend when targeting emscripten.
at the beginning, so I think they are the same.
Feature
This feature is to enable dynamic linking and dynamic loading of shared objects in WAMR. The feature enables:
`dlopen`)
Benefit
Implementation
This is in progress, for now opening this ticket for further discussions / requirements.
Alternatives
WAMR already has multi-module support, but this doesn't allow sharing memory between modules, which is an important limitation.