loganek commented 1 month ago

Feature

This feature enables dynamic linking and dynamic loading of shared objects in WAMR. It allows:

loading modules compiled to WebAssembly once and re-using their symbols in dependent modules
sharing memory between loaded modules, so they can operate like shared objects on Linux platforms
loading modules at startup as well as dynamically loading libraries while the application is running (similar to dlopen)

Benefit

Ability to split large modules into sub-modules and potentially cache components that aren't frequently changed
Dynamically replace implementations of the same interfaces

Implementation

This is in progress; for now, I'm opening this ticket for further discussion and to collect requirements.

Alternatives

WAMR already has multi-module support, but it doesn't allow sharing memory between modules, which is an important limitation.

yamt commented 1 month ago

do you mean https://github.com/bytecodealliance/wasm-micro-runtime/issues/1026?

lum1n0us commented 1 month ago

@wenyongh please feel free to jump in. I hope I've written down everything we've discussed.

From what we know, we are hoping this feature will cover:

a separate layer above the WAMR libraries to handle various linking mechanisms
implementation of fully-runtime dynamic linking between core modules
refactoring the current multi-module feature and moving its code to a separate layer
may even leave space for shared-nothing linking
might not consider dlxxx() linking at all
keep the current loading-phase module-level linking capability. Might mimic that behavior
handle linking between
- module and module.
- aot and aot.
- host and wasm(aot) (embedded)
- --native-lib and wasm(aot)
- built-in libraries (core/iwasm/libraries/) and wasm(aot) (perhaps covered by the previous item)

yamt commented 1 month ago

implementation of fully-runtime dynamic linking between core modules

the linked doc seems to be talking about components, not core modules. do you mean an equivalent for core modules?

do you mean https://github.com/WebAssembly/tool-conventions/blob/main/DynamicLinking.md ? or something else?

keep current loading-phase module level linking capability. Might mimic that behavior

a question is what should be kept in the base library and what should be moved into separate layer(s).

handle linking between

do you mean every possible combination? or have you left out some combinations from the matrix? terms used here like module/aot/wasm/embedded etc. are a bit unclear to me.

lum1n0us commented 1 month ago

do you mean an equivalent for core modules

It's more like what "component-ld" does, e.g. creating an extra core instance to provide the memory (heap).

After linking, core instances A, B, and C will each have their own table and globals, while sharing the stack and heap. Something like: [figure: memory model of shared-everything linking]

It might require compiling the Wasm modules with --shared.

a question is what should be kept in the base library and what should be moved into separate layer(s).

Move them into separate layers and keep the base library minimal.

do you mean every possible combinations

Yes, we need to handle all five combinations below. Sorry about the terms; let me try to explain:

module and aot mean .wasm files and .aot files. This part is clear, right?

"host and wasm" means integrating WAMR as a library into a bigger project. Wasm imports are provided by core/iwasm/libraries and by the host itself. Sometimes the host even provides its own support for wasi-libc. Some APIs in both wasm_export.h and wasm_c_api.h can be used for that purpose.

"--native-lib" refers to the iwasm command-line option. It requires users to provide a .so with pre-defined symbols: init_native_lib(), get_native_lib() and deinit_native_lib().

Built-in libraries are runtime-provided host functions. They are located in core/iwasm/libraries.
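To make the "--native-lib" case concrete, here is a rough sketch of what such a .so might export. It assumes the get_native_lib(char **, NativeSymbol **) shape used by WAMR's native-lib sample; NativeSymbol is redefined locally only so the snippet compiles on its own (a real library would include wasm_export.h instead), add_one is a hypothetical function, and its void * first parameter stands in for wasm_exec_env_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for WAMR's NativeSymbol (normally from wasm_export.h). */
typedef struct NativeSymbol {
    const char *symbol;    /* import name seen by the wasm module */
    void *func_ptr;        /* native implementation */
    const char *signature; /* "(i)i": takes an i32, returns an i32 */
    void *attachment;
} NativeSymbol;

/* Hypothetical host function; in WAMR the first parameter would be a
 * wasm_exec_env_t, represented here as void *. */
static int32_t add_one(void *exec_env, int32_t x)
{
    (void)exec_env;
    return x + 1;
}

static NativeSymbol native_symbols[] = {
    { "add_one", (void *)add_one, "(i)i", NULL },
};

/* Optional hook run after the .so is loaded; 0 is assumed to mean success. */
int init_native_lib(void)
{
    return 0;
}

/* Reports the import module name and the symbols this library provides. */
uint32_t get_native_lib(char **p_module_name, NativeSymbol **p_native_symbols)
{
    *p_module_name = "env";
    *p_native_symbols = native_symbols;
    return (uint32_t)(sizeof(native_symbols) / sizeof(native_symbols[0]));
}

/* Optional hook run before the .so is unloaded. */
void deinit_native_lib(void)
{
}
```

The runtime side would dlopen() the .so, call these symbols, and register the returned table under the reported module name.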

yamt commented 1 month ago

are you (@loganek and @lum1n0us) talking about the same specific design, or different possible designs for a similar goal?

yamt commented 1 month ago

a question is what should be kept in the base library and what should be moved into separate layer(s). Move into separate layers. Keep base library minimum.

a question is what the minimal base library would look like. i guess it should contain:

import/export support as per wasm spec
plus, some extra functionality necessary to support load-time linking

lum1n0us commented 1 month ago

plus, some extra functionality necessary to support load-time linking

But also instantiation-time linking. Actually, we are looking for a way to mimic load-time linking with instantiation-time linking.

yamt commented 1 month ago

plus, some extra functionality necessary to support load-time linking

But also instantiation-time linking. Actually, we are looking for a way to mimic load-time linking with instantiation-time linking.

can you explain a bit? i was considering instantiation-time linking to be just a variation of ordinary import/export as per the wasm spec.

lum1n0us commented 1 month ago

Yes. The main goal is to implement instantiation-time linking as per the wasm spec, which might include modifications like the following (not exhaustive; more will be added later):

import memory
import table
providing different import sets for different instances

The core algorithm will be instantiation-time linking instead of load-time linking. But, for compatibility, we need to keep the load-time linking APIs; in the new architecture, this means all instances of a module share the same import set. The load-time check (for imported modules) might still be necessary in some scenarios.
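A minimal sketch of the difference, using a hypothetical, simplified model (module_t, instance_t and instantiate() are illustrative names, not WAMR APIs): the module is loaded once, and each instantiation resolves the module's imports against the import set passed for that instance. Passing the same set to every instantiation reproduces the old load-time behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified model; not WAMR's real data structures. */
typedef int (*host_fn)(int);

typedef struct {
    const char *module; /* import module name, e.g. "env" */
    const char *name;   /* import field name */
} import_desc;

typedef struct {
    const char *module;
    const char *name;
    host_fn func;       /* what this particular import set provides */
} import_binding;

/* Loaded once; shared by all of its instances. */
typedef struct {
    const import_desc *imports;
    size_t num_imports;
} module_t;

#define MAX_IMPORTS 8

/* Each instance owns its resolved imports. */
typedef struct {
    const module_t *module;
    host_fn resolved[MAX_IMPORTS];
} instance_t;

/* Instantiation-time linking: resolve every import of `mod` against the
 * import set given for this instance. Returns -1 on an unresolved import. */
static int instantiate(const module_t *mod, const import_binding *set,
                       size_t set_len, instance_t *out)
{
    assert(mod->num_imports <= MAX_IMPORTS);
    out->module = mod;
    for (size_t i = 0; i < mod->num_imports; i++) {
        out->resolved[i] = NULL;
        for (size_t j = 0; j < set_len; j++) {
            if (strcmp(mod->imports[i].module, set[j].module) == 0
                && strcmp(mod->imports[i].name, set[j].name) == 0) {
                out->resolved[i] = set[j].func;
                break;
            }
        }
        if (out->resolved[i] == NULL)
            return -1;
    }
    return 0;
}

/* Two example host implementations of the same import. */
static int double_it(int x) { return 2 * x; }
static int negate(int x) { return -x; }
```

Two instances of the same module can then see different implementations of env.transform, which load-time linking cannot express.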

wenyongh commented 1 month ago

@wenyongh please feel free to jump in. I hope I've written down everything we've discussed.

From what we know, we are hoping this feature will cover:

a separate layer above the WAMR libraries to handle various linking mechanisms
implementation of fully-runtime dynamic linking between core modules
refactoring the current multi-module feature and moving its code to a separate layer
may even leave space for shared-nothing linking
might not consider dlxxx() linking at all
keep the current loading-phase module-level linking capability. Might mimic that behavior
handle linking between
- module and module.
- aot and aot.
- host and wasm(aot) (embedded)
- --native-lib and wasm(aot)
- built-in libraries (core/iwasm/libraries/) and wasm(aot) (perhaps covered by the previous item)

@lum1n0us, @yamt sorry for the late response. From what we are discussing internally and what we discussed with @no1wudi and @loganek, I think we want to implement https://github.com/WebAssembly/tool-conventions/blob/main/DynamicLinking.md rather than https://github.com/WebAssembly/component-model/blob/main/design/mvp/Linking.md#fully-runtime-dynamic-linking. Here is my understanding (just personal, maybe incorrect):

1. There are two types of module: (1) the Main module, which is compiled normally like what we do now; (2) the Side module, which can be built with a command like clang -fPIC -shared -Wl,--export=func1 -Wl,--export=func2, or emcc -s SIDE_MODULE=1 -s EXPORTED_FUNCTIONS="['_func1','_func2']". Note that the concept was first introduced by Emscripten, and is now supported by emsdk, clang and wasi-sdk.
2. A project should contain exactly one main module and can contain one or more side modules.
3. Only the main module has a linear memory and a wasm table. A side module imports the main module's memory through the import env.memory and its table through the import env.__indirect_function_table; it also imports two globals, env.__memory_base and env.__table_base. The runtime should assign a region inside the main module's linear memory for the first global and put the side module's initial data in that region, and in all the load/store opcodes that access global data, the global should have been added to the address before accessing. The second global is handled similarly.

   Maybe we can allocate a buffer in the libc heap or the host-managed heap for the side module.

4. The wasm/aot module can be loaded once and instantiated multiple times, so as to re-use the code. When a module is instantiated and needs to link its undefined symbols, maybe the runtime can (1) first look them up in the instantiated module instances, (2) if not found, look them up in the loaded modules, and (3) if still not found, look them up in the file system or via a callback function provided by the developer.
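The three-step lookup in item 4 could look roughly like this. symbol_table, resolve_symbol() and the fallback callback are hypothetical names; a real runtime would resolve functions and globals rather than plain ints, and step 2 would probably have to instantiate the loaded module before its exports are usable.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: a symbol is just a name plus an opaque value here. */
typedef struct {
    const char *name;
    int value;
} symbol_t;

typedef struct {
    const symbol_t *symbols;
    size_t count;
} symbol_table;

/* Developer-provided fallback, e.g. loading a module from the file system. */
typedef int (*resolve_cb)(const char *name, int *out_value);

static int find_in(const symbol_table *tables, size_t n_tables,
                   const char *name, int *out)
{
    for (size_t t = 0; t < n_tables; t++)
        for (size_t i = 0; i < tables[t].count; i++)
            if (strcmp(tables[t].symbols[i].name, name) == 0) {
                *out = tables[t].symbols[i].value;
                return 1;
            }
    return 0;
}

/* Returns which step resolved `name`: 1 = instantiated instance,
 * 2 = loaded module, 3 = fallback callback, 0 = unresolved. */
static int resolve_symbol(const symbol_table *instances, size_t n_instances,
                          const symbol_table *loaded, size_t n_loaded,
                          resolve_cb fallback, const char *name, int *out)
{
    if (find_in(instances, n_instances, name, out))
        return 1;
    if (find_in(loaded, n_loaded, name, out))
        return 2;
    if (fallback != NULL && fallback(name, out))
        return 3;
    return 0;
}

/* Example fallback standing in for "look it up on disk". */
static int demo_fallback(const char *name, int *out)
{
    if (strcmp(name, "from_disk") == 0) {
        *out = 99;
        return 1;
    }
    return 0;
}
```

Because instances are searched first, a symbol already provided by a live instance shadows the same name in a module that is merely loaded.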

yamt commented 1 month ago

I think we want to implement https://github.com/WebAssembly/tool-conventions/blob/main/DynamicLinking.md

i'm reasonably familiar with the convention as i have implemented it for another runtime. https://github.com/yamt/toywasm/tree/master/libdyld

i have even considered porting it to wamr. the main obstacle was wamr's lack of spec-conforming import/export semantics, eg. https://github.com/bytecodealliance/wasm-micro-runtime/issues/1353. i guess it will be a problem for component-model as well.

3. Only main module has linear memory and wasm table, and side module imports memory/table of main module through memory of `import env.memory` and table of `import env.__indirect_function_table`, and also side module imports two globals `import env.__memory_base` and `import env.__table_base`: runtime should assign a region inside main module's linear memory for the first global, then put the side module's initial data in the region,

a non-pie executable works like that. but for a -pie executable, which iirc emscripten produces, the linear memory etc. are provided by the runtime linker. https://github.com/yamt/garbage/tree/master/c/shlib has an example for both cases.

and in all the load/store opcodes that access the global data, the address should have been added with the global before accessing. And similar for the second global.

if you mean to make the runtime do something special when executing eg. i32.load, it doesn't work that way. it's something llvm takes care of. the runtime just needs to execute simple relocations.
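As a rough illustration of "simple relocations": under the convention, the runtime chooses __memory_base and __table_base, copies the side module's data there, and fills the GOT.mem.* / GOT.func.* globals with absolute addresses and table indices, while the address arithmetic inside functions is already emitted by llvm. The struct and function names below are hypothetical, and linear memory is modeled as a plain byte array.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical description of a side module's needs (cf. dylink.0 mem info). */
typedef struct {
    const uint8_t *init_data; /* initial contents of its data segment */
    uint32_t data_size;
    uint32_t data_align;      /* power of two */
    uint32_t table_size;      /* table slots it needs */
} side_module_info;

/* Values the runtime hands to the side module's imports. */
typedef struct {
    uint32_t memory_base; /* env.__memory_base */
    uint32_t table_base;  /* env.__table_base */
} side_module_bases;

/* Place the side module's data at the first suitably aligned offset at or
 * above *mem_top in the shared linear memory, and reserve its table slots
 * starting at *table_top. Returns 0 on success, -1 if it doesn't fit. */
static int place_side_module(uint8_t *memory, uint32_t memory_size,
                             uint32_t *mem_top, uint32_t *table_top,
                             const side_module_info *info,
                             side_module_bases *out)
{
    uint32_t a = info->data_align;
    assert(a != 0 && (a & (a - 1)) == 0);
    if (*mem_top > UINT32_MAX - (a - 1))
        return -1;
    uint32_t base = (*mem_top + a - 1) & ~(a - 1);
    if (base > memory_size || info->data_size > memory_size - base)
        return -1;
    memcpy(memory + base, info->init_data, info->data_size);
    out->memory_base = base;
    out->table_base = *table_top;
    *mem_top = base + info->data_size;
    *table_top += info->table_size;
    return 0;
}

/* GOT.mem.<sym>: absolute address of a data symbol at `offset` within the
 * defining module's data region. */
static uint32_t got_mem_value(const side_module_bases *def, uint32_t offset)
{
    return def->memory_base + offset;
}

/* GOT.func.<sym>: table index of the function in slot `slot` of the
 * defining module's table region. */
static uint32_t got_func_value(const side_module_bases *def, uint32_t slot)
{
    return def->table_base + slot;
}
```

Nothing here touches i32.load itself; the runtime only computes a few numbers and writes them into imported globals.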

wenyongh commented 1 month ago

i'm reasonably familiar with the convention as i have implemented it for another runtime. https://github.com/yamt/toywasm/tree/master/libdyld

i have even considered porting it to wamr. the main obstacle was wamr's lack of spec-conforming import/export semantics, eg. #1353. i guess it will be a problem for component-model as well.

Yes, your project is cool. Do you mean it is an obstacle if we use the import/export semantics of WAMR's current multi-module feature? I am not sure what the best way is; maybe we can implement a new linking semantics, enable core module dynamic linking based on it, and also refactor multi-module to use it.

3. Only main module has linear memory and wasm table, and side module imports memory/table of main module through memory of `import env.memory` and table of `import env.__indirect_function_table`, and also side module imports two globals `import env.__memory_base` and `import env.__table_base`: runtime should assign a region inside main module's linear memory for the first global, then put the side module's initial data in the region,
a non-pie executable works like that. but for a -pie executable, which iirc emscripten produces, the linear memory etc. are provided by the runtime linker. https://github.com/yamt/garbage/tree/master/c/shlib has an example for both cases.

Thanks for the correction. I am not sure whether there is a requirement for PIE executables, since normally we can compile wasm to AOT, and AOT XIP is already enabled.

and in all the load/store opcodes that access the global data, the address should have been added with the global before accessing. And similar for the second global.

if you mean to make the runtime do something special when executing eg. i32.load, it doesn't work that way. it's something llvm takes care of. the runtime just needs to execute simple relocations.

Yes, maybe the runtime just needs to do a few things. I meant that the address used by i32.load should already have been relocated, e.g., there may be another instruction that adds env.__memory_base to the address before the i32.load, so we don't need to add env.__memory_base manually when executing the instruction. The runtime just executes i32.load as normal.

yamt commented 1 month ago

Do you mean it is an obstacle if we use WAMR current multi-module's import/export semantics?

yes. the dynamic-linking convention heavily relies on import/export of instance resources (memory, table, globals, functions). while it might be possible to adapt it to our multi-module semantics (i dunno), i guess it's more productive to work on supporting spec-conforming import/export semantics.

I am not very sure what is the best way, maybe we can implement a new linking semantics and enable core module dynamic linking based on it, and also refactor multi-module to use it.

yes, it makes sense to design a base machinery (maybe spec-conforming import/export + (hopefully) small extra functionality) which can be used as a base of both semantics.

loganek commented 1 month ago

Are there any users relying on the multi-module support today who cannot easily port to dynamic linking (i.e. recompile their code while upgrading WAMR)? Given that multi-module is very WAMR-specific, and there's already a documented approach in https://github.com/WebAssembly/tool-conventions/blob/main/DynamicLinking.md, I wonder if we could drop (or at least deprecate) multi-module support and advise users to use dynamic linking instead?

loganek commented 1 month ago

Maybe we can allocate a buffer in the libc heap of host managed heap for the side module.

I'm not sure what you mean by the host-managed heap. I was initially thinking we could just export malloc from the module and call it from WAMR when loading modules, but there are a few issues:

we'd need to search for malloc in all the modules, as it likely won't be included in the main module (I presume in many cases libc will be side-loaded) - that's not a big deal though, but:
we can't call malloc until the module is instantiated...

Alternatively, could we perhaps rely on the __heap_base / __stack_high (or __data_end) global and update it when the modules are being loaded?

Also, I don't think there's a per-module aux stack; IIUC the aux stack should be shared among all the modules, and the __stack_pointer global should be imported from the main module and used by all the modules. That poses a small challenge because __stack_pointer might not be available in the module by name; we could use a heuristic that WAMR already implements for checking aux_stack boundaries (assume it's the first non-imported global), but I don't know how reliable it is. I think it's reasonable to require the global to be exported from the main module (although the doc https://github.com/WebAssembly/tool-conventions/blob/main/DynamicLinking.md doesn't say anything about it).
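A sketch of the lookup described above, assuming a hypothetical global_desc view of the main module's globals: prefer a global exported as __stack_pointer, and only fall back to the "first non-imported global" heuristic, here additionally restricted to a mutable i32 (an extra assumption, not something WAMR is known to check).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { VAL_I32, VAL_I64 } val_type;

/* Hypothetical summary of one global in the main module. */
typedef struct {
    const char *export_name; /* NULL if not exported */
    int is_imported;
    int is_mutable;
    val_type type;
} global_desc;

/* Returns the index of the global to share as __stack_pointer, or -1. */
static int find_stack_pointer(const global_desc *globals, size_t n)
{
    /* Preferred: the main module exports it by name. */
    for (size_t i = 0; i < n; i++)
        if (globals[i].export_name != NULL
            && strcmp(globals[i].export_name, "__stack_pointer") == 0)
            return (int)i;

    /* Fallback heuristic: first non-imported (mutable i32) global. */
    for (size_t i = 0; i < n; i++)
        if (!globals[i].is_imported && globals[i].is_mutable
            && globals[i].type == VAL_I32)
            return (int)i;

    return -1;
}
```

Every side module's imported __stack_pointer would then be bound to this one global, so all modules push and pop on the same aux stack.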

Here's a diagram based on @wenyongh's one, with the slight modifications described above.


wenyongh commented 1 month ago

Are there any users relying on the multi-module support today who cannot easily port to dynamic linking (i.e. recompile their code while upgrading WAMR)? Given that multi-module is very WAMR-specific, and there's already a documented approach in https://github.com/WebAssembly/tool-conventions/blob/main/DynamicLinking.md, I wonder if we could drop (or at least deprecate) multi-module support and advise users to use dynamic linking instead?

I guess some developers are using the multi-module feature, since some related PRs were submitted, so we had better keep it:

#2482
#3539
#3562
#3563

wenyongh commented 1 month ago

I'm not sure what you mean by the host-managed heap. I was initially thinking we could just export malloc from the module and call it from WAMR when loading modules, but there are a few issues:

The host-managed heap means the heap inserted by the runtime: when the wasm module is compiled with -nostdlib and libc isn't linked, the developer can insert the host-managed heap into the linear memory via wasm_runtime_instantiate(.., heap_size = n), see https://bytecodealliance.github.io/wamr.dev/blog/understand-the-wamr-heaps/. Note that the host native code can also call the module_malloc function to allocate memory from this heap, just like from the libc heap when libc is linked into the wasm module and the malloc function is exported.

I think it may be difficult to assign a region for the side module if we don't call the malloc function exposed by the wasm module. Per my understanding, in the main module the global data, the aux stack and the libc heap (or host-managed heap) are contiguous, so you would have to allocate a region from the global data or the aux stack. The global data is in use and unavailable, and for the aux stack, would you decrease the stack pointer each time to leave a region? As for increasing the __heap_base global, per my understanding, it should not be changed again after the libc heap is initialized. It shouldn't even be changed at the beginning, since in our investigation its const value is hardcoded in the wasm bytecode: normally we only change it when the libc heap isn't enabled.

we'd need to search for malloc in all the modules, as it likely won't be included in the main module (I presume in many cases libc will be side-loaded) - that's not a big deal though, but:

we can't call malloc until the module is instantiated...

IIUC, the linking happens at instantiation time, or at least assigning a region and setting the imported globals happen at instantiation time, so we can call the malloc function then. We can specify that, for the core module dynamic linking feature, the developer must export the malloc/free functions when compiling the wasm module.

Alternatively, could we perhaps rely on the __heap_base / __stack_high (or __data_end) global and update it when the modules are being loaded?

As mentioned above, it may be difficult to update them. It is also not flexible: how do we manage the assigned regions, or should there be a region manager? Suppose region1 and region2 are assigned; how do we handle region1 being freed before region2?

Also, I don't think there's a per-module aux stack; IIUC the aux stack should be shared among all the modules, and the __stack_pointer global should be imported from the main module and used by all the modules. That poses a small challenge because __stack_pointer might not be available in the module by name; we could use a heuristic that WAMR already implements for checking aux_stack boundaries (assume it's the first non-imported global), but I don't know how reliable it is. I think it's reasonable to require the global to be exported from the main module (although the doc https://github.com/WebAssembly/tool-conventions/blob/main/DynamicLinking.md doesn't say anything about it).

Thanks, I hadn't thought it through carefully; maybe you are right. There is also a more complex situation where both multi-threading (or lib-pthread) and core module dynamic linking are enabled.

lum1n0us commented 1 month ago

It seems feasible to manage the entire linking process through multiple layers.

The foundational layer would be responsible for implementing the import and export specifications. This includes the fundamental mechanisms for importing and exporting functions, globals, memories, and tables. It also encompasses the linking that occurs during the instantiation phase. Each instance should have a distinct set of imports, which could be sourced from other instances, the host environment, or runtime libraries in core/iwasm/libraries.
The upper layer would build upon the APIs provided by the lower layer, along with additional runtime APIs found in wasm_export.h. This layer would facilitate features like multi-module support and DynamicLinking. It would also be able to access the contents of the dylink.0 section.
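For the upper layer, reading the dylink.0 custom section is mostly LEB128 decoding. The sketch below assumes the subsection layout from tool-conventions DynamicLinking.md (WASM_DYLINK_MEM_INFO = 1 carrying memorysize, memoryalignment, tablesize, tablealignment; WASM_DYLINK_NEEDED = 2 carrying a list of library names) and takes the section payload that follows the section name. Other subsections are skipped, and the dylink_info struct and MAX_NEEDED limit are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Subsection ids from tool-conventions DynamicLinking.md. */
enum { WASM_DYLINK_MEM_INFO = 1, WASM_DYLINK_NEEDED = 2 };

#define MAX_NEEDED 8 /* hypothetical limit for this sketch */

typedef struct {
    const uint8_t *ptr; /* not NUL-terminated */
    uint32_t len;
} name_ref;

typedef struct {
    uint32_t memory_size;  /* bytes of static data the module needs */
    uint32_t memory_align; /* log2 of the required alignment */
    uint32_t table_size;   /* table slots the module needs */
    uint32_t table_align;  /* log2 of the required alignment */
    uint32_t needed_count;
    name_ref needed[MAX_NEEDED]; /* libraries this module depends on */
} dylink_info;

/* Decode an unsigned LEB128 value of at most 5 bytes. */
static int read_uleb32(const uint8_t **p, const uint8_t *end, uint32_t *out)
{
    uint32_t result = 0;
    for (unsigned shift = 0; shift < 35 && *p < end; shift += 7) {
        uint8_t byte = *(*p)++;
        result |= (uint32_t)(byte & 0x7f) << shift;
        if (!(byte & 0x80)) {
            *out = result;
            return 0;
        }
    }
    return -1;
}

/* Parse the payload of a "dylink.0" custom section (after its name). */
static int parse_dylink0(const uint8_t *p, const uint8_t *end, dylink_info *info)
{
    memset(info, 0, sizeof(*info));
    while (p < end) {
        uint8_t type = *p++;
        uint32_t size;
        if (read_uleb32(&p, end, &size) != 0 || size > (size_t)(end - p))
            return -1;
        const uint8_t *sub = p, *sub_end = p + size;
        if (type == WASM_DYLINK_MEM_INFO) {
            if (read_uleb32(&sub, sub_end, &info->memory_size)
                || read_uleb32(&sub, sub_end, &info->memory_align)
                || read_uleb32(&sub, sub_end, &info->table_size)
                || read_uleb32(&sub, sub_end, &info->table_align))
                return -1;
        }
        else if (type == WASM_DYLINK_NEEDED) {
            if (read_uleb32(&sub, sub_end, &info->needed_count)
                || info->needed_count > MAX_NEEDED)
                return -1;
            for (uint32_t i = 0; i < info->needed_count; i++) {
                uint32_t len;
                if (read_uleb32(&sub, sub_end, &len) != 0
                    || len > (size_t)(sub_end - sub))
                    return -1;
                info->needed[i].ptr = sub;
                info->needed[i].len = len;
                sub += len;
            }
        }
        /* Other subsections (e.g. export/import info) are skipped. */
        p = sub_end;
    }
    return 0;
}
```

The mem info tells the linker how much memory and how many table slots to reserve, and the needed list drives which modules to load next.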

Given that dynamic linking requires recompilation with specific flags (such as -fPIC --shared), we need to maintain a method for linking standard WebAssembly core modules that do not have the dylink.0 section. Users of the WAMR libraries might choose between APIs for instantiation-phase linking and loading-phase linking (the multi-module feature). If all WebAssembly modules are guaranteed to be recompiled, users can opt for dynamic linking.

lum1n0us commented 1 month ago

The term "fully-runtime-dynamic-linking" likely originates from the tool wasm-component-ld. Although this tool outputs a component module, it addresses a significant portion of the linking requirements. It processes multiple core modules that contain the dylink.0 section, determines the correct order for instantiation, and connects one instance's imports to another instance's exports. It can even instantiate an instance without a related module to ensure all dependencies are met.
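Determining the instantiation order is essentially a topological sort over the "needed" relation. A minimal DFS-based sketch, assuming a hypothetical adjacency-matrix dep_graph; note that it simply rejects cycles, whereas a real linker may need to handle mutually dependent libraries.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_MODULES 16

/* deps[i][j] != 0 means module i needs module j (e.g. from dylink.0). */
typedef struct {
    size_t count;
    unsigned char deps[MAX_MODULES][MAX_MODULES];
} dep_graph;

enum { UNVISITED, IN_PROGRESS, DONE };

static int visit(const dep_graph *g, size_t m, unsigned char *state,
                 size_t *order, size_t *n_order)
{
    if (state[m] == DONE)
        return 0;
    if (state[m] == IN_PROGRESS)
        return -1; /* dependency cycle */
    state[m] = IN_PROGRESS;
    for (size_t d = 0; d < g->count; d++)
        if (g->deps[m][d] && visit(g, d, state, order, n_order) != 0)
            return -1;
    state[m] = DONE;
    order[(*n_order)++] = m; /* dependencies are emitted before dependents */
    return 0;
}

/* Fills `order` with module indices in a valid instantiation order. */
static int instantiation_order(const dep_graph *g, size_t *order)
{
    unsigned char state[MAX_MODULES] = { 0 };
    size_t n = 0;
    assert(g->count <= MAX_MODULES);
    for (size_t m = 0; m < g->count; m++)
        if (visit(g, m, state, order, &n) != 0)
            return -1;
    return 0;
}
```

For a main module that needs libfoo and libc, where libfoo also needs libc, this yields libc, libfoo, main.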

I strongly suggest looking into the approach that wasm-component-ld uses to achieve what is known as "shared-everything-linking." By following this method, we can adhere to a "standard" process that is recognized and accepted by both the WebAssembly community and the official specification. This approach should prevent any unexpected behavior.

loganek commented 1 month ago

The host managed heap means the heap inserted by runtime, when the wasm module is compiled with -nostdlib and libc isn't linked

Ah ok, yes I'm familiar with the app heap concept, just didn't know that you're talking about the same thing - thanks for clarifying. However, I think using app heap is not a viable approach for various use cases (especially, it's not for my team).

IIUC, the linking happens in instantiation time, or at least assigning a region and setting the import globals happen in the instantiation time, so we can call malloc function then.

My concern is that we'll call a function (malloc) from a module that's not fully instantiated at that stage (because the data is not loaded yet), which might lead to unexpected behavior. While this might work with the malloc implemented in wasi-libc (although I didn't check), I can think of examples where it will fail.

And if to increase global __heap_base, per my understanding, after libc heap is initialized, it should not be changed again. And it even shouldn't be changed at the beginning since in our investigation, its const value is hardcoded in wasm bytecode: normally we only changes it when libc heap isn't enabled.

Yes, I totally agree that __heap_base shouldn't be modified after libc heap is initialized. And you're right that the __heap_base is a constant value hardcoded in the WASM file itself, but I was thinking that perhaps WAMR could modify its value before instantiation as soon as it loads all the modules and knows the size of the data for each module. What do you think?

wenyongh commented 1 month ago

Ah ok, yes I'm familiar with the app heap concept, just didn't know that you're talking about the same thing - thanks for clarifying. However, I think using app heap is not a viable approach for various use cases (especially, it's not for my team).

App heap is only enabled when the libc heap's malloc/free functions are not exported and the developer passes a heap_size larger than 0 to wasm_runtime_instantiate; it is an alternative to the libc heap. I think if your cases don't need it, you can just disable it by exporting the malloc/free functions of the wasm module.

My concern is that we'll call a function from a module (malloc) that's not fully instantiated at that stage (because the data is not loaded), which might lead to unexpected behavior. Whereas this might work with malloc implemented in wasi libc (although I didn't check), I can think of examples where this will fail.

Do you mean that at this time the side module to be loaded or linked hasn't been instantiated yet? But here the memory is allocated from the main module; maybe we can instantiate the main module first to ensure that we can call its malloc function, and lazily link its sub modules, e.g., only link a module when the runtime calls an unlinked function?
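A minimal sketch of the lazy-linking idea, where each imported function slot starts unresolved and is resolved on its first call. lazy_slot, call_lazy() and the resolver callback are hypothetical; a real runtime would raise a trap instead of setting an ok flag, and the resolver is where the side module would actually be loaded and linked.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*fn_t)(int);

/* Hypothetical import slot: target stays NULL until the first call. */
typedef struct {
    const char *name;
    fn_t target;
    fn_t (*resolver)(const char *name); /* would load + link the side module */
} lazy_slot;

/* Call through a slot, resolving it on first use. *ok is 0 if the symbol
 * cannot be resolved (a real runtime would trap instead). */
static int call_lazy(lazy_slot *slot, int arg, int *ok)
{
    if (slot->target == NULL) {
        slot->target = slot->resolver(slot->name);
        if (slot->target == NULL) {
            *ok = 0;
            return 0;
        }
    }
    *ok = 1;
    return slot->target(arg);
}

/* Demo resolver that counts how often linking actually happens. */
static int resolve_count = 0;
static int square(int x) { return x * x; }
static fn_t demo_resolver(const char *name)
{
    resolve_count++;
    return strcmp(name, "square") == 0 ? square : NULL;
}
```

The resolver runs only once per slot, so the linking cost is paid on the first call and later calls go straight to the target.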

Yes, I totally agree that __heap_base shouldn't be modified after libc heap is initialized. And you're right that the __heap_base is a constant value hardcoded in the WASM file itself, but I was thinking that perhaps WAMR could modify its value before instantiation as soon as it loads all the modules and knows the size of the data for each module. What do you think?

As far as I remember, modifying the __heap_base global will make the libc heap's malloc throw an exception, so we had better not do that. Another approach I can think of is to assign regions from the aux stack, but this had better be eager linking: while instantiating the main module, let the runtime recursively retrieve all the side modules it (and its child modules) depends on, allocate regions one by one for these side modules from the aux stack, and then decrease the __stack_pointer global only once. The developer can set a larger aux stack by using -z stack-size=n with wasi-sdk when compiling the wasm module. Do you think this is a better way than calling the malloc function and modifying __heap_base?
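The eager variant could be sketched as: lay out all side-module regions first, then lower __stack_pointer once. This assumes a downward-growing aux stack, a known lowest valid stack address (stack_bottom) and 16-byte stack alignment; region_req and reserve_from_aux_stack() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-side-module request; align must be a power of two. */
typedef struct {
    uint32_t size;
    uint32_t align;
} region_req;

static uint32_t align_up(uint32_t v, uint32_t a)
{
    return (v + a - 1) & ~(a - 1);
}

/* Carve all regions out of the top of the aux stack at once. On success,
 * bases[i] receives each region's __memory_base and *stack_pointer is
 * lowered exactly once; on failure *stack_pointer is left unchanged. */
static int reserve_from_aux_stack(uint32_t stack_bottom, uint32_t *stack_pointer,
                                  const region_req *reqs, size_t n,
                                  uint32_t *bases)
{
    uint32_t offset = 0, max_align = 16; /* keep the stack 16-byte aligned */
    assert(*stack_pointer >= stack_bottom);

    /* First pass: lay the regions out relative to offset 0. */
    for (size_t i = 0; i < n; i++) {
        assert(reqs[i].align != 0 && (reqs[i].align & (reqs[i].align - 1)) == 0);
        offset = align_up(offset, reqs[i].align);
        bases[i] = offset;
        offset += reqs[i].size;
        if (reqs[i].align > max_align)
            max_align = reqs[i].align;
    }

    /* Place the whole block just below the current stack pointer. */
    if (offset > *stack_pointer - stack_bottom)
        return -1;
    uint32_t start = (*stack_pointer - offset) & ~(max_align - 1);
    if (start < stack_bottom)
        return -1;

    for (size_t i = 0; i < n; i++)
        bases[i] += start;
    *stack_pointer = start; /* the single adjustment */
    return 0;
}
```

The failure path is exactly the concern raised later in the thread: if the side modules' data outgrows the stack size the main module was built with, the main module has to be rebuilt.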

loganek commented 1 month ago

Ah ok, yes I'm familiar with the app heap concept, just didn't know that you're talking about the same thing - thanks for clarifying. However, I think using app heap is not a viable approach for various use cases (especially, it's not for my team).

App heap is only enabled when the libc heap's malloc/free functions are not exported and the developer passes a heap_size larger than 0 to wasm_runtime_instantiate; it is an alternative to the libc heap. I think if your cases don't need it, you can just disable it by exporting the malloc/free functions of the wasm module.

Yes, my point was that relying on the app heap as a solution might not work for some of the usecases, so we probably should find another solution.

My concern is that we'll call a function from a module (malloc) that's not fully instantiated at that stage (because the data is not loaded), which might lead to unexpected behavior. Whereas this might work with malloc implemented in wasi libc (although I didn't check), I can think of examples where this will fail.

Do you mean that at this time the side module to be loaded or linked hasn't been instantiated yet? But here the memory is allocated from the main module; maybe we can instantiate the main module first to ensure that we can call its malloc function, and lazily link its sub modules, e.g., only link a module when the runtime calls an unlinked function?

So first of all, I don't think that malloc will always be placed in the main module. For example, if users compile with wasi-libc, malloc will likely be in libc.so, not in their main module. However, I'm not worried about that - we could recursively load all the dependencies so that the module with malloc() is also loaded at that time. My main concern is that some malloc() implementations might rely on the data already being available (for whatever reason), so the execution will crash. Here's an extreme version of a malloc implementation that would fail:

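#include <stddef.h>
#include <stdio.h>

void *malloc(size_t size)
{
  (void)size;
  /* the string literal lives in the module's data segment, which is not
     initialized yet if malloc runs before the data is loaded */
  puts("allocating...");

  return NULL; /* the return value doesn't matter for this illustration */
}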

I don't expect any malloc implementation to do that, but this is just to illustrate the risk. We could require that the malloc function must not access rodata, but that's probably too restrictive.

Yes, I totally agree that __heap_base shouldn't be modified after libc heap is initialized. And you're right that the __heap_base is a constant value hardcoded in the WASM file itself, but I was thinking that perhaps WAMR could modify its value before instantiation as soon as it loads all the modules and knows the size of the data for each module. What do you think?

As far as I remember, modifying the __heap_base global will make the libc heap's malloc throw an exception, so we had better not do that. Another approach I can think of is to assign regions from the aux stack, but this had better be eager linking: while instantiating the main module, let the runtime recursively retrieve all the side modules it (and its child modules) depends on, allocate regions one by one for these side modules from the aux stack, and then decrease the __stack_pointer global only once. The developer can set a larger aux stack by using -z stack-size=n with wasi-sdk when compiling the wasm module. Do you think this is a better way than calling the malloc function and modifying __heap_base?

Ok, I did a few experiments and realized that the compiler never actually generates instructions that read the __heap_base global; instead, it puts constant values in the code. For example, this:

#include <stdio.h>
extern unsigned char __heap_base;
int main() {
  printf("Hello %u\n", (unsigned)&__heap_base);
  return 0;
}

would not result in global.get __heap_base; instead, it generates an i32.const XXX instruction (where XXX is the heap base value). So the idea of updating the global value won't work if malloc is implemented in the main module (as it will likely use the __heap_base value).

Regarding using the stack size - I think that could work, but one challenge I see for users is figuring out the right stack-size value. I think it's easy to estimate when we compile all the modules at once; however, we have a use case where we want to dynamically update some of the modules without updating the main module. So if the data size of a submodule changes significantly, it might no longer fit into the stack, which means the main module would have to be updated too. Having said that, I think it's more likely that the stack requirements of the submodules will change more often than their data requirements, which means the main module will have to be updated anyway. So that might not be such a big deal.

I think both malloc and using the stack are workable ideas, but they both have their limitations (for the stack size, it's the need to update the main module when dependent modules change, and for malloc, it's the possibility of malloc accessing rodata). I know that the stack size limitation is a bit of a blocker for my team; I'm not sure how valid my concern about malloc is - if I'm overly worried about this, please let me know, but I have a feeling that it might hit us at some point.

I'm going to think a bit more about this problem and share an update tomorrow, but feel free to share your thoughts too.

loganek commented 1 month ago

I've also started a branch with some initial experiments here, so feel free to subscribe for updates: https://github.com/bytecodealliance/wasm-micro-runtime/compare/main...loganek:wasm-micro-runtime:loganek/dynamic-linking?expand=1 This also includes a (for now, living) document where I keep track of the discussions and some of my thoughts; I hope to get it reviewed by the community once the most important details are fleshed out: https://github.com/loganek/wasm-micro-runtime/blob/loganek/dynamic-linking/doc/dynamic_linking_design.md

wenyongh commented 1 month ago

@loganek Thanks for the explanation and the experiment. I agree we should investigate more and make a good decision. For the malloc function, if it isn't exported by the wasm module and the libc heap isn't linked, the host-managed heap can be inserted into the linear memory, and the host native code can also call wasm_runtime_module_malloc to allocate memory from it. But it really increases the risks: (1) the main module must have been instantiated before the runtime calls module_malloc, and if the main module relies on some other modules, the runtime may need to lazily load these modules; (2) calling malloc of the libc heap may trigger the memory.grow opcode to enlarge the memory, which may require updating other modules' imported memory info.

Another idea I have is to decrease the __stack_pointer global to reserve a space from the aux stack, or increase the __heap_base global to reserve a space from the libc heap, and then initialize this space as another host-managed heap, so the runtime can allocate a region from it for each side module if needed: (1) the allocation just calls the APIs of the runtime's memory allocator and doesn't call into wasm bytecode, (2) it is more flexible than allocating regions one by one by changing __stack_pointer/__heap_base many times, (3) maybe we can add an option for the developer to specify the size of this space up front.
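For the region-manager question raised earlier (region1 freed before region2), a tiny first-fit allocator over the reserved space is enough to illustrate it. This is only a sketch with hypothetical names (region_pool, pool_alloc(), pool_free()); it assumes the pool base is suitably aligned and uses 0 as the failure value, and in practice the runtime's own memory allocator would be used instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_REGIONS 32

typedef struct {
    uint32_t offset, size;
} region;

typedef struct {
    uint32_t base, capacity;  /* the reserved space (base assumed aligned) */
    region used[MAX_REGIONS]; /* live regions, kept sorted by offset */
    size_t count;
} region_pool;

static void pool_init(region_pool *p, uint32_t base, uint32_t capacity)
{
    p->base = base;
    p->capacity = capacity;
    p->count = 0;
}

/* First fit: returns an absolute address, or 0 if no gap is large enough. */
static uint32_t pool_alloc(region_pool *p, uint32_t size, uint32_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    if (p->count == MAX_REGIONS)
        return 0;
    uint32_t cursor = 0; /* end of the previous live region */
    for (size_t i = 0; i <= p->count; i++) {
        uint32_t gap_end = (i < p->count) ? p->used[i].offset : p->capacity;
        uint32_t start = (cursor + align - 1) & ~(align - 1);
        if (start <= gap_end && size <= gap_end - start) {
            for (size_t j = p->count; j > i; j--)
                p->used[j] = p->used[j - 1];
            p->used[i].offset = start;
            p->used[i].size = size;
            p->count++;
            return p->base + start;
        }
        if (i < p->count)
            cursor = p->used[i].offset + p->used[i].size;
    }
    return 0;
}

/* Frees a region in any order; the hole becomes reusable immediately. */
static int pool_free(region_pool *p, uint32_t addr)
{
    for (size_t i = 0; i < p->count; i++)
        if (p->base + p->used[i].offset == addr) {
            for (size_t j = i; j + 1 < p->count; j++)
                p->used[j] = p->used[j + 1];
            p->count--;
            return 0;
        }
    return -1;
}
```

Unlike repeatedly moving __stack_pointer or __heap_base, freeing region1 before region2 simply leaves a hole that a later side module can reuse.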

If using the space of aux stack is an issue, then we had better try to resolve the hardcoded const issue in wasi-sdk, and be able to update __heap_base.

lum1n0us commented 1 month ago

However, I think using the app heap is not a viable approach for various use cases (in particular, it's not viable for my team).

I'm concerned that we may have delved too deeply into specifics without first clarifying our requirements and understanding the broader context. 'Dynamic Linking' is merely one of many technologies that can be used to achieve our goal of core module linking. I'm not sure why we've become fixated on this particular method or why we believe it to be the ultimate solution for our needs. (Although I agree this is the best solution we have so far)

Upon review, I believe there are several preliminary questions that need to be addressed, and I suspect there may be more:

Regarding the toolchain: Which guest language toolchains should we take into account? This is important because we need to ensure that all potential guest languages can support specific compilation options, such as --shared, which is a Clang option. Toolchains for C-like languages, including C, C++, and even Rust, can handle this. But what about Go/TinyGo and TypeScript? These languages are used by our customers who have expressed the need for linking this time.
Concerning the target: Should our focus be solely on wasm32-wasi modules, or do we need to consider 'wasm32-unknown' (from -nostdlib) modules as well? These represent two distinctly different challenges. If we concentrate on wasm32-wasi, wasi-libc and wasm-ld could offer additional support, and we might be able to simplify the memory allocation mechanism.
If a linking solution, such as 'Dynamic Linking', is limited to recompiled wasm modules, do we continue to support linking multiple standard wasm modules? The answer is probably yes, given the existing multi-module feature. And there are potentially more issues to consider...

lum1n0us commented 1 month ago

Please allow me to contribute to the detailed discussion.

Ok, I was doing some experiments and realized that the compiler actually never generates instructions for reading the __heap_base global; instead, it puts constant values in the code; for example this:

For a wasm32-wasi module, __heap_base and __heap_end are symbols provided by wasm-ld and used by dlmalloc() to define the boundaries of the heap. If I understand correctly, it's best not to interfere with these symbols and to let wasm-ld and wasi-libc handle their responsibilities.

Looking at the contents of the dylink.0 and import sections, a module compiled with -shared typically depends on libc.so and requires malloc() and realloc() functions. I'm not certain if these can be omitted when unused. It seems that the original concept for memory management in wasi-libc was to link dlmalloc() within /opt/wasi-sdk-22.0/share/wasi-sysroot/lib/wasm32-wasi/libc.so.

yamt commented 1 month ago

The term "fully-runtime-dynamic-linking" likely originates from the tool wasm-component-ld. Although this tool outputs a component module, it addresses a significant portion of the linking requirements. It processes multiple core modules that contain the dylink.0 section, determines the correct order for instantiation, and connects one instance's imports to another instance's exports. It can even instantiate an instance without a related module to ensure all dependencies are met.

I strongly suggest looking into the approach that wasm-component-ld uses to achieve what is known as "shared-everything-linking." By following this method, we can adhere to a "standard" process that is recognized and accepted by both the WebAssembly community and the official specification. This approach should prevent any unexpected behavior.

are you sure wasm-component-ld processes dylink.0 section? don't you mean wasm-tools component link?

yamt commented 1 month ago

Please allow me to contribute to the detailed discussion.

Ok, I was doing some experiments and realized that the compiler actually never generates instructions for reading the __heap_base global; instead, it puts constant values in the code; for example this:

For a wasm32-wasi module, __heap_base and __heap_end are symbols provided by wasm-ld and used by dlmalloc() to define the boundaries of the heap. If I understand correctly, it's best not to interfere with these symbols and to let wasm-ld and wasi-libc handle their responsibilities.

Looking at the contents of the dylink.0 and import sections, a module compiled with -shared typically depends on libc.so and requires malloc() and realloc() functions. I'm not certain if these can be omitted when unused. It seems that the original concept for memory management in wasi-libc was to link dlmalloc() within /opt/wasi-sdk-22.0/share/wasi-sysroot/lib/wasm32-wasi/libc.so.

dynamic-linking itself doesn't require wasi or malloc.

yamt commented 1 month ago

And there is a more complex situation: both multi-thread (or lib-pthread) and core module dynamic linking are enabled.

basically dynamic-linking is incompatible with threads. i guess it's worth researching what emscripten does in that regard.

yamt commented 1 month ago

Maybe we can allocate a buffer in the libc heap or the host-managed heap for the side module.

what allocation are you talking about? in dynamic-linking, the runtime linker allocates memory regions for shared libraries by growing the linear memory.
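A minimal sketch of that allocation policy, assuming the runtime keeps one linear memory shared by all modules and simply appends each library's region at the current end of memory, sized by mem_size and aligned by mem_p2align from the library's dylink.0 MEM_INFO subsection. LinearMemory, memory_grow and alloc_library_region are hypothetical names; memory_grow only mimics the semantics of the memory.grow instruction (old page count on success, -1 on failure).

```c
#include <assert.h>
#include <stdint.h>

#define WASM_PAGE_SIZE 65536u

/* Hypothetical model of the one linear memory shared by all modules. */
typedef struct {
    uint32_t cur_pages;
    uint32_t max_pages;
} LinearMemory;

/* Mimics memory.grow: returns the old page count, or -1 on failure. */
static int32_t memory_grow(LinearMemory *mem, uint32_t delta)
{
    if (delta > mem->max_pages - mem->cur_pages)
        return -1;
    uint32_t old = mem->cur_pages;
    mem->cur_pages += delta;
    return (int32_t)old;
}

/* Append a region for one shared library at the current end of memory.
 * Returns the region's base (its env.__memory_base), or -1 on failure. */
static int64_t alloc_library_region(LinearMemory *mem, uint32_t mem_size,
                                    uint32_t mem_p2align)
{
    assert(mem_p2align < 32 && mem->cur_pages <= mem->max_pages);
    uint64_t align = (uint64_t)1 << mem_p2align;
    uint64_t old_end = (uint64_t)mem->cur_pages * WASM_PAGE_SIZE;
    uint64_t base = (old_end + align - 1) & ~(align - 1);
    uint64_t pages_needed =
        (base + mem_size + WASM_PAGE_SIZE - 1) / WASM_PAGE_SIZE;
    if (pages_needed > mem->cur_pages &&
        memory_grow(mem, (uint32_t)(pages_needed - mem->cur_pages)) < 0)
        return -1;
    return (int64_t)base;
}
```

Giving every library fresh pages wastes the tail of each last page; that trade-off keeps the sketch simple and is not meant to reflect how any particular runtime packs regions.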

yamt commented 1 month ago

* Concerning the target: Should our focus be solely on wasm32-wasi modules, or do we need to consider 'wasm32-unknown' (from -nostdlib) modules as well? These represent two distinctly different challenges. If we concentrate on wasm32-wasi, wasi-libc and wasm-ld could offer additional support, and we might be able to simplify the memory allocation mechanism.

dynamic-linking itself doesn't require wasi as far as i know.

lum1n0us commented 1 month ago

The term "fully-runtime-dynamic-linking" likely originates from the tool wasm-component-ld. Although this tool outputs a component module, it addresses a significant portion of the linking requirements. It processes multiple core modules that contain the dylink.0 section, determines the correct order for instantiation, and connects one instance's imports to another instance's exports. It can even instantiate an instance without a related module to ensure all dependencies are met. I strongly suggest looking into the approach that wasm-component-ld uses to achieve what is known as "shared-everything-linking." By following this method, we can adhere to a "standard" process that is recognized and accepted by both the WebAssembly community and the official specification. This approach should prevent any unexpected behavior.

are you sure wasm-component-ld processes dylink.0 section? don't you mean wasm-tools component link?

Given that the --target=wasm32-wasip2 option merges multiple .a files to create a component, it's possible that wasm-component-ld shares similar capabilities with wasm-tools component link. I should use both as examples.

lum1n0us commented 1 month ago

Please allow me to contribute to the detailed discussion.

Ok, I was doing some experiments and realized that the compiler actually never generates instructions for reading the __heap_base global; instead, it puts constant values in the code; for example this:

For a wasm32-wasi module, __heap_base and __heap_end are symbols provided by wasm-ld and used by dlmalloc() to define the boundaries of the heap. If I understand correctly, it's best not to interfere with these symbols and to let wasm-ld and wasi-libc handle their responsibilities. Looking at the contents of the dylink.0 and import sections, a module compiled with -shared typically depends on libc.so and requires malloc() and realloc() functions. I'm not certain if these can be omitted when unused. It seems that the original concept for memory management in wasi-libc was to link dlmalloc() within /opt/wasi-sdk-22.0/share/wasi-sysroot/lib/wasm32-wasi/libc.so.

dynamic-linking itself doesn't require wasi or malloc.

At first glance, no. However, it might be worth considering when deciding how to manage the heap area.

lum1n0us commented 1 month ago

* Concerning the target: Should our focus be solely on wasm32-wasi modules, or do we need to consider 'wasm32-unknown' (from -nostdlib) modules as well? These represent two distinctly different challenges. If we concentrate on wasm32-wasi, wasi-libc and wasm-ld could offer additional support, and we might be able to simplify the memory allocation mechanism.

dynamic-linking itself doesn't require wasi as far as i know.

If we use wasm-component-ld and wasm-tools component link as points of reference, the code from wasi-libc becomes relevant. Additionally, with the assistance of wasi-libc code, the linking requirements for wasm32-wasi modules and wasm32-unknown modules emerge as two distinct challenges. Furthermore, when using wasi-sdk toolchains, there will always be a libc.so present in the needed subsection of the dylink.0 section. We must either align it with the libc.so in the wasi-sdk or create a separate version while still maintaining the potential compatibility requirements.

yamt commented 1 month ago

* Concerning the target: Should our focus be solely on wasm32-wasi modules, or do we need to consider 'wasm32-unknown' (from -nostdlib) modules as well? These represent two distinctly different challenges. If we concentrate on wasm32-wasi, wasi-libc and wasm-ld could offer additional support, and we might be able to simplify the memory allocation mechanism.
dynamic-linking itself doesn't require wasi as far as i know.
If we use wasm-component-ld and wasm-tools component link as points of reference, the code from wasi-libc becomes relevant. Additionally, with the assistance of wasi-libc code, the linking requirements for wasm32-wasi modules and wasm32-unknown modules emerge as two distinct challenges. Furthermore, when using wasi-sdk toolchains, there will always be a libc.so present in the needed subsection of the dylink.0 section. We must either align it with the libc.so in the wasi-sdk or create a separate version while still maintaining the potential compatibility requirements.

i'm not sure what your point is. if you use wasi-sdk, it requires wasi of course.

loganek commented 1 month ago

Thanks a lot for the discussion so far; let me answer some of the questions and concerns here.

If using the space of aux stack is an issue, then we had better try to resolve the hardcoded const issue in wasi-sdk, and be able to update __heap_base.

I think "fixing" __heap_base might not be possible and can lead to various issues. __heap_base is an immutable global, which means that even though the linker produces the correct code, various optimizers (e.g. wasm-opt or others) might choose to optimize the global.get calls and replace them with constants. Probably the way to avoid this would be to make __heap_base a mutable global, but that doesn't seem right. I think there are two solutions that seem viable to me right now:

1. Placing the data at the beginning of the stack:
   - Read the __stack_pointer value into a runtime variable (let's call it mb).
   - Scan all the modules and, on instantiation, set each module's env.__memory_base to mb, then advance mb by the size of that module's data.
   - Update the __stack_pointer global to the final mb value.
2. Placing the data at the end of the stack. If we do that, there's no need to change the stack pointer; we might need to update some variables internally in WAMR when stack overflow detection is enabled, but that shouldn't be a problem.

I'd really like to just update __heap_base, but as mentioned above, that might not be possible at all without making it mutable. So we can probably stick to 1 or 2 (my preferred one is 2 as we don't need to manipulate the global, but happy to discuss this further).
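Option 2 could look roughly like the sketch below. It assumes the usual downward-growing aux stack, so the "end" of the stack is its lowest address: the module data is packed there, __stack_pointer stays untouched, and only the boundary used for overflow detection moves up. AuxStack, place_data_at_stack_end and the min_stack guard are hypothetical and do not correspond to existing WAMR internals.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical view of the main module's aux stack, which grows down from
 * `top` (the initial __stack_pointer) towards `bottom`. */
typedef struct {
    uint32_t bottom; /* lowest usable address, i.e. the overflow boundary */
    uint32_t top;    /* initial __stack_pointer, left untouched */
} AuxStack;

/* Pack `count` side modules' data at the low end of the stack and write each
 * module's env.__memory_base into bases[]. Only `bottom` moves; at least
 * `min_stack` bytes must remain for the stack itself. On failure the stack
 * is unchanged and bases[] must be ignored. */
static bool place_data_at_stack_end(AuxStack *stack, const uint32_t *sizes,
                                    const uint32_t *p2aligns, uint32_t count,
                                    uint32_t min_stack, uint32_t *bases)
{
    uint64_t cursor = stack->bottom;
    for (uint32_t i = 0; i < count; i++) {
        assert(p2aligns[i] < 32);
        uint64_t align = (uint64_t)1 << p2aligns[i];
        cursor = (cursor + align - 1) & ~(align - 1);
        bases[i] = (uint32_t)cursor;
        cursor += sizes[i];
    }
    cursor = (cursor + 15) & ~(uint64_t)15; /* keep the new boundary aligned */
    if (cursor + min_stack > stack->top)
        return false;
    stack->bottom = (uint32_t)cursor;
    return true;
}
```

The appeal of this variant, as the comment notes, is that no global is rewritten; the cost is that the usable stack shrinks by the total size of the side modules' data.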

I'm concerned that we may have delved too deeply into specifics without first clarifying our requirements and understanding the broader context. 'Dynamic Linking' is merely one of many technologies that can be used to achieve our goal of core module linking. I'm not sure why we've become fixated on this particular method or why we believe it to be the ultimate solution for our needs. (Although I agree this is the best solution we have so far)

I totally understand the concern. I think the selling points for this one are:

Dynamic Linking is already defined and documentation of this is public / open for comments, and it's placed in an easily-discoverable place (i.e. WebAssembly github org)
Dynamic Linking is already implemented by at least one toolchain

Also, the RFC itself is about implementing the Dynamic Linking spec and, by doing that, solving all the problems that this spec solves. I do understand there might be other problems that Dynamic Linking doesn't solve, but those would quite likely require toolchain support, so that discussion should likely happen with the broader community, not just among WAMR users/developers. The problem my team has is the inability to split modules into smaller chunks, cache some of them, and re-use some of the exports across different modules, and the Dynamic Linking spec solves those problems for us. If there are more requirements, I'd be happy to discuss them too and see whether they should be included in the Dynamic Linking spec or can be handled internally by WAMR.

Regarding the toolchain: Which guest language toolchains should we take into account? This is important because we need to ensure that all potential guest languages can support specific compilation options, such as --shared, which is a Clang option. Toolchains for C-like languages, including C, C++, and even Rust, can handle this. But what about Go/TinyGo and TypeScript? These languages are used by our customers who have expressed the need for linking this time.

From my perspective, wasi-sdk-based toolchains (i.e. C/C++/Rust) are the top priority, although I understand other teams might have different requirements, so I'd like to learn about those too. Overall, I'd love to make the feature toolchain-agnostic, but I think it'd be difficult to support all possible toolchains without making some assumptions or putting some requirements in place. For example, for memory, we could assume that the main module exports at least two globals:

data_region_start, a global defining where the runtime is expected to place the data from submodules
data_region_end, a mutable global that the runtime should update to point to the end of the region (that might not be needed if we put the data at the top of the stack)
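Under that convention, the runtime side might look like the following sketch. The two global names come from the proposal above; the ExportedGlobal array is a hypothetical stand-in for however WAMR would expose a module instance's exported globals, and the 8-byte alignment of each region is an arbitrary choice for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a main module instance's exported i32 globals. */
typedef struct {
    const char *name;
    uint32_t value;
    bool is_mutable;
} ExportedGlobal;

static ExportedGlobal *find_global(ExportedGlobal *globals, size_t n,
                                   const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(globals[i].name, name) == 0)
            return &globals[i];
    return NULL;
}

/* Lay out submodule data from data_region_start upwards (8-byte aligned)
 * and publish the end of the used region through data_region_end. */
static bool place_submodule_data(ExportedGlobal *globals, size_t n,
                                 const uint32_t *sizes, size_t count,
                                 uint32_t *bases)
{
    ExportedGlobal *start = find_global(globals, n, "data_region_start");
    ExportedGlobal *end = find_global(globals, n, "data_region_end");
    if (start == NULL || end == NULL || !end->is_mutable)
        return false; /* main module does not follow the convention */
    assert(bases != NULL || count == 0);
    uint64_t cursor = start->value;
    for (size_t i = 0; i < count; i++) {
        cursor = (cursor + 7) & ~(uint64_t)7;
        if (cursor + sizes[i] > UINT32_MAX)
            return false; /* would not fit in a 32-bit address space */
        bases[i] = (uint32_t)cursor;
        cursor += sizes[i];
    }
    end->value = (uint32_t)cursor;
    return true;
}
```

Checking that data_region_end is mutable up front lets the runtime reject a main module that doesn't follow the convention before it writes anything.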

Concerning the target: Should our focus be solely on wasm32-wasi modules, or do we need to consider 'wasm32-unknown' (from -nostdlib) modules as well? These represent two distinctly different challenges. If we concentrate on wasm32-wasi, wasi-libc and wasm-ld could offer additional support, and we might be able to simplify the memory allocation mechanism.

I'm not sure that matters a lot in this case, but even if it did, we should cover as many targets as possible (in this case, though, I think the solutions discussed in this thread will work for both wasm32-wasi and wasm32-unknown).

If a linking solution, such as 'Dynamic Linking', is limited to recompiling wasm modules, do we continue to support the linking of multiple standard wasm modules? This might be affirmative due to the multi-module feature.

According to the PR links posted by @wenyongh, it looks like there are customers using it, so we can't simply drop it. However, my proposal is to "keep it but not touch it" and eventually deprecate / remove it (we'd need to discuss the timeline), but I'd like to hear from the existing users. I think making dynamic linking compatible with the existing multi-module feature might be quite a bit of effort, so I suggest we build dynamic linking as a separate library and only re-use some code where it's straightforward. Multi-module is a non-standard extension, whereas https://github.com/WebAssembly/tool-conventions/blob/master/DynamicLinking.md is something that was agreed on to some extent by the community and, even though it is not a standard, some tools are already compatible with it. I don't mind keeping multi-module around, but if users can migrate to dynamic linking, deleting the multi-module support would reduce the maintenance overhead.

basically dynamic-linking is incompatible with threads. i guess it's worth researching what emscripten does in that regard.

Yes, agreed; this requires a bit deeper investigation.

yamt commented 1 month ago

If using the space of aux stack is an issue, then we had better try to resolve the hardcoded const issue in wasi-sdk, and be able to update __heap_base.

I think "fixing" __heap_base might not be possible and can lead to various issues. __heap_base is an immutable global, which means that even though the linker produces the correct code, various optimizers (e.g. wasm-opt or others) might choose to optimize the global.get calls and replace them with constants. Probably the way to avoid this would be to make __heap_base a mutable global, but that doesn't seem right.

i'm not following this discussion about the heap. it's trivial for a runtime linker to adjust __heap_base of pie executables and shared libraries. actually it's how dynamic-linking works.

i agree it's impossible for a runtime to adjust __heap_base of a statically linked module. cf. https://github.com/bytecodealliance/wasm-micro-runtime/issues/2275 but is it related to this dynamic-linking RFC?

loganek commented 1 month ago

it's trivial for a runtime linker to adjust __heap_base of pie executables and shared libraries. actually it's how dynamic-linking works.

Yes, if we assume the executable (main module) is built with -Wl,-pie flags (or -Wl,-pie -fPIC flags when the main module exports functions), then indeed the idea I've had of updating __heap_base from the runtime is going to work, because in this case __heap_base is an imported global for both the main module and the submodules. I was confused because I didn't think -pie was a requirement for WASM dynamic linking; if it is, though, I'd be happy to move forward with the __heap_base approach.

lum1n0us commented 1 month ago

To avoid adjusting __heap_base at runtime, I'm considering using wasm-tools component link as a reference. While it's not the definitive solution, it serves as a solid example.

wasm-tools accepts multiple core modules (compiled with --shared) as input, links them together, and produces a component module. Within the component module, wasm-tools generates core instance opcodes to ensure the correct order of instantiation, using exports from previous instances to satisfy imports for subsequent ones. It also creates an initial module instance that contains a custom linear memory instance and the associated global values (such as __memory_base, __table_base, __stack_pointer, etc.).

Here's an example of how to create a component model from multiple wasm32-wasi modules.

Key observations include:

The generation of wasm modules with -fPIC --shared options, resulting in all modules being side modules. This allows the linker or runtime to customize the linear memory layout and use exports/imports to position all module instances.
The libc.so from wasi-sdk is fundamental and invariably required.
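The instantiation-ordering step described above can be illustrated with a depth-first traversal over the needed lists from each module's dylink.0 section: a module is instantiated only after everything it needs, so earlier instances can satisfy later imports. This is a generic topological sort, not wasm-tools' actual code; ModuleInfo, instantiation_order and the fixed limits are hypothetical, and the sketch simply rejects missing libraries and dependency cycles.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_MODULES 16
#define MAX_NEEDED 8

/* Hypothetical summary of one module: its name plus the NEEDED list from
 * its dylink.0 section. */
typedef struct {
    const char *name;
    const char *needed[MAX_NEEDED];
    size_t needed_count;
} ModuleInfo;

enum VisitState { UNVISITED, IN_PROGRESS, DONE };

static int find_module(const ModuleInfo *mods, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(mods[i].name, name) == 0)
            return (int)i;
    return -1;
}

/* Depth-first post-order: a module is appended only after all of its
 * dependencies, so earlier instances can satisfy later imports. */
static bool visit(const ModuleInfo *mods, size_t n, size_t i,
                  enum VisitState *state, size_t *order, size_t *len)
{
    if (state[i] == DONE)
        return true;
    if (state[i] == IN_PROGRESS)
        return false; /* dependency cycle: rejected in this sketch */
    state[i] = IN_PROGRESS;
    for (size_t k = 0; k < mods[i].needed_count; k++) {
        int dep = find_module(mods, n, mods[i].needed[k]);
        if (dep < 0 || !visit(mods, n, (size_t)dep, state, order, len))
            return false; /* missing library, or a cycle further down */
    }
    state[i] = DONE;
    order[(*len)++] = i;
    return true;
}

/* Fills order[0..n) with module indices in a valid instantiation order. */
static bool instantiation_order(const ModuleInfo *mods, size_t n, size_t *order)
{
    enum VisitState state[MAX_MODULES];
    size_t len = 0;
    if (n > MAX_MODULES)
        return false;
    for (size_t i = 0; i < n; i++)
        state[i] = UNVISITED;
    for (size_t i = 0; i < n; i++)
        if (!visit(mods, n, i, state, order, &len))
            return false;
    assert(len == n);
    return true;
}
```

For a main module needing libdemo1.so and libc.so, where libdemo1.so itself needs libc.so, this yields libc.so, then libdemo1.so, then the main module. How to treat genuinely circular dependencies is left open here.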

loganek commented 1 month ago

To avoid adjusting __heap_base at runtime, I'm considering using wasm-tools component link as a reference. While it's not the definitive solution, it serves as a solid example.

From what I see, this is very similar to what was discussed above; the difference is, as you pointed out, that there's no main module, and instead the linked module is constructed out of multiple shared modules. Because none of them is the final executable, there's indeed no __heap_base. So what happens there is that __heap_base is calculated from the stack size and the size of the data segments of all the modules. So instead of updating __heap_base as we've discussed above, the linker just creates a new one (because there's none yet).
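The calculation described above can be sketched as follows, under an assumed layout of stack first, then each module's data at its __memory_base, then the heap. The layout and the 16-byte alignment of __heap_base are assumptions for illustration, not necessarily what wasm-tools component link emits, and compute_layout is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static uint64_t align_up(uint64_t v, uint32_t p2align)
{
    uint64_t align = (uint64_t)1 << p2align;
    return (v + align - 1) & ~(align - 1);
}

/* Assumed layout (illustrative only):
 *   [0, stack_size)       stack; __stack_pointer starts at stack_size
 *   [stack_size, ...)     each module's data at its __memory_base
 *   [__heap_base, ...)    heap, up to the end of linear memory
 * Fills memory_bases[] and *heap_base; both are only valid on success. */
static bool compute_layout(uint32_t stack_size, const uint32_t *mem_sizes,
                           const uint32_t *mem_p2aligns, size_t count,
                           uint32_t *memory_bases, uint32_t *heap_base)
{
    uint64_t cursor = stack_size;
    for (size_t i = 0; i < count; i++) {
        assert(mem_p2aligns[i] < 32);
        cursor = align_up(cursor, mem_p2aligns[i]);
        memory_bases[i] = (uint32_t)cursor;
        cursor += mem_sizes[i];
    }
    cursor = align_up(cursor, 4); /* 16-byte aligned __heap_base */
    if (cursor > UINT32_MAX)
        return false; /* layout does not fit in a 32-bit memory */
    *heap_base = (uint32_t)cursor;
    return true;
}
```

With a 64 KiB stack and two modules of 56 bytes (4-byte aligned) and 100 bytes (16-byte aligned), the modules land at 65536 and 65600 and __heap_base becomes 65712.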

My use case is likely going to be a single main module and a number of submodules. However, I don't see why the other scenario (where there is no main module; let's call it lib-only) couldn't be supported in WAMR. I also think that a lot of the implementation that satisfies my use case can be re-used to implement the lib-only use case. I'll definitely keep that use case in mind during the design and make sure the implementation can easily be extended.

lum1n0us commented 1 month ago

My usecase is likely going to be a single main module and a number of submodules.

I completely respect your decision. Please be aware that, in some respects, a main module = a submodule + libc.so. And given that modules compiled with --nostdlib can be considered as submodules, opting for a 'lib-only' approach could significantly reduce the effort involved.

yamt commented 1 month ago

it's trivial for a runtime linker to adjust __heap_base of pie executables and shared libraries. actually it's how dynamic-linking works.

Yes, if we assume the executable (main module) is built with -Wl,-pie flags (or -Wl,-pie -fPIC flags when the main module exports functions), then indeed the idea I've had of updating __heap_base from the runtime is going to work, because in this case __heap_base is an imported global for both the main module and the submodules. I was confused because I didn't think -pie was a requirement for WASM dynamic linking; if it is, though, I'd be happy to move forward with the __heap_base approach.

as pie executable is what emscripten uses, i guess it's the de-facto standard. even with non-pie executables, the linker can trivially allocate extra memory regions (for an app heap or something).

yamt commented 1 month ago

To avoid adjusting __heap_base at runtime, I'm considering using wasm-tools component link as a reference. While it's not the definitive solution, it serves as a solid example.

wasm-tools accepts multiple core modules (compiled with --shared) as input, links them together, and produces a component module. Within the component module, wasm-tools generates core instance opcodes to ensure the correct order of instantiation, using exports from previous instances to satisfy imports for subsequent ones. It also creates an initial module instance that contains a custom linear memory instance and the associated global values (such as __memory_base, __table_base, __stack_pointer, etc.).

Here's an example of how to create a component model from multiple wasm32-wasi modules.

Key observations include:
* The generation of wasm modules with `-fPIC --shared` options, resulting in all modules being side modules. This allows the linker or runtime to customize the linear memory layout and use exports/imports to position all module instances.

* The libc.so from wasi-sdk is fundamental and invariably required.

afaik, wasm-tools component link just emulates dynamic-linking with component-model for limited cases. i'm not sure why you want to make it a reference while full implementations are available (e.g. emscripten, toywasm).

yamt commented 1 month ago

My usecase is likely going to be a single main module and a number of submodules.

I completely respect your decision. Please be aware that, in some respects, a main module = a submodule + libc.so. And given that modules compiled with --nostdlib can be considered as submodules, opting for a 'lib-only' approach could significantly reduce the effort involved.

from my experience implementing toywasm libdyld, i disagree, because a pie executable and shared libraries are mostly the same. otoh, assuming a pie executable can save a bit of effort.

anyway, i guess it doesn't matter much because, for wamr, 90% of the effort would go into "fix import/export of instance resources", not the dynamic linker itself.

lum1n0us commented 1 month ago

IMU (in my understanding), the main module mentioned above is a normal module, not a pie executable.

one is called Main module and is compiled normally like what we do now;

lum1n0us commented 1 month ago

But I guess you are right: the main module should be a pie executable, and both the main module and the submodules should import a memory.

May I ask how you satisfy the needed libc.so in toywasm when executing a pie executable?

Custom:
 - name: "dylink.0"
 - mem_size     : 56
 - mem_p2align  : 2
 - table_size   : 0
 - table_p2align: 0
 - needed_dynlibs[2]:
  - libdemo1.so
  - libc.so   <- ?
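For reference, the dylink.0 fields shown above can be extracted with a small parser. The sketch assumes the subsection layout described in tool-conventions' DynamicLinking.md (a one-byte subsection type and a LEB128 size; MEM_INFO = 1 holding four LEB128 u32 values; NEEDED = 2 holding a count followed by length-prefixed names) and takes the section payload after the "dylink.0" name. DylinkInfo and the fixed size limits are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define WASM_DYLINK_MEM_INFO 1
#define WASM_DYLINK_NEEDED 2
#define MAX_NEEDED 8
#define MAX_NAME 64

/* Hypothetical result type covering the fields printed above. */
typedef struct {
    uint32_t mem_size, mem_p2align, table_size, table_p2align;
    uint32_t needed_count;
    char needed[MAX_NEEDED][MAX_NAME];
} DylinkInfo;

/* Unsigned LEB128, at most 5 bytes for a u32. */
static bool read_u32_leb(const uint8_t **p, const uint8_t *end, uint32_t *out)
{
    uint32_t result = 0;
    for (unsigned shift = 0; shift < 35; shift += 7) {
        if (*p >= end)
            return false;
        uint8_t byte = *(*p)++;
        result |= (uint32_t)(byte & 0x7f) << shift;
        if ((byte & 0x80) == 0) {
            *out = result;
            return true;
        }
    }
    return false;
}

/* Parses the payload of the "dylink.0" custom section (after its name). */
static bool parse_dylink0(const uint8_t *p, const uint8_t *end, DylinkInfo *info)
{
    assert(p <= end);
    memset(info, 0, sizeof(*info));
    while (p < end) {
        uint8_t type = *p++;
        uint32_t size;
        if (!read_u32_leb(&p, end, &size) || size > (size_t)(end - p))
            return false;
        const uint8_t *sub = p, *sub_end = p + size;
        if (type == WASM_DYLINK_MEM_INFO) {
            if (!read_u32_leb(&sub, sub_end, &info->mem_size) ||
                !read_u32_leb(&sub, sub_end, &info->mem_p2align) ||
                !read_u32_leb(&sub, sub_end, &info->table_size) ||
                !read_u32_leb(&sub, sub_end, &info->table_p2align))
                return false;
        } else if (type == WASM_DYLINK_NEEDED) {
            uint32_t count;
            if (!read_u32_leb(&sub, sub_end, &count) || count > MAX_NEEDED)
                return false;
            for (uint32_t i = 0; i < count; i++) {
                uint32_t len;
                if (!read_u32_leb(&sub, sub_end, &len) || len >= MAX_NAME ||
                    len > (size_t)(sub_end - sub))
                    return false;
                memcpy(info->needed[i], sub, len);
                info->needed[i][len] = '\0';
                sub += len;
            }
            info->needed_count = count;
        }
        /* other subsections (e.g. EXPORT_INFO, IMPORT_INFO) are skipped */
        p = sub_end;
    }
    return true;
}
```

Each subsection is bounded by its own size, so unknown subsection types can be skipped without understanding their contents.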

yamt commented 1 month ago

IMU, The main module mentioned above is a normal module. it is not a pie executable.

one is called Main module and is compiled normally like what we do now;

i don't know where @wenyongh got the idea. pie or not, the main module for dynamic-linking is not the same as a normal (statically linked) binary.

yamt commented 1 month ago

May I ask how you satisfy the needed libc.so in toywasm when executing a pie executable?
Custom:
 - name: "dylink.0"
 - mem_size     : 56
 - mem_p2align  : 2
 - table_size   : 0
 - table_p2align: 0
 - needed_dynlibs[2]:
  - libdemo1.so
  - libc.so   <- ?

currently it searches the file with the name (in this case "libc.so") in the user-specified host paths. (--dyld-path)
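That lookup amounts to trying each user-specified directory in order, roughly as in the sketch below. This is not toywasm's code: find_library is a hypothetical helper, it treats "the file can be opened" as "found", and rejecting names that contain '/' is a choice made here so a NEEDED entry cannot escape the configured directories.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Try "<dir>/<name>" for each search directory in order and stop at the
 * first file that can be opened; the full path is written to out. */
static bool find_library(const char *const *dirs, size_t ndirs,
                         const char *name, char *out, size_t out_size)
{
    assert(out != NULL && out_size > 0);
    if (strchr(name, '/') != NULL)
        return false; /* keep NEEDED names inside the search directories */
    for (size_t i = 0; i < ndirs; i++) {
        int n = snprintf(out, out_size, "%s/%s", dirs[i], name);
        if (n < 0 || (size_t)n >= out_size)
            continue; /* path does not fit in the buffer */
        FILE *f = fopen(out, "rb");
        if (f != NULL) {
            fclose(f);
            return true;
        }
    }
    return false;
}
```

The directory order doubles as a priority order, so a user can shadow a stock libc.so by listing their own directory first.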

lum1n0us commented 1 month ago

What about the imports? I was glad to notice you are also using --import-memory.

Import[10]:
 - memory[0] pages: initial=1 <- env.memory
 - table[0] type=funcref initial=0 <- env.__indirect_function_table
 - global[0] i32 mutable=1 <- env.__stack_pointer
 - global[1] i32 mutable=0 <- env.__memory_base
 - global[2] i32 mutable=0 <- env.__table_base
 - func[0] sig=0 <__wasm_call_dtors> <- env.__wasm_call_dtors
 - func[1] sig=1 <__wasi_proc_exit> <- env.__wasi_proc_exit
 ...
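The import list suggests which values a runtime linker hands to each module: env.memory, env.__indirect_function_table and env.__stack_pointer are shared by all modules, while env.__memory_base and env.__table_base differ per module. The sketch below shows that split for the global imports plus table-slot reservation; all names are hypothetical, and a real implementation would also grow the actual table instance.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-module values chosen by the runtime linker. */
typedef struct {
    uint32_t memory_base; /* where this module's data was placed */
    uint32_t table_base;  /* first slot of its range in the shared table */
} ModuleBases;

/* Hypothetical state shared by every module in the process. */
typedef struct {
    uint32_t stack_pointer; /* backs the one mutable env.__stack_pointer */
    uint32_t table_len;     /* current size of env.__indirect_function_table */
} SharedState;

/* Reserve `table_size` slots (from dylink.0) at the end of the shared table
 * and return the first one, to be used as that module's __table_base. */
static uint32_t reserve_table_slots(SharedState *shared, uint32_t table_size)
{
    assert(shared->table_len <= UINT32_MAX - table_size);
    uint32_t base = shared->table_len;
    shared->table_len += table_size;
    return base;
}

/* Return the storage backing the global import env.<field>, or NULL. */
static uint32_t *resolve_env_global(SharedState *shared, ModuleBases *mod,
                                    const char *field)
{
    if (strcmp(field, "__stack_pointer") == 0)
        return &shared->stack_pointer; /* same cell for every module */
    if (strcmp(field, "__memory_base") == 0)
        return &mod->memory_base;
    if (strcmp(field, "__table_base") == 0)
        return &mod->table_base;
    return NULL;
}
```

Returning a pointer for __stack_pointer makes the sharing explicit: every module resolves to the same cell, so a stack adjustment by one module is seen by all of them.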

wenyongh commented 1 month ago

IMU, The main module mentioned above is a normal module. it is not a pie executable.

one is called Main module and is compiled normally like what we do now;

i don't know where @wenyongh got the idea. pie or not, the main module for dynamic-linking is not the same as a normal (statically linked) binary.

I found that in Emscripten document: https://emscripten.org/docs/compiling/Dynamic-Linking.html#overview-of-dynamic-linking

And the WebAssembly Dynamic Linking document says at the beginning: "This document describes the current WebAssembly dynamic linking ABI used by emscripten and by the llvm backend when targeting emscripten." So I think they are the same.

bytecodealliance / wasm-micro-runtime

[RFC] Core module Dynamic Linking #3678
