bytecodealliance / wasm-micro-runtime

WebAssembly Micro Runtime (WAMR)
Apache License 2.0

An experimental shared-everything multi module feature enabled at branch dev/link #1026

Open jhe33 opened 2 years ago

jhe33 commented 2 years ago

Hi, I recently committed an experimental shared-everything multi-module linking feature to dev/link. It is based on the dynamic linking convention, but because the convention was updated during development, the feature actually implements the previous version (before July of last year). I hope it helps reduce compile footprint and time; in a network environment it also helps reduce the download size.

In the design there are 2 kinds of modules: root modules and dependency modules. A root module can load dependency modules only explicitly, while a dependency module can load its own dependency modules both explicitly and implicitly. A root module can be a regular module with no internal libc implementation, a wasi module with its own libc, or an AssemblyScript module with its own runtime. A dependency module has no memory allocator of its own; it gets the allocator from the root module or from the WAMR builtin libc. That means all modules share the same linear memory.

Explicit loading means explicit dlopen/dlsym/dlclose calls are needed, while implicit loading relies on the dylink section in the wasm module. Users can get the latter, and export the necessary functions, by building with clang. Currently only lazy binding is supported, which means all function resolving and linking occurs at invoke time, not at module loading time. So far wasm calling wasm and aot calling aot are well supported. Although the implementation is platform independent, I have only tested it on Linux with limited cases (zlib, cJSON, etc.), so I believe some situations are still not covered. I will provide several code samples soon.
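To make "resolving at invoke time" concrete, here is a minimal conceptual sketch of lazy binding in plain C. It is not WAMR's implementation: `import_slot`, `resolve_symbol` and `call_lazy` are hypothetical names, and the symbol lookup is a stand-in that only knows one function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical illustration of lazy binding; not WAMR's data structures. */
typedef int (*binary_fn)(int, int);

static int add_impl(int a, int b) { return a + b; }

/* Stand-in for the runtime's symbol lookup; counts how often it runs. */
static int resolve_count = 0;

static binary_fn resolve_symbol(const char *name)
{
    resolve_count++;
    return strcmp(name, "add") == 0 ? add_impl : NULL;
}

/* An import slot starts unresolved and is filled on the first call. */
typedef struct {
    const char *name;
    binary_fn target; /* NULL until the first invocation */
} import_slot;

/* Call through the slot, resolving on first use. Returns 0 on success. */
static int call_lazy(import_slot *slot, int a, int b, int *result)
{
    assert(slot != NULL && result != NULL);
    if (!slot->target) {
        slot->target = resolve_symbol(slot->name);
        if (!slot->target)
            return -1; /* an unresolved symbol is only noticed at call time */
    }
    *result = slot->target(a, b);
    return 0;
}
```

The trade-off this shows: loading a module is cheap because nothing is resolved up front, but a missing symbol only surfaces when the function is first called.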

jhe33 commented 2 years ago


how to build a shared module.

Here we build a cJSON shared module using emsdk; other clang toolchains can be used as well.

emsdk/upstream/bin/clang -O3 --target=wasm32-unknown-emscripten -nostdlib -I/usr/include/ -Wl,--shared -fPIC -Wl,--no-entry -Wl,--allow-undefined -Wl,--export=cJSON_Parse -Wl,--export=cJSON_Delete -Wl,--export=cJSON_GetObjectItem -Wl,--export=cJSON_IsString -Wl,--export=cJSON_GetStringValue -Wl,--export=__wasm_call_ctors cJSON.c -o cJSON_shared.wasm

tips:

-Wl,--shared -fPIC : build a PIC wasm module
--target=wasm32-unknown-emscripten : emcc supports dylink
-Wl,--export : export functions; exporting only the necessary functions is suggested, to lighten the link cost.

case 1: regular module

build root module

emsdk/upstream/bin/clang -O3 -Wl,--features=mutable-globals --target=wasm32-unknown-emscripten -nostdlib -I/usr/include/ -Wl,--no-entry -Wl,--export=main -Wl,--export=malloc -Wl,--export=free -Wl,--export=realloc -Wl,--export=__stack_pointer -Wl,--allow-undefined test_parser_shared.c -o test_parser_shared.wasm

tips:

-Wl,--features=mutable-globals and -Wl,--export=__stack_pointer : expose __stack_pointer to the shared module.
-Wl,--export : exporting the memory allocator functions is not always necessary; it depends on where the memory allocator comes from, the builtin libc or the root module.

code in root module

Declare dlopen/dlsym/dlclose in the root module: either include dlfcn.h or use the following declarations in an LLVM environment.

__attribute__((import_module("env"), import_name("dlopen"))) void * dlopen(const char *, int);
__attribute__((import_module("env"), import_name("dlsym"))) void * dlsym(void*, const char *);
__attribute__((import_module("env"), import_name("dlclose"))) int dlclose(void *);
__attribute__((import_module("env"), import_name("dltest"))) int dltest(void*);

Declare the types of the function pointers that dlsym will return.

typedef cJSON *(*func_parse)(const char *);
typedef cJSON*(*func_getobject)(cJSON *, const char*);
typedef bool(*func_isstring)(cJSON*);
typedef const char*(*func_getstring)(cJSON*);
typedef void(*func_delete)(cJSON*);

Open a wasm/aot module. Currently only an absolute path or a file name in the current folder can be passed.

void * handle = dlopen("cJSON_shared.aot", 0);
if (!handle) {
    printf("open wasm failed\n");
    return -1;
}

Get a function pointer with dlsym, as in regular C code.

func_parse parse = dlsym(handle, "cJSON_Parse");
if (!parse) {
    printf("dlsym cJSON_Parse failed\n");
    dlclose(handle);
    return -1;
}

call it

cJSON * cjson_handle = parse(json_txt);
if (!cjson_handle) {
    printf("parse failed\n");
    dlclose(handle);
    return -1;
}

Don't forget to close the handle at the end.

dlclose(handle);
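Putting the steps above together, the root module's logic might look like the sketch below. It is an assumption-laden illustration, not the actual test_parser_shared.c: the helper name `print_string_field` is hypothetical, cJSON is treated as an opaque type, and dlfcn.h is included (with the LLVM toolchain from this post, the import_module declarations shown earlier could be used instead). The five symbols looked up are the ones exported when building cJSON_shared.wasm.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdbool.h>
#include <stdio.h>

/* Opaque cJSON type; the real definition lives in cJSON.h. */
typedef struct cJSON cJSON;

typedef cJSON *(*func_parse)(const char *);
typedef cJSON *(*func_getobject)(cJSON *, const char *);
typedef bool (*func_isstring)(cJSON *);
typedef const char *(*func_getstring)(cJSON *);
typedef void (*func_delete)(cJSON *);

/* Hypothetical helper: print the string stored under `key` in `json_txt`.
 * Returns 0 on success, -1 on any failure. */
int print_string_field(const char *module_path, const char *json_txt,
                       const char *key)
{
    assert(module_path != NULL && json_txt != NULL && key != NULL);
    int ret = -1;

    /* flags = 0 follows the example in this post */
    void *handle = dlopen(module_path, 0);
    if (!handle) {
        printf("open %s failed\n", module_path);
        return -1;
    }

    func_parse parse = (func_parse)dlsym(handle, "cJSON_Parse");
    func_getobject getobject = (func_getobject)dlsym(handle, "cJSON_GetObjectItem");
    func_isstring isstring = (func_isstring)dlsym(handle, "cJSON_IsString");
    func_getstring getstring = (func_getstring)dlsym(handle, "cJSON_GetStringValue");
    func_delete del = (func_delete)dlsym(handle, "cJSON_Delete");

    if (parse && getobject && isstring && getstring && del) {
        cJSON *root = parse(json_txt);
        if (root) {
            cJSON *item = getobject(root, key);
            if (item && isstring(item)) {
                printf("%s = %s\n", key, getstring(item));
                ret = 0;
            }
            del(root); /* free the parsed tree before unloading the module */
        } else {
            printf("parse failed\n");
        }
    } else {
        printf("dlsym failed\n");
    }

    dlclose(handle); /* single exit path so the handle is always closed */
    return ret;
}
```

Routing every failure through one dlclose call avoids repeating the cleanup in each error branch.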

building VM

enable dylink with a new switch: -DWASM_BUILD_ENABLE_DYNAMIC_LINKING

launching VM

--enable-dlopen=n :

Enable explicit dynamic module loading. n is a 5-bit bitmap; each bit indicates a feature, from bits[0] to bits[4]:

bits[0]: bind mode, currently always lazy binding
bits[1]: where the memory allocator comes from, 0 - from builtin libc; 1 - from root module
bits[2]: whether to use table space to store the module's exported functions, 0 - no; 1 - yes
bits[3]: whether the root module is an AS module, 0 - no; 1 - yes
bits[4]: whether to enable a cache to save symbol resolve results, 0 - no; 1 - yes (not supported yet)

e.g. n = 14 (0b1110) indicates lazy binding, memory allocator from root module, table space in use, and root module is an AS module.
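As a quick way to check a value of n, the bitmap can be decoded as in the following sketch. The enum, struct and function names are hypothetical (they are not taken from the WAMR source); only the bit positions follow the description above.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit positions as described for --enable-dlopen=n; names are hypothetical. */
enum {
    DLOPEN_BIT_BIND_MODE  = 1 << 0, /* bind mode, currently always lazy */
    DLOPEN_BIT_ROOT_ALLOC = 1 << 1, /* allocator: 0 builtin libc, 1 root module */
    DLOPEN_BIT_TABLE      = 1 << 2, /* store module exports in table space */
    DLOPEN_BIT_AS_ROOT    = 1 << 3, /* root module is an AssemblyScript module */
    DLOPEN_BIT_CACHE      = 1 << 4  /* symbol cache, not supported yet */
};

typedef struct {
    bool bind_mode_bit;
    bool allocator_from_root;
    bool use_table_space;
    bool root_is_as_module;
    bool symbol_cache;
} dlopen_options;

/* Split the 5-bit bitmap into named flags. */
dlopen_options decode_enable_dlopen(unsigned n)
{
    assert(n < 32u); /* only bits[0]..bits[4] are defined */
    dlopen_options o;
    o.bind_mode_bit       = (n & DLOPEN_BIT_BIND_MODE) != 0;
    o.allocator_from_root = (n & DLOPEN_BIT_ROOT_ALLOC) != 0;
    o.use_table_space     = (n & DLOPEN_BIT_TABLE) != 0;
    o.root_is_as_module   = (n & DLOPEN_BIT_AS_ROOT) != 0;
    o.symbol_cache        = (n & DLOPEN_BIT_CACHE) != 0;
    return o;
}
```

With this decoding, n = 6 sets the allocator-from-root and table-space flags, and n = 14 additionally sets the AS-module flag.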

case 2: wasi module

build root module (if using wasi-sdk)

/opt/wasi-sdk/bin/clang -g -O3 -Wl,--features=mutable-globals --target=wasm32-wasi --sysroot=/opt/wasi-sdk/share/wasi-sysroot/ -Wl,--no-entry -Wl,--export=main,--export=malloc,--export=free,--export=realloc,--export=__stack_pointer test_parser_shared.c -o test_parser_shared_wasi.wasm

tips:

--target=wasm32-wasi : use the wasi toolchain
-Wl,--export= : MUST export the memory allocator functions
-Wl,--features=mutable-globals and --export=__stack_pointer : expose __stack_pointer to the shared module, as in case 1

building VM

same with case 1.

launching VM

Use the memory allocator from the root module by setting bit1 in --enable-dlopen, e.g. iwasm --enable-dlopen=6 test_parser_shared_wasi.aot

case 3: AssemblyScript module

build root module

AssemblyScript code

index.ts :

import {call_indirect} from "builtins"

index.ts :

@external("env", "dlopen")
export declare function dlopen(path:ArrayBuffer, flags:usize):usize;
@external("env", "dlsym")
export declare function dlsym(handle:usize, path:ArrayBuffer):usize;
@external("env", "dlclose")
export declare function dlclose(handle:usize):usize;

index.ts :

var user_stack:ArrayBuffer;
export var __user_stack_pointer:usize;

index.ts :

function init_native_env(stack_size:i32):void {
  user_stack = new ArrayBuffer(stack_size);
  __user_stack_pointer = changetype<usize>(user_stack) + (stack_size);
  console.log("stack pointer = " + __user_stack_pointer.toString());
}

index.ts :

export function main(): i32 {
  init_native_env(DEFAULT_USER_STACK_SIZE);

index.ts :

export function malloc(size:usize):usize {
  console.log("alloc " + size.toString());
  return __alloc(size);
}

export function realloc(ptr:usize, size:usize):usize {
  return __realloc(ptr, size);
}

export function free(ptr:usize):void {
  __free(ptr);
}

index.ts :

var handle:usize = dlopen(String.UTF8.encode("cJSON_shared.aot", true), 0);

var func_parse:usize = dlsym(handle, String.UTF8.encode("cJSON_Parse", true));
var func_getobject:usize = dlsym(handle, String.UTF8.encode("cJSON_GetObjectItem", true));
var func_getstring:usize = dlsym(handle, String.UTF8.encode("cJSON_GetStringValue", true));

index.ts :

var utf8_json = String.UTF8.encode(json_text, true);

  var json_whole:i32 = call_indirect(<i32>func_parse, changetype<usize>(utf8_json));
  var json_name:i32 = call_indirect(<i32>func_getobject, json_whole, changetype<usize>(String.UTF8.encode("name", true)));
  var json_name_value_utf8:i32 = call_indirect(<i32>func_getstring, json_name);
  var json_name_value = String.UTF8.decode(changetype<ArrayBuffer>(json_name_value_utf8));

index.ts :

  dlclose(handle);
  return 0;

building VM

same with case 1.

launching VM

Use the memory allocator from the root module by setting bit1 in --enable-dlopen, and tell the VM that the root module is an AS module by setting bit3, e.g.

iwasm --enable-dlopen=14 test_parser_shared_wasi.aot

case 4: explicit link and implicit link

The current implementation supports more complex loading models, e.g. the root module explicitly opens a shared module, and that shared module can explicitly or implicitly open other shared modules. Module lifetime management based on reference counting is enabled to automatically load/unload modules before calling the function.
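The reference-counting idea can be sketched as follows. This is a conceptual illustration only: `module_entry`, `module_acquire` and `module_release` are hypothetical names, and the real load/unload work is reduced to flipping a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bookkeeping for one shared module. */
typedef struct {
    const char *name;
    int ref_count;
    bool loaded;
} module_entry;

/* Take a reference; the first reference triggers the load. */
void module_acquire(module_entry *m)
{
    assert(m != NULL);
    if (m->ref_count == 0)
        m->loaded = true; /* a real runtime would load and instantiate here */
    m->ref_count++;
}

/* Drop a reference; the last release triggers the unload. */
void module_release(module_entry *m)
{
    assert(m != NULL && m->ref_count > 0);
    if (--m->ref_count == 0)
        m->loaded = false; /* a real runtime would unload here */
}
```

In this model an explicit dlopen and an implicit dependency would each hold one reference, so a module stays loaded until every user has released it.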

This case shows how to do it, especially the implicit link. There are 3 modules: the root module explicitly opens the callee module, and the callee module implicitly opens the callee2 module.

build root module

emsdk/upstream/bin/clang -Wl,--features=mutable-globals -fPIC -g -O3 --target=wasm32-unknown-emscripten -nostdlib -I/usr/include/ -Wl,--no-entry -Wl,--export=main -Wl,--export=__stack_pointer -Wl,--allow-undefined test_dlopen.c -o test_dlopen.wasm

build callee and callee2 module

build callee2 first, because callee depends on callee2.

emsdk/upstream/bin/clang --target=wasm32-unknown-emscripten -nostdlib -I/usr/include/ -Wl,--shared -fPIC -Wl,--no-entry -Wl,--export=print_callee2 -Wl,--allow-undefined callee2.c -o callee2.wasm

then callee

emsdk/upstream/bin/clang -O3 --target=wasm32-unknown-emscripten -nostdlib -I/usr/include/ -Wl,--shared -fPIC -Wl,--no-entry -Wl,--export=print_callee -Wl,--allow-undefined callee.c -o callee.wasm callee2.wasm

note: callee2.wasm is appended to the above command. Like the gcc link option "-l", it indicates that callee implicitly depends on callee2.

tips:

An entry will be added in dylink section of callee module, which indicates callee's dependency.

implicit link

We can refer to the previous cases for explicit loading; here let's focus on how the callee module implicitly opens the callee2 module. The following declaration can be used with LLVM to indicate the import module, import function name and signature, like a header file.

callee.c :

__attribute__((import_module("callee2.wasm"), import_name("print_callee2"))) void print_callee2(int);

Under the hood, callee.wasm will include an entry in its import section. Based on this information (the dylink and import sections), the VM can perform the link (module loading and function resolving) when the call happens (lazy binding).
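One way to picture how the two sections could be combined is the sketch below. It is only a guess at the general shape, not the code in dev/link: all type and function names are hypothetical, the dylink dependency list is modelled as an array of module names, and the demo data mirrors the callee/callee2 example with a stub in place of the real print_callee2.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void (*generic_fn)(void);

/* Hypothetical view of one exported function of a loaded module. */
typedef struct {
    const char *name;
    generic_fn fn;
} export_entry;

/* Hypothetical view of a loaded module and its exports. */
typedef struct {
    const char *name;
    const export_entry *exports;
    size_t export_count;
} loaded_module;

/* Resolve (import_module, import_name) for a module whose dylink section
 * lists `needed`. Returns the export, or NULL if it cannot be linked. */
generic_fn resolve_import(const char *const *needed, size_t needed_count,
                          const loaded_module *modules, size_t module_count,
                          const char *import_module, const char *import_name)
{
    assert(import_module != NULL && import_name != NULL);

    /* 1. The import's module must be listed as a dependency in dylink. */
    size_t i;
    for (i = 0; i < needed_count; i++)
        if (strcmp(needed[i], import_module) == 0)
            break;
    if (i == needed_count)
        return NULL;

    /* 2. Find that module, then 3. look up the export by name. */
    for (size_t m = 0; m < module_count; m++) {
        if (strcmp(modules[m].name, import_module) != 0)
            continue;
        for (size_t e = 0; e < modules[m].export_count; e++)
            if (strcmp(modules[m].exports[e].name, import_name) == 0)
                return modules[m].exports[e].fn;
    }
    return NULL;
}

/* Demo data: callee needs callee2.wasm, which exports print_callee2. */
static void print_callee2_stub(void) {}
static const export_entry callee2_exports[] = {
    { "print_callee2", print_callee2_stub }
};
static const loaded_module demo_modules[] = {
    { "callee2.wasm", callee2_exports, 1 }
};
static const char *const callee_needed[] = { "callee2.wasm" };
```

With lazy binding, a lookup like this would run the first time callee calls print_callee2 rather than when callee.wasm is loaded.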

end

jhe33 commented 2 years ago

An introduction to the implementation details: shared-everything_multi-module_at_WAMR.pdf

andrewrch commented 2 years ago

Hi @jhe33 @wenyongh

I'm very interested in this feature. Is it likely this or something similar would find its way to the main branch? What would be the best way to take this forward, or are there other options being considered for dynamic loading?

My use case is similar to that described - I'd like to download a very small wasm module which can then download and dynamically load more modules depending on when they are required. (E.g. similar to import() in JS).

jhe33 commented 2 years ago

@andrewrch thank you for your interest. Have you tried this branch? From your description, I think it would be helpful for your case.

This patch introduced some changes, such as string interning (main has no string interning implementation right now) and an optimized native call (as far as I know, main has already optimized the native call), that are not acceptable to the main branch. Therefore, it's not the proper time to merge it back to mainline. Of course, rebasing it onto main is an alternative.

If your case is not sensitive to performance, you can use JS glue code, or the host in which your VM is embedded, to work around calling other wasm modules.