bytecodealliance / wasm-micro-runtime

WebAssembly Micro Runtime (WAMR)
Apache License 2.0
4.93k stars 624 forks source link

[RFC] Basic support for Memory64 proposal #3091

Open TianlongLiang opened 9 months ago

TianlongLiang commented 9 months ago

Summary

Plan to implement basic support for the memory64 proposal, first focusing on the classic interpreter and AOT running modes.

Overview of Basic Support Plan

With basic support completed, users should experience the following:

Impact of memory64 proposal on WAMR's data structure

This section lists the changes in the memory64 proposal that could potentially influence WARM's internal data structure and code logic. Let's see what those influences could be

Topics for Further Discussion

😄 Suggestions or input on the following topics (or others you can think of) are appreciated.

1. Should Memory64 Only Be Supported on 64-Bit Machines?

I believe it's not very useful to support the memory64 proposal on 32-bit machines, as more than 4GB of memory can't be utilized anyway. If linear memory is below 4GB, sticking with default memory32 seems sensible. Supporting memory64 on 32-bit machines would also add implementation complexity. Therefore, the initial plan is not to support the memory64 proposal on 32-bit machines.

2. Exported Runtime APIs for Host Use When Embedding WAMR VMCore

As previously mentioned, the initial plan is to add new APIs like wasm64_xxx for memory64 wasm/aot files, while keeping the old APIs unchanged for memory32 wasm/aot files.

// New APIs
WASM_RUNTIME_API_EXTERN uint64
wasm64_runtime_module_malloc(WASMModuleInstanceCommon *module_inst, uint64 size, void **p_native_addr);
// Old counterpart APIs
WASM_RUNTIME_API_EXTERN uint32
wasm_runtime_module_malloc(WASMModuleInstanceCommon *module_inst, uint32 size, void **p_native_addr);

Alternatively, we could modify existing APIs without adding new ones.

// Choice 1: use uintptr_t
WASM_RUNTIME_API_EXTERN uintptr_t
wasm_runtime_module_malloc(WASMModuleInstanceCommon *module_inst, uintptr_t size, void **p_native_addr);
// Choice 2: use uint64
WASM_RUNTIME_API_EXTERN uint64
wasm_runtime_module_malloc(WASMModuleInstanceCommon *module_inst, uint32 size, void **p_native_addr);

These two choices are worth considering. The first fits well on 32-bit platforms, and the second maintains consistent function signatures. At first glance, they seem preferable to adding new wasm64_xxx APIs. However, they could cause problems for existing programs that want to be compiled with the updated version of WAMR, which is why adding new APIs is the primary choice. Additionally, new APIs can reduce programming errors and enhance readability by clearly indicating the memory standard used for a given WASM file/module, with explicit return value types.

Plan to go with choice 2: use uint64 to modify existing APIs, keeping one set of APIs is easier for future maintenance.

3. Internal API implementation choice

Adding new export APIs or changing the export APIs requires corresponding changes in internal implementation APIs too. For example:

/* Internal API */
uint32
wasm_runtime_module_realloc_internal(WASMModuleInstanceCommon *module_inst,
                                    WASMExecEnv *exec_env, uint32 ptr,
                                    uint32 size, void **p_native_addr);

The ptr, size, and returned ptr should be uint64 when it's Memory64.

So we can change those types to:

  1. typedef mem_offset_t = uint32/uint64; based on CMake flag
  2. uintptr_t;
  3. uint64;

The initial choice is 1 for it won't have any change if the CMake flag is disabled. Only when it is enabled for runtime, then we change the internal implementation, change the code logic, do type conversion, etc.

Plan to go with choice 1

#ifdef WASM_ENABLE_MEMORY64 != 0
typedef linear_mem_ptr_t uint64;
#else
typedef linear_mem_ptr_t uint32;
#endif

4. Maximum size for memory64 linear memory

Theoretically, the linear memory for memory64 can be much larger even using our existing u32 data representation for page count, it is in the scale of hundreds of TeraBytes.

Should we have a practical maximum size for linear memory to control the actual usage of memory? Like 64GB or something

loganek commented 9 months ago

running memory32 wasm/aot files is not possible (though you can still run memory32 wasm/aot files with the default runtime compile without adding the cmake flags)

I wonder if we could consider adding support for running both wasm32 and wasm64 on wamr64 runtime from the start? Our team is very interested in that scenario as that'd enable us to do a/b testing without having sequential native updates. I'd be happy to help with the work around that.

WASMMemory stay unchanged

I agree that max page count bigger than UINT32 doesn't make sense; but just out of curiosity, would that break the ABI for AOT? Or what's the reason for not making the change to conform the spec? If it's due to ABI change, I think it's ok since we're breaking the ABI with this feature anyway.

Just FYI, I'm going to pick up the work on the wasi-libc to support it in the toolchain as well; the work has already been started in https://github.com/WebAssembly/wasi-libc/pull/444 but looks like it's stuck atm so I'm going to make some progress on that. That'd probably help us testing memory64 support in wamr with WASI a bit easier.

TianlongLiang commented 9 months ago

I wonder if we could consider adding support for running both wasm32 and wasm64 on wamr64 runtime from the start? Our team is very interested in that scenario as that'd enable us to do a/b testing without having sequential native updates. I'd be happy to help with the work around that.

Yes, I think we can definitely support both from the start, it wouldn't change too much from my original design, mostly some instantiating logic changes I guess. I was merely considering my bandwidth so aiming for simplicity for testing. With your team's help, I think we can definitely support both from the start.

I agree that max page count bigger than UINT32 doesn't make sense; but just out of curiosity, would that break the ABI for AOT? Or what's the reason for not making the change to conform the spec? If it's due to ABI change, I think it's ok since we're breaking the ABI with this feature anyway.

You are right that we are breaking AOT ABI anyway, but breaking ABI is not my main concern. I was thinking since we won't be using more pages, changing the type would change more code logic for no actual benefits but also take up more space, making the whole changes more complex.

I just realized that my statement can cause some confusion here. What I truly mean is that:

Of course, I could miss something, feel free to point it out and we can discuss other alternatives together 😄

loganek commented 9 months ago

So we can change those types to:

typedef mem_offset_t = uint32/uint64; based on CMake flag uintptr_t; uint64;

I'd strongly discourage us from using uintptr_t - this is used for representing the host address, and using that for linear memory address is not good for the following reasons:

I think the first option:

#ifdef WASM64
typedef mem_offset_t uint64;
#else
typedef mem_offset_t uint32;
#endif

looks best. I'm not 100% sure though whether offset is the best name to represent the address; I'd assume the offset can be negative, but if we want to use it to represent an absolute address, then perhaps we could use mem_address_t or linear_mem_ptr_t etc.

TianlongLiang commented 9 months ago

looks best. I'm not 100% sure though whether offset is the best name to represent the address; I'd assume the offset can be negative, but if we want to use it to represent an absolute address, then perhaps we could use mem_address_t or linear_mem_ptr_t etc.

I think linear_mem_ptr_t is a solid choice

wenyongh commented 9 months ago

Hi, for the linear memory related APIs exposed (and its internal APIs), I think there may be several questions to discuss: what is type of related linear memory offset arguments/results in these APIs? Whether to define mem_offset_t or linear_mem_ptr_t type? And whether to use cmake flag to control it?

Note that developer may directly include wasm_export.h and use wamr static/shared lib in his project, in which the cmake flag in WAMR won't take effect in wasm_export.h, so we should not use internal macro to control API definitions in wasm_export.h.

So my opinion is that there may be two options:

I think the first one simplifies the implementation and reduces the maintain effort, but it may slightly impact the footprint and performance. Since there are too many features now and the source code is already very complex, it is my personal preference.

BTW, we had better refactor the current code to change API definitions and data structure definitions (memory instance, module instance, etc.) and resolve historical issues (e.g. UINT32_MAX linear memory size limitation), to make the AOT ABI well defined before we start to implement memory64, since we may release 2.0.0 after GC is merged and before memory64 is implemented.

wenyongh commented 7 months ago

Refactor APIs and data structures as preliminary work for Memory64 was merged into main.

wenyongh commented 7 months ago

https://github.com/bytecodealliance/wasm-micro-runtime/pull/3240 https://github.com/bytecodealliance/wasm-micro-runtime/pull/3260 https://github.com/bytecodealliance/wasm-micro-runtime/pull/3266

yamt commented 5 months ago

do you have any plans on table64? https://github.com/WebAssembly/memory64/issues/51

TianlongLiang commented 5 months ago

Currently, there is no plan regarding the table64 proposal implementation, a corresponding new RFC issue probably would be open if anyone were to implement it

trcrsired commented 4 months ago

They clearly f things up on 64 bit support. I want to duplicate all wasi apis to have a _wasm64 suffix but they seem to even ignore what i am saying. I would like to hear what you guys think.

gajanak commented 4 months ago

A small hint: If MEMORY64 is enabled , and a wasm64 is loaded, Strings are handled by I64 pointer isnteas i32 ?

Then the Signature Check should be patched, to support I64 for $ and for native functions. (wasm_native.c : 131 ) if (type->types[i] != VALUE_TYPE_I32) / pointer and string must be i32 type */ return false;

if WASM_ENABLE_MEMORY64 -> this should be VALUE_TYPE_I64

Perhaps this helps someone - wonder about signature mismatch error - experimenting with 64 Bit ;)

loganek commented 4 months ago

Thanks @gajanak , this is a known issue and we were planning to address it as part of the WASI work (overall, memory64 support is in still quite an experimental stage)

TianlongLiang commented 4 months ago

And one more TODO in the future, not WIP currently, but listed here for reference: Implement table64 extension

fridaymore commented 2 months ago

Great to see this being worked on -- any insight into when this feature will be ready for use in a development setting and in a production setting?

TianlongLiang commented 2 months ago

I start to work on table64 and hopefully, it can be ready in a month or so