emscripten-core / emscripten

Emscripten: An LLVM-to-WebAssembly Compiler

What is the correct way to use multiple memories? #22732

Open lax1dude opened 2 days ago

lax1dude commented 2 days ago

I know that LLVM bitcode has no concept of multiple memories, which makes it fairly difficult to add "true" support for accessing multiple memories through ordinary variable assignment syntax, but I don't see any reason it couldn't be done through intrinsic functions, like atomics are. If this feature does exist, I haven't been able to find the header for it yet, and implementing intrinsic functions myself is a bit over my head.

Currently I'm thinking the easiest way to do it without forking Emscripten is to have some imported functions for manipulating the contents of the other memories, and then use wasm-merge to "link" them with a handwritten WAT file. However, I have no idea how wasm-merge is supposed to be used when the minification of imports and exports is hard-coded so that the end user cannot disable it. (Why does this have to be a limitation? It's extremely frustrating.)
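The imported-function route described above might look roughly like this from the C side. This is a minimal sketch under assumptions: the `mm` import module, the `load_u32`/`store_u32` import names, and the `copy_to_memory1` helper are all hypothetical, and it relies on clang's `import_module`/`import_name` attributes for WebAssembly. It does not address how Emscripten resolves or renames such imports, which is the sticking point raised above. Outside a wasm build, a plain array stands in for the second memory so the calling code can be compiled and tested natively.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#ifdef __wasm__
/* Wasm build: these are imports. Their bodies would come from a handwritten
   WAT module whose loads/stores name the second memory. The "mm" module and
   the import names are placeholders, not an existing API. */
__attribute__((import_module("mm"), import_name("load_u32")))
uint32_t mm_load_u32(uint32_t offset);
__attribute__((import_module("mm"), import_name("store_u32")))
void mm_store_u32(uint32_t offset, uint32_t value);
#else
/* Native stand-in: a plain array plays the role of the second memory so the
   calling code can be built and tested without a wasm toolchain. */
static uint8_t fake_memory1[65536];

static uint32_t mm_load_u32(uint32_t offset) {
    assert(offset <= sizeof fake_memory1 - sizeof(uint32_t));
    uint32_t value;
    memcpy(&value, fake_memory1 + offset, sizeof value);
    return value;
}

static void mm_store_u32(uint32_t offset, uint32_t value) {
    assert(offset <= sizeof fake_memory1 - sizeof(uint32_t));
    memcpy(fake_memory1 + offset, &value, sizeof value);
}
#endif

/* Example caller (hypothetical): copy n 32-bit words from the default memory
   into the second memory, starting at byte offset dst. */
static void copy_to_memory1(uint32_t dst, const uint32_t *src, size_t n) {
    for (size_t i = 0; i < n; i++)
        mm_store_u32(dst + (uint32_t)(i * sizeof(uint32_t)), src[i]);
}
```

The point of keeping the accessors this narrow is that only the tiny import bodies need to know about the second memory; everything else stays ordinary single-memory C.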

My theoretical method to do it without wasm-merge would be to give every function I want to replace with handwritten WAT a body containing some specific line of code that compiles into a "magic" sequence of WASM instructions, then disassemble the final WASM file into WAT, search and replace the magic sequence with the handwritten WAT implementation that accesses the other memories, and finally reassemble the patched WAT back into WASM. However, this is obviously a pretty dumb idea for more reasons than I probably realize, and it will probably break almost immediately.

So if I were to use multiple memories "correctly" in my program, what would be the ideal way to go about it, besides just figuring out how to implement my module without using multiple memories? Thanks.

sbc100 commented 2 days ago

In terms of authoring the multi-memory code itself I think you have a few options:

  1. Write the code in a JS library and call it from native code. I guess this kind of defeats the purpose of multi-memory (MM), since the wasm would still only see one memory.
  2. Write the code in wat and compile it to an object file (using wat2wasm -r) that you can then link into your project. Note that wat2wasm -r doesn't get a lot of testing, so it likely has issues. This might also require some changes to wasm-ld.
  3. Write the code in LLVM assembly (.s). I don't think MM is supported there today, but it shouldn't be too hard to add. The same changes to wasm-ld as above would be needed too.

@dschuff, am I right in thinking the assembly doesn't support MM yet?

dschuff commented 2 days ago

Actually, LLVM bitcode sort of does have a concept of multiple memories, in the form of address spaces. It's not hooked up to the wasm multi-memory feature yet, though. See some discussion here about the best way to do that. AFAIK LLVM assembly doesn't support MM yet, although I think it would be a logical place to start, regardless of how we decide to handle it in LLVM IR.

lax1dude commented 2 days ago

I think I just have tunnel vision at this point, and I can probably get my code working fine with a single memory if I make some compromises. The reason I wanted multiple memories was essentially a micro-optimization: I'm trying to avoid applying Asyncify to the portions of my code that will never need to save/restore. So I was going to compile two separate modules, one for the threads that do need Asyncify and one for the threads that don't (they don't share much code to begin with), and have them communicate by watching each other's memories. It would also mean I don't need to enable the thread-safe memory allocator, which would give better performance when allocating and freeing memory. If I can just accept that Asyncify is an all-or-nothing sort of thing and implement my code as a single module whose threads all share one memory, then I can forget about trying to make multiple memories work for now.
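With the single-memory compromise described above, threads can still "watch each other's memory" through a small lock-free structure. The sketch below is a single-producer/single-consumer ring buffer using C11 atomics; `mailbox_t` and its functions are hypothetical names, and it assumes exactly one producer thread and one consumer thread per mailbox. Under Emscripten, the threads only share memory if the program is built with `-pthread`. The slots are a fixed array inside the struct, so passing a message never goes through the allocator.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define MAILBOX_CAPACITY 64u
static_assert((MAILBOX_CAPACITY & (MAILBOX_CAPACITY - 1u)) == 0,
              "MAILBOX_CAPACITY must be a power of two");

typedef struct {
    _Atomic uint32_t head;             /* next slot to read; written only by the consumer */
    _Atomic uint32_t tail;             /* next slot to write; written only by the producer */
    uint32_t slots[MAILBOX_CAPACITY];  /* fixed storage, no allocation per message */
} mailbox_t;

static void mailbox_init(mailbox_t *mb) {
    atomic_init(&mb->head, 0);
    atomic_init(&mb->tail, 0);
}

/* Producer side. Returns false if the mailbox is full. */
static bool mailbox_push(mailbox_t *mb, uint32_t msg) {
    uint32_t tail = atomic_load_explicit(&mb->tail, memory_order_relaxed);
    /* Acquire: the consumer must be done reading a slot before we reuse it. */
    uint32_t head = atomic_load_explicit(&mb->head, memory_order_acquire);
    if (tail - head == MAILBOX_CAPACITY)  /* unsigned wraparound is fine here */
        return false;
    mb->slots[tail & (MAILBOX_CAPACITY - 1u)] = msg;
    /* Release: publishes the slot write before the new tail becomes visible. */
    atomic_store_explicit(&mb->tail, tail + 1u, memory_order_release);
    return true;
}

/* Consumer side. Returns false if the mailbox is empty. */
static bool mailbox_pop(mailbox_t *mb, uint32_t *out) {
    uint32_t head = atomic_load_explicit(&mb->head, memory_order_relaxed);
    /* Acquire: pairs with the producer's release store to tail. */
    uint32_t tail = atomic_load_explicit(&mb->tail, memory_order_acquire);
    if (head == tail)
        return false;
    *out = mb->slots[head & (MAILBOX_CAPACITY - 1u)];
    /* Release: tells the producer this slot may now be overwritten. */
    atomic_store_explicit(&mb->head, head + 1u, memory_order_release);
    return true;
}
```

Because each index has exactly one writer, no compare-and-swap or lock is needed. Supporting several producers or consumers on the same mailbox would require a different design.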

dschuff commented 1 day ago

Don't forget that another option for using Asyncify is to specify a smaller list of functions that need to be asyncified, using ASYNCIFY_ONLY, ASYNCIFY_IGNORE_INDIRECT, etc. That comes with its own challenges, but depending on how much you need, it could work. Also, at this point it's probably worth giving JSPI (JavaScript Promise Integration) a try, as it's getting close to full standardization.

lax1dude commented 1 day ago

> Don't forget that another option for using Asyncify is to specify a smaller list of functions that need to be asyncified, using ASYNCIFY_ONLY, ASYNCIFY_IGNORE_INDIRECT, etc.

Yeah, this would be difficult because I'm making use of function pointers in a way that I seriously doubt static analysis can reliably follow to automatically resolve a dependency tree of which functions may need the ability to save and resume, so I would need to manually make a list of every function that is either async or could indirectly call another async function.
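The list described here is the set of functions that can reach an async function through the call graph. A toy version of that closure is sketched below as an illustration, not as Asyncify's actual algorithm; `call_graph_t`, `add_indirect_call`, and `compute_asyncify_set` are hypothetical names. The key assumption is that an indirect call is modelled conservatively as an edge to every function whose address is taken, which is why heavy use of function pointers makes such a list grow quickly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { MAX_FUNCS = 8 };

typedef struct {
    size_t n;                             /* number of functions in use */
    bool calls[MAX_FUNCS][MAX_FUNCS];     /* calls[i][j]: function i may call j */
    bool is_async[MAX_FUNCS];             /* functions that suspend themselves */
} call_graph_t;

/* Conservatively model an indirect call inside `caller`: it may reach any
   function whose address is taken. */
static void add_indirect_call(call_graph_t *g, size_t caller,
                              const bool address_taken[MAX_FUNCS]) {
    for (size_t j = 0; j < g->n; j++)
        if (address_taken[j])
            g->calls[caller][j] = true;
}

/* Sets needs_list[i] for every function that is async or may transitively
   call an async function, using fixed-point iteration. Returns the count. */
static size_t compute_asyncify_set(const call_graph_t *g,
                                   bool needs_list[MAX_FUNCS]) {
    assert(g->n <= MAX_FUNCS);
    for (size_t i = 0; i < g->n; i++)
        needs_list[i] = g->is_async[i];

    bool changed = true;
    while (changed) {
        changed = false;
        for (size_t i = 0; i < g->n; i++) {
            if (needs_list[i])
                continue;
            for (size_t j = 0; j < g->n; j++) {
                if (g->calls[i][j] && needs_list[j]) {
                    needs_list[i] = true;  /* i can reach something async */
                    changed = true;
                    break;
                }
            }
        }
    }

    size_t count = 0;
    for (size_t i = 0; i < g->n; i++)
        count += needs_list[i];
    return count;
}
```

In a small hypothetical graph where `main` calls a `dispatch` function that invokes handlers through a pointer table, and one handler is async, the closure marks the async handler, `dispatch`, and `main`. Dropping the indirect edges, roughly the assumption ASYNCIFY_IGNORE_INDIRECT makes, would mark only the handler itself, leaving the rest of the chain to be listed by hand.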

> Also, at this point it's probably worth giving JSPI a try, as that's getting close to full standardization.

Had no idea it existed, thanks for letting me know.

dschuff commented 1 day ago

> Yeah, this would be difficult because I'm making use of function pointers in a way that I seriously doubt static analysis can reliably follow to automatically resolve a dependency tree of which functions may need the ability to save and resume, so I would need to manually make a list of every function that is either async or could indirectly call another async function.

Yep, this is exactly the problem that JSPI is designed to solve :)