Wasm needs a better memory management story

juj commented 3 years ago

Hi all,

after a video call with google last week, I was encouraged to raise a conversation here around issues we at Unity have with Wasm memory allocation.

The short summary is that currently Wasm has grave limitations that make many applications infeasible to be reliably deployed on mobile browsers. Here I stress the word reliably, since things may work on some devices for some % of users you deploy to, depending on how much memory your wasm page needs, but as your application's memory needs grow, the % of users you are able to deploy to can dramatically fall.

These issues already occur when the Wasm page uses only a fraction of total RAM of the device. (e.g. at 300MB-500MB)

These issues have been raised as browser issues, but the underlying theme is recognizing that the wasm spec is not robust enough for mobile deployment to customers.

These troubles stem from the following limitations:

No way to control in a guaranteed fashion when new memory commit vs address space reserve occurs.
No way to uncommit used memory pages.
No way to shrink the allocated Wasm Memory.
No virtual memory support (leading applications to either expect to always be able to grow, or have to implement memory defrag solutions)
If Memory is Shared, then application needs to know the Maximum memory size ahead of time, or gratuitously reserve all that it can.

So basically Wasm memory story is "you can only grab more memory, with no guarantee if the memory you got is a reserve or a commit".

These are not particularly newly recognized issues, the memory model has been the same since MVP, and we have been dealing these ever since early asm.js days, but now that applications are becoming more complex and developers' expectations on what types of applications they want to deploy on which devices is growing, and developers are actually aiming to ship to paying customers, where reliability needs to be near that 100%, we are seeing hard ceilings on this issue in the wild.

Note that listing the limitations above is not implying that fix would be for wasm spec to somehow add support to all of these, but to set the stage that these are the limitations that exist, since their contributed combination is what causes headache to developers.

The way that Wasm VM implementations seem to tackle these issues is to try to be smart/automatic under the hood about reserve vs commit behavior, and esp. around shared vs non-shared memory. However it is still the application developer's responsibility to concretely navigate the app in the low-memory landscape, and this leads to developers needing to "decipher" the VM's behavior patterns around commit vs reserve outside the spec. For an example of the vendor-specific suggestions that this leads to, see https://bugs.chromium.org/p/chromium/issues/detail?id=1175564#c7 .

On desktop, the Wasm spec memory issues have so far fallen in the "awkward" category at most, because i) all OSes and browsers have completed migration to 64-bit already, ii) desktops can afford large 16GB+ RAM sizes (and RAM sizes are expandable on many desktops), and iii) desktops have large disk sizes for the OS to swap pages out to, so even large numbers of committed pages may not be the end of the world (just "awkward") esp. if they go unused for most parts.

On mobile, none of that is true.

Note that wasm memory64 proposal does not relate or solve to this problem. That proposal is about letting applications to use more than 4GB of memory, but this issue is about Wasm applications not being able to safely manage much smaller amounts of memory on mobile devices. (the opposite is probably true, attempting to deploy wasm64 on mobile devices would cause even more issues)

Currently allocating more than ~300MB of memory is not reliable on Chrome on Android without resorting to Chrome-specific workarounds, nor in Safari on iOS. As per the suggestions in the Chromium thread, applications should either know up front at compile time how much memory they will need, or gratuitously reserve everything that they can. Neither of these suggestions is viable.

Why Wasm requires developers to know the needed memory size at compile time

The Wasm spec says that one can conveniently set initial memory size to what they need to launch, and then grow more when the situation demands it. Setting maximum is optional, to allow for unbounded growth. On paper this suggests that developers might not need to know how much they need at compile time.

Reality is quite different, for the following reasons:

in the wild we have reports that memory allocation success rate can be better when initially allocate K MB, versus if you first allocate less, and later try to grow to K MB. The conversation in https://bugs.chromium.org/p/chromium/issues/detail?id=1175564#c7 also suggests that.
if shared memory is used, one does need to know an upper bound for the maximum memory usage.
since an application will need to account for the largest memory usage it may need (or it will fail at some point of its lifetime), practically initial == maximum memory.
one cannot set a gratuitous upper bound, since that can fail the allocation,
one cannot probe the largest upper bound that works in practice, since that can suffocate the browser or other JS allocations to fail.

In practice, especially on memory constrained devices, the current spec necessitates developers to somehow "just know" how much memory will be needed.

Why expecting developers to set memory size at compile time is not feasible

With respect to memory usage patterns, there are generally three types of apps/app workloads:

1) app workloads that use an unknown amounts of memory (AutoCAD/OpenOffice/etc document editors with "bring your own workload") 2) app workloads that use varying amounts of memory ("game menu needs 100MB, game level 1 800MB, game level 2 400MB, etc.") 3) app workloads that need a known constant amount of memory,

App developers cannot know the wasm memory size of apps of first type. To enable everyone's work size, they must generally reserve everything they can, and this has problems:

if one sets a huge 4GB max memory size, the VM may not allow that allocation, failing the app from starting even for users that would have only needed 1GB,
if one probes the largest max memory size that VM will accept, such probing can cause the browser to kill the page immediately, because it thinks it is using too much memory, or if it succeeds, it can cause the app to fail later on some other JS memory allocation since the wasm allocation took up all the available memory for the web page. (or it can cause the browser itself to fail, as we saw with Chrome GPU thread on Epic UE4 Zen Garden)
even if an application does find the suitable size to avoid the above issues, after the user unloads their document, the web page is unable to release that memory back to the system. On desktop the thinking may be that it does not matter, but on mobile it is critical to be able to release unused memory back, or the OS will be more eager to kill you when task switching.

App developers of type 2) share much of the above problems that apps of type 1) have, but one might argue they should be expected to be able to find the max needed size throughout their app lifetime and allocate that, but finding that limit can be hard work, and you may not be able to do it with 100% certainty.

Or developers of apps of type 3) might certainly be expected to choose the right needed amount and be happy with it. Initially it sounds like developers who have an app of type 3 can profile their apps to come up with a suitable initial memory size and never grow. However this has issues:

sometimes you don't know if your app certainly is of type 3). Hence you might allocate an initial K MB, but choose a maximum of K+delta MB to account for unexpected growth. This can cause failures to your app when you do need to grow, since the mobile device might fail the growth. (but it might have succeeded had you chosen initial:K+delta in the first place). Same goes for apps of type 2)
because profiling memory usage can be hard, or it may be something developers don't know how to do, application developers may choose to just allocate everything they can to "remove a problem" without being aware of the consequences. We routinely see this in practice, where e.g. on itch.io you can see simple 2D games that run with a 1.5 Gig Wasm heap of which most is unused. There is uncertainty if that is wasted committed memory, or just reservation, because the spec gives no guarantees. Then they complain that web browsers/wasm is crap when their game doesn't work on mobile.

Android app switching is a major Wasm usability pain

The documentation at https://developer.android.com/topic/performance/memory-overview at the very bottom of the page states:

Note: The less memory your app consumes while in the cache, the better
its chances are not to be killed and to be able to quickly resume.

It is a common game development QA test to perform "fast app switching" testing, which can kill game UX and player interest if it does not work. For example if a user is playing a game, then gets a WhatsApp message, they will quickly switch over to WhatsApp, type in a message, and then switch back in to the game, and expect the game to still be running. Or switch over to email, or Instagram, or whatever you have, and come back a few minutes later.

The less memory your application is consuming, the better chances you have that the page will not need to reload. With native applications this prompts the developer to push their memory usage down as much as possible when they are switched out. Mobile devices do not swap memory back to disk (at least like desktops do), but they will kill background apps if they run out of memory.

For wasm apps running in a browser, this means that for an app that has extra gig in their Wasm heap going unused because they cannot release it back to the OS, the browser will become a prime target for being killed out, and when you task switch back to the app page, the page will reload from scratch, killing fast switching.

Safari even kills you on the foreground if you allocate too much - but you have no way of knowing how much that too much is.

Some applications need address space, not memory

Native compiled wasm applications behave very similar to native applications. It is often a need for a native application to reserve a lot of address space in order to get access to a chunk of linearly consecutive memory (when existing memory allocations cannot find a linear block). Wasm applications sometimes need that too. Currently the only way to do that is to .grow() by a large amount. This means that whatever smaller bits of fragmented memory a wasm app has, can go unused, but still be committed in memory. This causes wasm apps to use more committed memory than their native counterparts.

The amount of this overhead depends on the amount of fragmentation that the wasm app causes. Most native applications have not needed to care about this for ages, but for wasm, this can be all of a sudden a huge issue. Note that memory64 proposal again does not resolve this, because it does not bring virtual memory to wasm - just changes the ISA to accept 64-bit addresses (to my best knowledge)

Summarising the problems

Reiterating, the main problems that we currently see:

wasm spec expects developers to need to know the required memory size, which is not feasible for the reasons described above,
wasm apps may need to run with large overallocated memories, leading to browser failures, JS alloc failures, or if lucky, "just" to Android app switching UX problems,
wasm apps consume more memory than native counterparts, because of memory fragmentation, lack of virtual memory, and lack of unmapping memory pages

What can be done about the problem?

In a recent video call with ARM, we discussed the (lack of) adoption of Unity3D on Wasm on ARM mobile devices, and the short summary is that these memory issues are a hard wall for feasibility of Unity3D on Wasm on Android. There have been existing conversations in #1396 and #1300 about how to shrink memory, but no concrete progress.

On the concrete bugs front, if Chrome eventually migrates to 64-bit process on Android, it can help larger than 300MB Wasm applications to work on chrome. (However an issue here may be is that manufacturers are still releasing 32-bit only Android hardware in 2020, because of old inventory stock or what - we have no idea) If Safari fixes their eager page kill behavior, maybe it will help developers gauge the max limits on iPhones. But those will not help the problem that a committed memory page is still a committed memory page, and a mobile device does have to carry it around somewhere.

Besides that, here are some ideas:

Would it be possible to make the commit vs reserve behavior explicit for Wasm? Maybe as a browser coordinated extension if not for the core spec? This would give guarantees to application developers as to what the best practices initial vs maximum vs grow semantics should be. The current situation where one browser vendor recommends to probe the max amount of memory that can be reserved, vs another browser vendor expecting that apps allocate only the minimum needed amount or be killed if they exceed that, strongly suggests that the spec is missing something to connect the expectations together.
Would it be possible to add support for unmapping memory pages from Wasm? Then e.g. Emscripten could implement unmapping of memory pages into its dlmalloc() and emmalloc() implementations, fixing memory commit issues, and the related Safari "high memory consumption" process killing, and Android task switch killing troubles?
Would it be possible to somehow make a softer version of WebAssembly.Memory maximum field? If an app allocates Memory with maximum=4gb, which risks the rest of the browser/JS losing its address space (in 32-bit contexts), then maybe the browser could start reclaiming the highest parts of that reserved address space for its own purposes if the wasm app hasn't .grow()n that memory into its own use yet?

Then if one allocated a Memory with maximum probed to as much as it can go, but then allocated a large regular ArrayBuffer, maybe the browser could just steal some of that maximum back, if the Wasm app hasn't .grow()n into it? Likewise, if there was a .shrink() operation that an app could make use of, then maybe paired with this kind of address space stealing logic, the Wasm app and the rest of the browser could coordinate to "trade" address space, depending on how much of it was actually committed in the wasm heap, vs not actually used.

I hope the impressions here will not be a "this should be left to implementation details", since when I raised these concerns as a browser implementation bug, the message was that maybe the wasm spec should address this. And currently browsers are certainly not providing common enough implementations to enable developers to succeed with Wasm on mobile devices.

Thanks if you read all the way to the end on the long post!

hamza0867 commented 2 years ago

Does the new wasm specification solve this issue ?

conrad-watt commented 2 years ago

@hamza0867 the new edition of the specification only includes features that have already been standardised; unfortunately none of these address this issue.

There have been some early discussions here of potential new features (not yet standardised) which might improve the situation.

danaugrs commented 1 year ago

I think memory.shrink would make sense in a lot of cases, for both in-browser and out-of-browser applications. The instruction would mean "I don't need these last few pages of memory anymore". The runtime should be able to do as it pleases with that information. Maybe it gives it back to the OS. Maybe it doesn't. But if it wants to it can.

What's the problem with this approach?

penzn commented 1 year ago

Problem with just shrinking is that free pages might not be at the end. Though I am not sure that is a good enough reason to not introduce it: for usage patterns where that would work it would provide a relief, while the rest would stay unchanged.

titzer commented 1 year ago

Others have approached an instruction which semantically zeroes memory pages but also hints that they will not be needed soon, so that the underlying implementation can do the equivalent of madvise() calls that requests the OS release the physical pages of memory.

(edit: read up the thread a bit, I think the suggestions cover memory.shrink well).

devshgraphicsprogramming commented 1 year ago

Problem with just shrinking is that free pages might not be at the end. Though I am not sure that is a good enough reason to not introduce it: for usage patterns where that would work it would provide a relief, while the rest would stay unchanged.

It seems that this whole "feature" of a shrink method possibly not helping much, seems to be a product of two things:

WASM having a linear address space
correct me if I'm wrong but seems like WASM runtime/spec does not mention allowing for paging
most memory heavy users of WASM except maybe Mono-WASM don't do garbage collection with compaction

As I dev I can probably fix this for myself by employing some techniques for compacting my data and avoiding fragmentation, on the simplest end of the spectrum I'd prevent long-living objects, on the most complex I'd develop my own garbage collection library.

I guess that "paging" of the memory poses a security concern or it can't be done for all OSes?

juj commented 1 year ago

Hey, we are getting towards a 2 year anniversary of this conversation thread - I am wondering if there might have been updated progress or revised thoughts on the WebAssembly group on this topic?

On Unity's side, we are getting growing amounts of issue reports about running out of memory on mobile devices, and about Unity content behaving poorly with respect to trying to avoid application switching eviction behavior. More Unity Wasm developers are trying their feet with targeting mobile, and game developers overwhelmingly report that the mobile space is where gaming dominates. At the moment we are in a hard position to be able to officially call "Mobile WebGL" being a supported platform at Unity, due to the memory challenges that mobile Wasm content faces.

Most recently as of yesterday, we have started getting reports about Unity Wasm content running out of memory on mobile devices in the NASA JPL Artemis moon rocket tracking application: https://www.nasa.gov/specials/trackartemis/ that has been developed with Unity. (Those reports have been anecdotal in that we haven't been able to verify them in action, but it did did remind me to chime in on this issue)

@dtig opened the discussion thread #1439 active for the proposal https://github.com/dtig/memory-control . There the operation memory.discard was proposed. The description sounds like it would address this concern, although I struggled to find the actual parameters for the proposed call (maybe they haven't been crafted yet). Unfortunately it looks like that proposal has not progressed since 10 months ago, so inferring that it went on a pause. I wonder if there is a timeline or plan to pick it up at some point?

Again I want to echo that I would be eager to help test an implementation against Emscripten dlmalloc/emmalloc and Unity Wasm content to provide real-world feedback on how well the feature would work in practice, if/when there would be a browser+LLVM tooling implementation prototype that would become available.

dtig commented 1 year ago

@juj The proposal did go on a hiatus for some time for bandwidth reasons, and to figure out how to make memory.map/memory.unmap useful for a broader set of use cases, but I'm picking it back up now. They haven't been updated to the proposal repo, but I have a prototype in progress for memory.discard. I'll follow up offline so we can get an end to end experiment going as experimental data would be really useful in this case.

eqrion commented 1 year ago

@juj SpiderMonkey now also has a prototype of a memory.discard feature, and it's in Firefox Nightly behind the javascript.options.wasm_memory_control flag. There are more details in WebAssembly/memory-control#6.

juj commented 1 year ago

Hey, this is absolutely amazing news! Made a note to look into experimenting with this, and see how it plays out.

juj commented 1 year ago

I've now created a branch of Emscripten that adds memory.discard support to the emmalloc memory allocator: https://github.com/emscripten-core/emscripten/compare/main...juj:emscripten:memory_discard

From a super-quick test, it is working out as expected in Firefox Nightly. I'll look to do more comprehensive integrated testing as the next steps.

WebAssembly / design