WebAssembly / design

WebAssembly Design Documents
http://webassembly.org
Apache License 2.0

Synchronous JS API callback for memory growth? #1296

Open kripken opened 5 years ago

kripken commented 5 years ago

Background: https://github.com/WebAssembly/WASI/issues/82

In brief, JavaScript needs to know when wasm memory grows, because it needs to update JS typed array views. In emscripten we have things set up in sbrk etc. to call into JS to do that when memory grows. However, in wasi the current approach is to just call the memory grow wasm instruction inside the wasm module, so JS is not updated.

JS can check if memory grew every time wasm code runs (arrive in JS, or return from wasm, etc.). Emscripten does something similar for pthreads+memory growth, but it's unpleasant, adds overhead, and is error-prone.
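The check-on-every-crossing approach can be sketched as follows. This is an illustrative sketch, not Emscripten's actual code; the names are made up. It relies on the fact that growing a (non-shared) wasm memory detaches the old ArrayBuffer, so an identity check on memory.buffer detects growth:

```javascript
// Sketch: detect growth by checking buffer identity on every wasm -> JS transition.
const memory = new WebAssembly.Memory({ initial: 1 }); // 1 page = 64 KiB
let cachedBuffer = memory.buffer;
let HEAPU8 = new Uint8Array(cachedBuffer);

function refreshViewsIfGrown() {
  // memory.grow() replaces the buffer object, so identity changes on growth.
  if (memory.buffer !== cachedBuffer) {
    cachedBuffer = memory.buffer;
    HEAPU8 = new Uint8Array(cachedBuffer);
  }
  return HEAPU8;
}
```

A toolchain would call refreshViewsIfGrown() at every point where control returns from wasm to JS, which is exactly the overhead being discussed.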

Another option is to add an API in wasi to update the system on memory growth.

Yet another option is to extend the JS wasm API to get a callback when memory growth occurs. That is what this issue is about.

It might look something like this:

memory.addGrowCallback(() => {
  console.log("wasm memory grew!");
  updateAllJSViewsRightNow();
});

The callback would be synchronous, that is, when wasm memory grows, all such callbacks are immediately executed, while the wasm is still on the stack etc. This is necessary because if it's an event that happens in the next JS event loop then that is too late - we may execute JS before that, and it would use the old obsolete views.

Thoughts?

Macil commented 5 years ago

Here are two existing threads on the subject: https://github.com/WebAssembly/design/issues/1271 and https://github.com/WebAssembly/design/issues/1210. They seem to address an equivalent proposal and a few others.

kripken commented 5 years ago

Thanks @Macil!

The first is the pthreads+growth issue, which is closely related but distinct. (In particular, a synchronous event is not enough there, as you mention, but it would be sufficient here.)

The second does indeed look like it arrives at this proposal, so it's too bad I didn't find it when I searched earlier... but the discussion there seems to have stalled, and the proposal isn't clearly focused on the necessary synchronous aspect. So I'd suggest keeping this issue open for discussion?

dschuff commented 5 years ago

Adding read/write methods on wasm memories (as a substitute for ArrayBuffer views) might serve this use case too, but would still mean that ArrayBuffer views could never be reliably used on wasm memories, which would be an unfortunate wart.

Macil commented 5 years ago

What actually happens right now (or is supposed to happen) if a JavaScript thread takes a reference to a WASM memory's ArrayBuffer and is reading/writing to it while a separate WASM thread grows the memory? I assume one of these two things happens:

  1. The Javascript completes normally. For the current event loop run, the Javascript always sees the WASM memory's buffer property equal to the same value, and the length is unchanged. Only after the current event loop run does the WASM memory's buffer property point to a new value, and the existing ArrayBuffer gets invalidated.
  2. The ArrayBuffer becomes invalidated at an arbitrary moment in the Javascript execution, and the reads/writes go to an inactive copy of the memory and the writes get lost, or an exception happens.

If it's 1, then it seems like a synchronous JS API callback for memory growth would be fine. The callback wouldn't actually happen synchronously with the grow call (which happened in another thread), but it would happen synchronously with the point in time where the existing ArrayBuffer gets invalidated in the current JS context.

lars-t-hansen commented 5 years ago

IIRC, what should happen is sort of like (1). A thread obtains the memory.buffer value. This has a length field that never changes. The thread will be able to write into the buffer up to that length, even if another thread grows the memory concurrently. The extracted buffer value remains valid indefinitely, but if the thread reads memory.buffer again (even during the same turn) it may observe that the length has changed and using this buffer it can write up to the new length.
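The length behavior described above is observable from JS. A sketch, assuming an engine that supports shared wasm memories (e.g. recent Node); the variable names are illustrative:

```javascript
// Extract the buffer, grow, then re-read: the originally extracted
// SharedArrayBuffer keeps its original length, while re-reading
// memory.buffer observes the growth.
const mem = new WebAssembly.Memory({ initial: 1, maximum: 4, shared: true });
const before = mem.buffer; // SharedArrayBuffer of 65536 bytes
mem.grow(1);
const after = mem.buffer;  // re-reading yields the grown length
```

Here `before.byteLength` stays 65536 and remains valid for reads/writes up to that length, while `after.byteLength` is 131072.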

binji commented 4 years ago

Any new developments here?

syg commented 4 years ago

@kripken This all seems most cleanly solved by making sure the capabilities of ArrayBuffer, SharedArrayBuffer, and WebAssembly.Memory stay in sync. To wit, I'd be happy to resurrect the realloc portion of https://github.com/domenic/proposal-arraybuffer-transfer/ (of which I am already the champion apparently) to allow resizing of ABs and SABs along similar restrictions as put forth by wasm, while keeping their JS identity. WDYT?

kripken commented 4 years ago

@syg

Depends what you mean by "this" - do you mean the entire issue? Then IIUC .realloc() would help but not be enough. If wasm calls memory.grow internally, we need the JS buffer and views to be updated. Even if we have a .realloc() method we can use from JS, we still need for JS to be called so that we can do that (or, we'd need the buffer and views to be updated automatically).

syg commented 4 years ago

do you mean the entire issue?

Yeah, I mean the entire issue.

If wasm calls memory.grow internally, we need the JS buffer and views to be updated. Even if we have a .realloc() method we can use from JS, we still need for JS to be called so that we can do that (or, we'd need the buffer and views to be updated automatically).

That's what I'm proposing for realloc, that it be a destructive operation that keeps the identity of the existing ArrayBuffer.

kripken commented 4 years ago

Maybe I'm misunderstanding you, but I think realloc helps us do something cleaner in JS when a growth occurs, but we do still need to get notified, so we even get the chance to do anything in JS at all?

If I'm missing something, what would invoke the JS that calls realloc?

syg commented 4 years ago

Maybe I'm misunderstanding you, but I think realloc helps us do something cleaner in JS when a growth occurs, but we do still need to get notified, so we even get the chance to do anything in JS at all?

If I'm missing something, what would invoke the JS that calls realloc?

wasm's memory.grow would be explained in terms of realloc. If ArrayBuffers can be resized in place, there is no need to do the manual work of updating the views IIUC. It's technically a backwards-incompatible change, in that growing memory would no longer result in a distinct ArrayBuffer on the JS side, but I feel like that won't actually break anything.

kripken commented 4 years ago

Thanks, now I see! Yes, if realloc is done automatically, so ArrayBuffers and views just automatically grow with the wasm memory, then that would solve the issue here.

Could this also happen with SharedArrayBuffers, so JS is updated across threads? That would solve a very big problem too (right now we have to manually poll if the memory grew in JS).

kmiller68 commented 4 years ago

How are we planning to handle growing a SAB across threads? Are we expecting the SAB and WebAssembly.Memory to be atomically updated together? It might be very hard to make sure that every JS thread sees a consistent size for memory between WASM and JS. e.g. a WASM a = memory.grow(0) * 64 * KB (get memory size in bytes) followed immediately by a SAB b = buffer.byteLength could produce a > b, which might cause bizarre bugs...

Is the expectation that memory growth in WASM will suspend all threads and atomically update their buffers? If so, that could be expensive depending on how often people want to grow memory. It's possible this doesn't matter in practice though.

conrad-watt commented 4 years ago

We determined during previous discussions that the propagation of bounds updates between Wasm threads as a result of memory.grow would need to follow a relaxed semantics. Currently, there's no guarantee in the memory model that a grow in one thread will become visible to other threads without explicit synchronization (either through an atomic operation, or an explicit invocation of memory.size, or by performing a growth themselves).

It's still guaranteed that length updates from growth will be an "atomic counter", so other threads will never see a truly inconsistent view of memory, they just won't have freshly-grown memory reliably accessible unless they synchronize with the latest growing thread.

I'd hope JavaScript could use the same semantics, so long as realloc would truly only ever allow the memory to grow, and not shrink.

EDIT: perhaps, analogous to memory.size, reading buffer.byteLength in JavaScript could be counted as an explicit synchronization (with the last growth)? This would potentially require a barrier in some implementations, though.

syg commented 4 years ago

From my brief chats with @binji about the matter, shared memory would have different realloc restrictions than non-shared memory, and I don't really see another way than following what wasm already does.

syg commented 4 years ago

To wit, it seems very, very problematic to shrink SABs in-place. Doesn't seem so bad to shrink ABs in place?

kmiller68 commented 4 years ago

perhaps, analogous to memory.size, reading buffer.byteLength in JavaScript could be counted as an explicit synchronization (with the last growth)? This would potentially require a barrier in some implementations, though.

It seems like that might be desirable. Even if it causes some performance headaches...

syg commented 4 years ago

It seems like that might be desirable. Even if it causes some performance headaches...

Seems pretty reasonable to me to say that multiple threads witnessing the length be synchronizing.

Other restrictions included that for SABs to be resizable at all, the SharedArrayBuffer constructor be extended to include the notion of a maximum size, just like WebAssembly.Memory.

kmiller68 commented 4 years ago

Seems pretty reasonable to me to say that multiple threads witnessing the length be synchronizing.

I don't have much of an opinion on multiple threads yet, but within one thread it should probably be synchronizing. Otherwise, you have the problem where your length magically shrinks if you ask a different API because those numbers are stored at different addresses in the VM.

syg commented 4 years ago

Within one thread your semantics are constrained by agent-order anyways (or program order as it's normally known), which is stronger than synchronizing seqcst events.

kmiller68 commented 4 years ago

That's only true if all the various internal slots and whatever WASM uses are the same. Assuming slots are different then they'd be effectively relaxed, no?

syg commented 4 years ago

I think I'd have to see the particular example you're thinking of. I don't know what synchronizing accesses in a single thread means. In my mind there shouldn't be any staleness issues intra-thread for the same reason regular AB.byteLength can't ever be stale intra-thread even though it's an unordered access.

conrad-watt commented 4 years ago

Currently, if a SAB is backed by a Wasm memory which is grown, the SAB stays attached, but with the same original length (so the newly grown memory isn't accessible, due to the JS bounds check). So I think @kmiller68 is thinking of things through this lens, where the "length" of a SAB is a distinct field on the object which isn't necessarily kept in sync with the length of the underlying memory (which could have been concurrently grown in Wasm).

If I'm reading https://github.com/WebAssembly/design/issues/1296#issuecomment-644778006 correctly, @kripken is rooting for a change to this, so that growth of a SAB in one thread (either through a Wasm memory.grow or a JS realloc) would mean that the length of all SABs backed by the same buffer would update.

EDIT: my comment https://github.com/WebAssembly/design/issues/1296#issuecomment-644962043 was written presupposing that this change would be accomplished by making SABs act exactly like first-class Wasm memories, so bounds/length checks are no longer against a distinct thread/object-local "length" field, but are implemented the same way as Wasm.

kmiller68 commented 4 years ago

@conrad-watt Yeah, that's likely the most sensible way to specify it. i.e. something like, if a JS SAB or AB object is backed by wasm, then reach into however wasm memory holds the length and return that value (times WASM page size in bytes, I guess). That way, you don't have two independent places holding the length.

Alternatively, we could go the other direction, where WASM gets its length from JS, but that's likely more difficult as the core wasm spec has no real concept of a host hook right now, IIRC.

syg commented 4 years ago

Ah, I see. You mean "synchronizing" in the sense of the AB/SAB length vs the backing store length going out of sync.

I agree with @conrad-watt's assumption: that we're working off of the backing store. That's the only reasonable thing to do IMO, otherwise we'd need to constantly be synchronizing per-SAB wrapper's length with the single backing store, which is pretty weird.

syg commented 4 years ago

Rediscovering problems that came up the first time something like this was talked about at Mozilla with asm.js: growable ArrayBuffers unfortunately isn't enough. The various TypedArray views' lengths would also need to be auto-updated. That is both really unfortunate for implementations and for language complexity. For implementations, buffers having to track all their views is bad. For the language, not all TypedArray views should have their lengths auto-updated. For instance, if a view was created with an explicit offset and length, that shouldn't be.

Hmm...

fitzgen commented 4 years ago

FWIW, this idea has been discussed before in this issue as well: https://github.com/WebAssembly/design/issues/1210

With a synchronous event, any code that potentially allocates (and therefore potentially grows memory) can potentially run JS, which can then do basically anything, including re-enter Wasm again. Most code that calls malloc is not prepared for this. I think this is adding a new foot gun, and find it somewhat concerning.

With wasm-bindgen's generated JS glue, we check whether a view has been invalidated by memory growth, and recreate the view if necessary. Basically all view-using code goes through a single code path that handles this. I suspect these checks are overall less overhead than (or at least on par with) the VM maintaining a precise set of all active views that would need to be updated when memory grows.

Is this approach untenable for others?
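wasm-bindgen's actual glue differs in detail, but the invalidation check it describes can be sketched like this (a view over a buffer detached by growth reports byteLength 0, so the check is cheap):

```javascript
let cachedUint8 = null;

// Single code path through which all view-using glue goes.
function getUint8Memory(memory) {
  // Growing a (non-shared) wasm memory detaches the old ArrayBuffer,
  // which makes any view over it report byteLength 0.
  if (cachedUint8 === null || cachedUint8.byteLength === 0) {
    cachedUint8 = new Uint8Array(memory.buffer);
  }
  return cachedUint8;
}
```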

syg commented 4 years ago

With a synchronous event, any code that potentially allocates (and therefore potentially grows memory) can potentially run JS, which can then do basically anything, including re-enter Wasm again. Most code that calls malloc is not prepared for this. I think this is adding a new foot gun, and find it somewhat concerning.

Yes, this is my exact concern for the synchronous callback.

conrad-watt commented 4 years ago

Ah, I had naively thought that typed arrays could also be changed to define their length in terms of the underlying buffer, but yeah, at the very least that doesn't cover the case where the length is explicitly fixed.

Looking back through the previously linked discussions, there was this suggestion here to just expose Wasm-style accesses in JS via the Wasm memory object: https://gist.github.com/kripken/949eab99b7bc34f67c12140814d2b595

Would this solve the problem?

syg commented 4 years ago

Would this solve the problem?

I think it would for the wasm use case but sets an undesirable precedent. It would encourage further divergence between wasm and JS on accessing buffers. I'm still rooting for a more integrated wasm-JS interface story.

kripken commented 4 years ago

With wasm-bindgen's generated JS glue, we check whether a view has been invalidated by memory growth, and recreate the view if necessary. Basically all view-using code goes through a single code path that handles this. I suspect these checks are overall less overhead than (or at least on par with) the VM maintaining a precise set of all active views that would need to be updated when memory grows. Is this approach untenable for others?

I think it's a possibility, yes. However, I do think it'll be noticeably slower in some cases: you basically need to poll in JS whether memory changed each time you cross the wasm/js boundary. If you call a small getter repeatedly that can be painful. It's faster to avoid polling and get notified when a growth occurs, which is a rare event.

With a synchronous event, any code that potentially allocates (and therefore potentially grows memory) can potentially run JS, which can then do basically anything, including re-enter Wasm again.

I think I don't understand the concern here. Yes, in theory such code could do horrible things. But the only thing we need that code to do is exactly recreate the views, and nothing more. That's a very simple operation - just a few lines of JS, basically a few viewX = new XArray(memory.buffer); which is safe and has no re-entry.

If a toolchain misuses this callback to do weird things, that would be a bug there - but any mechanism can be misused?

syg commented 4 years ago

If a toolchain misuses this callback to do weird things, that would be a bug there - but any mechanism can be misused?

Part of my concern is implicit re-entrancy points are in general a pandora's box, even if their intended use case is well understood and simple. This smells like a signal handler.

fitzgen commented 4 years ago

This smells like a signal handler.

Exactly.

kripken commented 4 years ago

Oh, yeah, I agree there. This is really a signal handler, and that feels a little odd to me too. I was worried about this before opening this issue, in fact, but some people thought it would be acceptable, see their reasons there.

kripken commented 4 years ago

@fitzgen

I wrote some benchmarks to check the size of the polling overhead. It can be as high as 50% in some cases that I see, but it definitely varies a lot and depends on the VM! One example is

  for (var i = 0; i < 500 * 1000 * 1000;) {
    // poll to see if the heap grew, and if so, realloc the view
    if (module.HEAP8.length > 20000000) {
      module.HEAP8 = new Int8Array(module.buffer);
    }
    // wasm function that just returns the input
    result = result | module.copy(i++);
    // ..4 more unrolled copies..
    // ..it may also be good to read from HEAP8 here for realism..
    if (i == 1000) {
      // will force one realloc in the next polling check
      module.HEAP8 = new Int8Array(20000000 + 1);
    }
  }

To see the overhead most clearly, unrolling helps to avoid the loop overhead, and at least on some VMs the last 4 lines make a big difference (they force one of the polling checks to actually be true for once, but then it returns to being false again).

50% is probably an unlikely amount of overhead, but it's easy to see 10% or so. OTOH, maybe it's not common for people to call small getters across the js/wasm boundary (but I've seen plenty of examples of people doing it). In any case, manually polling in JS does add some overhead that a non-polling approach can avoid.

I think it would be good to avoid that overhead, but I definitely agree with you and others both here and in the other issue that the possible solutions have downsides too. I can try to summarize them:

  1. A toolchain's sbrk() (or whatever grows memory) can call JS to notify it to update the views. This is what Emscripten does when emitting both wasm and JS (as opposed to emitting a WASI-compatible standalone wasm with no JS).
  2. For WASI in particular, which assumes no JS, we might add a WASI API to grow memory or to get notified when memory grows, https://github.com/WebAssembly/WASI/issues/82 but there seems no interest there, and in fact they suggested the next option, 3.
  3. As proposed in this issue, the JS API could give us a callback. This kind of makes sense since the problem is on the JS side really, but yes, a sync callback feels weird, I agree with @fitzgen and @syg
  4. We can poll manually in JS whenever we cross the JS/wasm boundary, which adds some overhead in some cases, but maybe in most cases is fine.
  5. Separately from this, there is the much worse polling overhead from growth + threads, where we need to poll constantly even in loops without any calls to wasm (as another thread can grow memory at any time :scream: ). I believe none of the solutions mentioned so far can help with that, except for the option of the VM automatically handling growing of views for us, transparently. That would solve things, but it sounds like the VM side hits issues as @syg mentioned...

I don't have a good idea for a way forward here myself, given the (very reasonable) objections. We may end up with the status quo, that is, toolchains can either call out to JS when they grow memory (option 1) or poll at the JS/wasm boundary (option 4).

(Separately, in both options, growth + pthreads requires unavoidable polling.)
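Option 1 can be sketched as a JS import that the module calls right after a successful grow. The names here are illustrative, not Emscripten's exact hook, and the module-internal grow path is simulated in JS for the sketch:

```javascript
const memory = new WebAssembly.Memory({ initial: 1, maximum: 16 });
let HEAPU8 = new Uint8Array(memory.buffer);

// The import the toolchain's sbrk would call after memory.grow succeeds.
function notifyMemoryGrowth() {
  HEAPU8 = new Uint8Array(memory.buffer); // recreate views over the new buffer
}

// Stand-in for the module-internal grow path (in real code this is wasm).
function growFromWasm(pages) {
  memory.grow(pages);
  notifyMemoryGrowth();
  return HEAPU8;
}
```

Since the notification runs only when growth actually occurs, there is no per-call polling cost at the boundary.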

guest271314 commented 4 years ago

Curious if it is possible to

  1. Create a WebAssembly.Memory({initial: 1, maximum: 8000, shared: true})
  2. Write directly (after exporting if necessary) to that single memory allocation in a shell script, e.g. bash stdout
  3. Read that written memory in JavaScript

?

guest271314 commented 4 years ago

In brief, JavaScript needs to know when wasm memory grows, because it needs to update JS typed array views.

It might look something like this:

memory.addGrowCallback(() => {
  console.log("wasm memory grew!");
  updateAllJSViewsRightNow();
});

After experimenting with dynamic memory growth within AudioWorkletGlobalScope, events might not be useful at all here. AudioWorkletProcessor.process() can be called between 344 and 384 times per second. Memory can grow between the calls to process(), where the goal is "real-time" audio processing with limitations on other processes within the scope.

If an event occurs between process() calls should the event handler take precedence over the pending process() call?

While the event is synchronous the user could still define the handler as an asynchronous function, or call queueMicrotask() within the handler; will those potential operations impact process() calls?

Within the context of audio processing, by the time the event handler fires, the delay between reading that memory growth information and the next audio processing event can render the notification moot, as the code has already proceeded with the next operation.

console.log() before and after grow() is called in the context is sufficient notification here

          this.initial = (384 * 512) / 65536; // 1 second
          this.maximum = (384 * 512 * 60 * 60) / 65536; // 1 hour
          this.byteSize = this.maximum * 65536;
          this.memory = new WebAssembly.Memory({
            initial: this.initial,
            maximum: this.maximum,
            shared: true,
          });
            // (excerpt: the write() method of a stream sink)
            write: (value, controller) => {
              if (
                this.totalBytes + value.byteLength >
                  this.memory.buffer.byteLength &&
                this.totalBytes + value.buffer.byteLength < this.byteSize
              ) {
                console.log('before grow', this.memory.buffer.byteLength);
                this.memory.grow(1);
                console.log('after grow', this.memory.buffer.byteLength);
              }

for dynamic memory growth performed within the JavaScript context for audio processing, to avoid race conditions.

Keep track of write offset and read offsets

                for (
                  let i = 0;
                  i < value.buffer.byteLength;
                  i++, this.readOffset++
                ) {
                  uint8_sab[this.readOffset] = value[i];
                }
       const uint8 = new Uint8Array(512);
          const uint8_sab = new Uint8Array(
            this.memory.buffer,
            this.writeOffset,
            512
          ); 

          try {
            for (let i = 0; i < 512; i++, this.writeOffset++) {
              if (this.writeOffset === this.byteSize) {
                break;
              }
              uint8[i] = uint8_sab[this.writeOffset];
            }

For dynamic memory growth within WASM code, a shallow copy of that memory instance exposing only the number of pages currently set, as a read-only property on the object, would let users get the current page count.