AssemblyScript / assemblyscript

A TypeScript-like language for WebAssembly.
https://www.assemblyscript.org
Apache License 2.0
16.6k stars 650 forks source link

Stale Memory Reads #2814

Closed martindevans closed 3 months ago

martindevans commented 5 months ago

Question

I'm encountering an issue which might be a bug in assembly script. However I haven't managed to narrow it down to a simple reproduction program yet so I'm really just wanting to check that I'm not doing something obviously incorrect, or missing some critical thing.

The setup is a C# program which runs WASM modules written in AssemblyScript. The code is compiled with AS and then run through Binaryen to "asyncify" it. The wasm module can call "sched_yield" to trigger an async unwind and then the WASM program will be resumed one frame later. From the point of view of the running code it's one single long-lived program with time passing.

At the very start of the program it allocates a buffer like this:

let _data = heap.alloc(DATA_SIZE);
sharedmemory_set_readaddress(_data, DATA_SIZE);

On the host side this pointer is saved, and it will be written to every frame just before execution is resumed (that's how data is sent in to the WASM module). AssemblyScript code reads the values like this:

return load<i32>(_data, 52, 1); // various hardcoded offsets to certain values in the buffer

However, I'm having issues with what seems to be stale or otherwise invalid reads from that buffer in AssemblyScript. Seemingly unrelated changes to the program will cause incorrect values to be read (usually zero, when it should be something else).

Is there something fundamentally wrong with my approach here?

Other Things I've Tried

Other Possible Ideas

CountBleck commented 5 months ago

Can you provide more info about the code itself?

Also, what if you add a store<i32>(3, load<i32>(3)); to some parts of the code?

martindevans commented 5 months ago

Can you provide more info about the code itself?

Sure, I'll try to provide more of an overview (and again, sorry this is so convoluted and not a good reproduction at all).

Here's some snippets of the program, stitched together into something representative of what's going on:

// Import a method from the C# sim
@external("protologic", "sharedmemory_set_readaddress")
declare function _internal_sharedmemory_set_readaddress(addr: StaticArray<u8>, len: i32): void;

// Setup some memory on the heap
let _data: usize = heap.alloc(DATA_SIZE);

// Tell the sim about this address
_internal_sharedmemory_set_readaddress(_data, DATA_SIZE);

// There are lots of methods like this, that load from known offsets
get_some_important_number(): i32
{
    // Note alignment of one, some values in the buffer are unaligned so we tried loading every value
    // with this alignment, but it made no difference. The number we're looking at here _is_ correctly
    // aligned anyway.
    return load<i32>(_data, 52, 1);
}

// This method is called once at the start of the sim
export function tick(): void
{
    while (true)
    {
        // Depending on exactly how other bits of seemingly unrelated code are arranged, this may or
        // may not get the right value. e.g. simply printing out some other values, dependant on the value
        // read here might change the result!
        let number = get_some_important_number();

        // This causes the C# engine to do an asyncify unwind, and then rewind back in here next frame. In
        // this way the program is "always running" for the entire duration of the sim. While the WASM is
        // suspended fresh data will be written to the buffer that was registered with 
        // `_internal_sharedmemory_set_readaddress`.
        yield();
    }
}

// The C# sim implements sched_yield to trigger an asyncify unwind
@external("wasi_snapshot_preview1", "sched_yield")
declare function _internal_sched_yield(): i32;

function yield()
{
    _internal_sched_yield();

    // We've tried a few things here. e.g. calling `_internal_sharedmemory_set_readaddress` again every
    // frame just before or just after yielding. This changed the results, but it was never correct - e.g. in
    // some cases the reads after the yield would be wrong, in others reads before the yield would be
    // wrong! No combination (even doing it before _and_ after) got correct results.
}

store<i32>(3, load<i32>(3));

I can try that, do you want me to add any logging of results or anything?

CountBleck commented 5 months ago

Just see if that fixes the issue.

HerrCai0907 commented 5 months ago

Which assemblyscript version did you use?

Which target did you use to execute? According to your description I guess it is compiled wasm binary directly.

HerrCai0907 commented 5 months ago
@external("protologic", "sharedmemory_set_readaddress")
declare function _internal_sharedmemory_set_readaddress(addr: StaticArray<u8>, len: i32): void;

StaticArray<u8> in API definition is very suspicious.

I propose if you want to communicate between AS and lower runtime, the best practice is use C like memory manage solution. You can call heap.alloc / heap.free / memory.data. And pass this address as usize/u32 to lower runtime.

wasi is a good example to handle it. https://github.com/AssemblyScript/wasi-shim/blob/main/assembly/wasi_internal.ts#L11

CountBleck commented 5 months ago

@HerrCai0907 He has tried both heap.alloc and memory.data. Technically StaticArray<u8> works, since it directly points to the data (like ArrayBuffer and String).

martindevans commented 5 months ago

StaticArray in API definition is very suspicious.

Sorry that was a mistake! We tried StaticArray as another option instead of heap.alloc. It should be a usize parameter.

Which assemblyscript version did you use?

Version 0.27.22

According to your description I guess it is compiled wasm binary directly.

That's correct, it's compiled to a wasm binary and then passed through binaryen to apply asyncify. Obviously we can't turn off binaryen (since asyncify is critical to the whole thing) but we have tried running both AS and Binaryen with no optimisations (just i case it was some kind of misoptimisation, which it does feel like).

martindevans commented 5 months ago

store<i32>(3, load<i32>(3));

Adding this into the yield function (both before and after the actual sched_yield call) made no difference.

HerrCai0907 commented 5 months ago

@martindevans Who provide _internal_sched_yield? Is it binaryen async pass will transform it to normal wasm code or C# runtime provide this API function.

martindevans commented 5 months ago

The binaryen pass modifies the WASM code to support the async machinery.

Then _internal_sched_yield is written by me to invoke that machinery. Essentially it calls asyncify_start_unwind and then catches it with asyncify_stop_unwind when it returns out of the tick call. Next frame the sim will call asyncify_start_rewind and then call tick again, which will restore the stack and finally the yield call "returns" to normal execution as if nothing had happened.

CountBleck commented 5 months ago

store<i32>(3, load<i32>(3));

Adding this into the yield function (both before and after the actual sched_yield call) made no difference.

Did you try adding it inside get_some_important_number before the load?

Avril112113 commented 5 months ago

Yes, that was tested, both directly before and after the load() call.

github-actions[bot] commented 4 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in one week if no further activity occurs. Thank you for your contributions!

HerrCai0907 commented 4 months ago

This problem may be due to incompatibility between AS gc and binaryen async, but it is just a guess and I has not tested it.

github-actions[bot] commented 3 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in one week if no further activity occurs. Thank you for your contributions!

CountBleck commented 3 months ago

@martindevans did you fix the issue?

martindevans commented 3 months ago

Unfortunately not. I never managed to pin it down to anything more specific than the very vague issue I described in the initial post.