WebAssembly / wasi-sdk

WASI-enabled WebAssembly C/C++ toolchain
Apache License 2.0
1.28k stars 192 forks source link

Code unexpectedly goes out of bounds with any level of optimisation #419

Open infinitesnow opened 6 months ago

infinitesnow commented 6 months ago

I compile the following code:

#include <vector>
#include <stdint.h>

__attribute__((import_module("env"), import_name("jstest")))  void jstest(int32_t);

extern "C" {
    __attribute__((export_name("edit"))) void edit(char* arr, const uint8_t SIZE) {
        for (uint32_t i = 0; i < SIZE; i++) {
            arr[i] = 5;
        }

        std::vector<unsigned> myvec = {0xDEADBEEF,SIZE,222};
        myvec.resize(SIZE); // Any value >3, its current size -> Array out of bound access in any browser I tried

        jstest(myvec[0]);
        jstest(myvec[1]);
    }

}

From the browser:


let jstest = function(a:Number) { 
  console.log(`Got: ${a}`)
}

async function getWasm() {
  return WebAssembly.instantiateStreaming(fetch("wasm/main.wasm"), { env: { jstest: jstest}  } ).then(
    ({module, instance}) =>  instance.exports
  );
}

window.addEventListener("load", () => {
    getWasm().then( exports => {
        const MAX_SIZE = 10
        let data =  new Uint8Array((exports.memory as WebAssembly.Memory).buffer,0,MAX_SIZE)

        let edit = exports.edit as CallableFunction;

        console.log("before",data)

        for ( let i=0; i<MAX_SIZE; i++ ) {
            data[i] = i
        }
        edit(data,5);

        console.log("after",data)
      })

Stack reported by Chrome:

main.wasm:0x84d Uncaught (in promise) RuntimeError: memory access out of bounds
    at main.wasm.dlmalloc (http://localhost:5500/www/wasm/main.wasm:wasm-function[23]:0x84d)
    at main.wasm.malloc (http://localhost:5500/www/wasm/main.wasm:wasm-function[22]:0x5d6)
    at main.wasm.operator new(unsigned long) (http://localhost:5500/www/wasm/main.wasm:wasm-function[20]:0x58a)
    at main.wasm.std::__2::allocator<unsigned int>::allocate[abi:nn180100](unsigned long) (http://localhost:5500/www/wasm/main.wasm:wasm-function[8]:0x340)
    at main.wasm.std::__2::vector<unsigned int, std::__2::allocator<unsigned int>>::__vallocate[abi:nn180100](unsigned long) (http://localhost:5500/www/wasm/main.wasm:wasm-function[5]:0x244)
    at main.wasm.std::__2::vector<unsigned int, std::__2::allocator<unsigned int>>::vector[abi:nn180100](std::initializer_list<unsigned int>) (http://localhost:5500/www/wasm/main.wasm:wasm-function[2]:0x180)
    at main.wasm.edit (http://localhost:5500/www/wasm/main.wasm:wasm-function[1]:0x103)
    at main.wasm.edit.command_export (http://localhost:5500/www/wasm/main.wasm:wasm-function[32]:0x2b0c)
    at http://localhost:5500/www/js/main.js:110:13

I build with:

/opt/wasi-sdk/bin/clang++ --sysroot /opt/wasi-sdk/share/wasi-sysroot -I/opt/wasi-sdk/share/wasi-sysroot/include/wasm32-wasi/c++/v1 -I/opt/wasi-sdk/share/wasi-sysroot/include/wasm32-wasi -L/opt/wasi-sdk/share/wasi-sysroot/lib/wasm32-wasi -o www/wasm/main.wasm main.cpp -fno-exceptions -Wl,--no-entry -Oz -nostartfiles -Wl,--max-memory=13107200

Changing -O* to -O0 fixes the issue.

I built the SDK myself (on a WSL and a macOS host) with a small LLVM patch to allow baremetal code to run in the browser without runtime / imports:

diff --git a/libcxx/include/__availability b/libcxx/include/__availability
index b8b2da9bb122..a1fbf8cfe3f7 100644
--- a/libcxx/include/__availability
+++ b/libcxx/include/__availability
@@ -128,7 +128,7 @@
 // This controls whether the library claims to provide a default verbose
 // termination function, and consequently whether the headers will try
 // to use it when the mechanism isn't overriden at compile-time.
-#  define _LIBCPP_AVAILABILITY_HAS_VERBOSE_ABORT 1
+#  define _LIBCPP_AVAILABILITY_HAS_VERBOSE_ABORT 0
 #  define _LIBCPP_AVAILABILITY_VERBOSE_ABORT

 // This controls the availability of the C++17 std::pmr library,
diff --git a/libcxxabi/src/abort_message.cpp b/libcxxabi/src/abort_message.cpp
index 859a5031b93f..49a9cbb3e71e 100644
--- a/libcxxabi/src/abort_message.cpp
+++ b/libcxxabi/src/abort_message.cpp
@@ -31,7 +31,7 @@ void abort_message(const char* format, ...)
     // Write message to stderr. We do this before formatting into a
     // variable-size buffer so that we still get some information if
     // formatting into the variable-sized buffer fails.
-#if !defined(NDEBUG) || !defined(LIBCXXABI_BAREMETAL)
+#if 0
     {
         fprintf(stderr, "libc++abi: ");
         va_list list;

Attaching reproducers (good -O0/bad -Oz) reproducers.tar.gz

Might be an LLVM issue, or I might be doing something wrong. Thought I'd start from reporting here.

sbc100 commented 6 months ago

Your JS code looks like it is setting the first 10 bytes of memory, which is almost certainly not what you want to be doing.

You need to find a region of memory that is available to write to. The easiest way to do that is to call the malloc export and write into the resulting buffer.

Furthermost you cannot pass data to to the edit functions since data is a JS object (a typed array) and edit takes a pointer (i.e. a Number).

Do you would want to do something like this:

var ptr = exports.malloc(MAX_SIZE);
for ( let i=0; i<MAX_SIZE; i++ ) {
  data[ptr + i] = ...
}
exports.edit(ptr,5);
exports.free(ptr);
infinitesnow commented 6 months ago

Right, I see. Yeah that was unreasonable.

However I’m still confused, I can’t seem to be able to understand what —import-memory is supposed to do. If that exists, why do I have to malloc from the browser? I have looked into the linker code, it defines some imports in the Imports section but doesn’t seem to do any symbol resolution. Why can’t I link a char pointer to the imported memory, have the loader resolve it during module instantiation and use it in C++?

On Mon, 13 May 2024 at 18:11, Sam Clegg @.***> wrote:

Your JS code looks like it is setting the first 10 bytes of memory, which is almost certainly not what you want to be doing.

You need to find a region of memory that is available to write to. The easiest way to do that is to call the malloc export and write into the resulting buffer.

Furthermost you cannot pass data to to the edit functions since data is a JS object (a typed array) and edit takes a pointer (i.e. a Number).

Do you would want to do something like this:

var ptr = exports.malloc(MAX_SIZE); for ( let i=0; i<MAX_SIZE; i++ ) { data[ptr + i] = ... } edit(ptr,5); exports.free(ptr);

— Reply to this email directly, view it on GitHub https://github.com/WebAssembly/wasi-sdk/issues/419#issuecomment-2108119029, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACDKTYICQVVD4UC35K5UDQLZCDQ3BAVCNFSM6AAAAABHUJ6Z2OVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCMBYGEYTSMBSHE . You are receiving this because you authored the thread.Message ID: @.***>

sbc100 commented 6 months ago

Yes, you should call malloc to get your memory region. It doesn't matter if the memory is imported or exported, both sides (Wasm and JS) still need to agree on the memory layout.

I'm not quite sure what you mean by the second half of your question, but all static data needs to be resolved by the linker, so it can do that memory layout. You can't have static data that the linker does know about (at least not without using dyanmic linking which is probably not what you want).

It is possible to export a static char buffer is that works for you. e.g:

char myBuf[1024];

Then use -Wl,--export=myBuf to export it and access that address from exports.myBuf. That would avoid calling malloc.

infinitesnow commented 6 months ago

You can't have static data that the linker does know about (at least not without using dyanmic linking which is probably not what you want).

If that's the case, then I don't understand how memory can be imported. My understanding is that an imported function is dynamically loaded. I was under the impression that -import-memory would give me a relocatable symbol that I could access in code?

That sounded reasonable, I'd just create a few object of variable size in JS, pass them at instantiation time, then from C++ I could just treat those as normal .data memory. Is that not what imported memory is?

sbc100 commented 6 months ago

You can't have static data that the linker does know about (at least not without using dyanmic linking which is probably not what you want).

If that's the case, then I don't understand how memory can be imported. My understanding is that an imported function is dynamically loaded. I was under the impression that -import-memory would give me a relocatable symbol that I could access in code?

That sounded reasonable, I'd just create a few object of variable size in JS, pass them at instantiation time, then from C++ I could just treat those as normal .data memory. Is that not what imported memory is?

In the C/C++ there is just one memory. You can import it or export, but its layout will be determined statically at link time by wasm-ld. The layout of the memory is something like |zero page|static data|stack|heap...|. With static linking you cannot alter this layout after linking except the grow the heap on the right hand side as the memory itself grows.

infinitesnow commented 6 months ago

In C/C++ there is just one memory.

Yes, this would be my assumption. But if that's the case, I don't understand, what's the difference between exporting and importing memory in wasm? Wouldn't that effectively accomplish the same thing since it's one and the same block, regardless of which direction it is exchanged?

Maybe my use case is too simple, is there a use case where clearly one approach is better?

sbc100 commented 6 months ago

In C/C++ there is just one memory.

Yes, this would be my assumption. But if that's the case, I don't understand, what's the difference between exporting and importing memory in wasm? Wouldn't that effectively be the same thing?

Not a huge amount is different.

In one scenario, the memory (and its size limits) are statically baked into the wasm file. In the other scenario you must call new WebAssembly.Memory from JS. This gives you a little more flexibility at the cost of an extra couple of lines of JS code. I recommend sticking with the exported memory to keep the JS code simpler/smaller.

infinitesnow commented 6 months ago

Cool, thanks for the help.

I reckon importing would also allow you to save / restore the state of a module. I'll try the malloc implementation.