emscripten-core / emscripten

Emscripten: An LLVM-to-WebAssembly Compiler
Other
25.65k stars 3.29k forks source link

Possible bug in WASM: Pointer math issue when using Emscripten as backend for FreeBasic #20325

Open angros47 opened 1 year ago

angros47 commented 1 year ago

I have found some weird issues that lead to code producing the exception "index out of bounds". The minimal code that produces the error is:

int arr[14];
int main(){
    for (int i=0;i<=5;i++){
        *(int*)(((int)(int*)arr + (-i << (2 & 31))) + 52) = i;
    }
}

This code, that is very similar, seems to work with no problem:

int main(){
    int arr[14];
    for (int i=0;i<=5;i++){
        *(int*)(((int)(int*)arr + (-i << (2 & 31))) + 52) = i;
    }
}

What might cause it? In case you wonder why I am using such a weird code, instead of a simple "arr[13-i]=i;", the reason is that the code is produced by the C emitter of the FreeBasic compiler (and it works perfectly with the GCC compiler on other platforms). More information is available on the FreeBasic forum, in this thread

angros47 commented 1 year ago

My emcc version is:

emcc (Emscripten gcc/clang-like replacement + linker emulating GNU ld) 3.1.46 (19607820c447a13fd8d0b7680c56148427d6e1b8) clang version 18.0.0 (https://github.com/llvm/llvm-project 75501f53624de92aafce2f1da698b249a7293dc7) Target: wasm32-unknown-emscripten Thread model: posix InstalledDir: /home/angelo/Documenti/FreeBASIC/emsdk/upstream/bin

kripken commented 1 year ago

Looks like undefined behavior:

$ ./emcc a.cpp -fsanitize=undefined
$ nodejs c.out.js
a.cpp:4:33: runtime error: left shift of negative value -1
angros47 commented 1 year ago

I tried to modify the test code to this:

int arr[14];
int main(){
    for (int i=0;i<=5;i++){
        *(int*)(((int)(int*)arr + (-(i << (2 & 31)))) + 52) = i;
    }
}

Same issue, "index out of bounds". The code:

int main(){
    int arr[14];
    for (int i=0;i<=5;i++){
        *(int*)(((int)(int*)arr + (-(i << (2 & 31)))) + 52) = i;
    }
}

works normally. By using "-(i << (2 & 31))", I made sure the value that is left shifted is positive, and the sign is changed only after, but this does not prevent the issue from happening. So, I tend to exclude this being the cause.

angros47 commented 12 months ago

Even weirder: if compiled with the parameter "-fsanitize=undefined", it returns no error and works as expected. Without that parameter, it crashes

angros47 commented 12 months ago

Further information: if the parameter "-sWASM=0" is added, all seems to work as expected.

Even the first code (with the syntax that is supposed to produce undefined behavior) works perfectly. The problem happens only when WASM code is produced, not when JavaScript code is produced

sbc100 commented 12 months ago

All the code examples above also crash when compiled with gcc or clang:

$ gcc -O2 test.c && ./a.out 
Segmentation fault
$ clang -O2 test.c && ./a.out 
Segmentation fault

What makes you think this is an emscripten specific issue?

angros47 commented 12 months ago

Have you compiled it with no change to native code for your machine? You should either try it on a 32 bit machine, or modify the line:

*(int*)(((int)(int*)arr + (-(i << (2 & 31)))) + 52) = i;

To: *(int*)(((long long int)(int*)arr + (-(i << (2 & 31)))) + 52) = i;

The pointer is temporarily cast to an integer, and Emscripten uses 32 bit pointers, while your machine likely uses 64 bit pointers

sbc100 commented 12 months ago

I see, yes with -m32 I don't see a segfault.

I also can't repro with emscripten:

$ emcc -O2 test.c && node ./a.out.js
angros47 commented 12 months ago

I confirm: with -O2 it doesn't happen. Even weirder

angros47 commented 12 months ago

Also, I read that the left shift of a negative value is defined in C++20 specifications, actually

juj commented 12 months ago
*(int*)(((int)(int*)arr + (-(i << (2 & 31)))) + 52) = i;

does not seem correct? i ranges between 0 and 5 inclusive, so i << 2 ranges between 0 and 20 inclusive, so arr pointer gets subtracted by 20 in the last iteration of the loop. In C/C++, if you have int *arr, adding -20 to the pointer moves the pointer back by 20 elements, and not by 20 bytes.

So (int*)arr + -20 + 52 is same as ((int*)arr)[32], which will go out of bounds of int arr[14];, since it doesn't have enough size.

If you run with -fsanitize=undefined, you can see an error message

RuntimeError: memory access out of bounds

which is like expected, since the memory access does go out of bounds of the array.

with -O2 it doesn't happen. Even weirder

The reason for the issue hiding with optimizations is that it all gets optimized out since arr is never read from.

juj commented 12 months ago

Oh, looks like the + 52 part is outside the parentheses, casted to int, so that needs divided by 4 to convert to elements. 52/4 == 13, so looks like the elements it accesses are

i=0: arr[13]
i=1: arr[9]
i=2: arr[5]
i=3: arr[1]
i=4: arr[-3] -> out of bounds
i=5: arr[-7] -> out of bounds
sbc100 commented 12 months ago

I was able to get memory access out of bounds without the loop and just with i=1:

int arr[14];

int main() {
  int i = 1;
  *(int*)(((int)(int*)arr + (-(i << (2 & 31)))) + 52) = i;
}
angros47 commented 12 months ago

Then, why does it work if I disable the WASM and use the javascript emitter? It should either fail in both cases, or work in both cases?

And why does it work if the array is declared inside "main()"?

Also, there is a double casting: (int)(int*)arr, that means "arr" is first cast to an integer pointer, and immediately after is cast to integer. So, incremending or decrementing a simple integer should just change its value, and when it's cast again to a pointer, the pointer should refer to the new value.

I mean, let's say we declare a pointer p

int arr[100]; //an array. Let's suppose it's allocated at address 1000 int p=arr; //Value of p should be 1000 p++; //increments p by one element, so its value should be 1004 p=(int) ((int)p+1); //should be 1005

angros47 commented 12 months ago

Also: why this does cause an exception:

int arr[14];

int main() {
  int i = 1;
  *(int*)(((int)(int*)arr + (-(i << (2 & 31)))) + 52) = i;
}

And this doesn't?


int main() {
  int i = 1;
  *(int*)(((int)(int*)arr + (-(1 << (2 & 31)))) + 52) = i;
}

Isn't the code perfectly equivalent? I just replaced "i" with "1" inside the formula

sbc100 commented 12 months ago

Clearly the compiler/optimizer in llvm is doing something different in those two cases. My guess is that this is an issue the relates to offset parameter of the wasm i32.load and i32.store instruction which I believe is unsigned, so it could be an llvm bug, or it could be undefined behaviour.

(See https://github.com/sunfishcode/wasm-reference-manual/blob/master/WebAssembly.md#store)

sbc100 commented 12 months ago

It seems that its not the i32.store offset. That looks correct. Here is the failing main function:

0000c8 func[1] <main>:
 0000c9: 23 00                      | global.get 0
 0000cb: 41 10                      | i32.const 16
 0000cd: 6b                         | i32.sub
 0000ce: 22 00                      | local.tee 0
 0000d0: 41 01                      | i32.const 1
 0000d2: 36 02 0c                   | i32.store 2 12
 0000d5: 20 00                      | local.get 0
 0000d7: 41 00                      | i32.const 0
 0000d9: 20 00                      | local.get 0
 0000db: 28 02 0c                   | i32.load 2 12
 0000de: 41 02                      | i32.const 2
 0000e0: 74                         | i32.shl
 0000e1: 6b                         | i32.sub
 0000e2: 36 02 08                   | i32.store 2 8
 0000e5: 20 00                      | local.get 0
 0000e7: 28 02 08                   | i32.load 2 8
 0000ea: 20 00                      | local.get 0
 0000ec: 28 02 0c                   | i32.load 2 12
 0000ef: 36 02 b4 08                | i32.store 2 1076
 0000f3: 41 00                      | i32.const 0
 0000f5: 0b                         | end

The store instruction which is the one that is trapping looks reasonable. We have i32.store 2 1076 where 1076 is &arr + 52 (arr lives at address 1024)

juj commented 12 months ago

Also, there is a double casting: (int)(int*)arr, that means "arr" is first cast to an integer pointer, and immediately after is cast to intege

Ops, sorry, you are correct. I see now, the many parentheses tripped me up and I interpreted the subtraction before the second cast.

Indeed it does look like something is off here. A smaller test case

int arr[2];
int main(){
  for (int i = 0; i < 2; ++i) {
    *(int*)((int)arr + 4 + -i) = 1;
  }
}

also fails, whereas

int arr[2];
int main(){
  for (int i = 0; i < 2; ++i) {
    *(int*)((int)arr + 4 -i) = 1;
  }
}

doesn't. So something looks fishy here.

angros47 commented 12 months ago

I can confirm that. Also, if doesn't file if compiled with -sWASM=0

So, the issue seems to be at WASM level

juj commented 12 months ago

Looks like -fsanitize=undefined also masks the issue.

int arr[2];
int main(){
  for (int i = 0; i < 2; ++i)
    *(int*)((int)arr + 4 + -(i<<2)) = 1;
}

passes with em++ a.cpp -o a.js -fsanitize=undefined, but fails if -fsanitize=undefined is removed.

juj commented 12 months ago

Simpler test case without a for loop:

int arr[2];
int i = 4;
int main() {
  *(int*)((int)arr + 4 + -i) = 1;
}
sbc100 commented 12 months ago

Oh nice, now the failing main function is just a few lines since its doesn't do __stack_pointer manipulation.

angros47 commented 12 months ago

Which browser are you using? I am using Firefox

sbc100 commented 12 months ago

You can repro in node on the command line, no need for a browser:

$ emcc test.c && node ./a.out.js
/usr/local/google/home/sbc/dev/wasm/emscripten/a.out.js:64
   throw ex;
   ^

RuntimeError: memory access out of bounds
    at wasm://wasm/d2db571a:wasm-function[1]:0xd4
    at Module._main (/usr/local/google/home/sbc/dev/wasm/emscripten/a.out.js:475:90)
    at callMain (/usr/local/google/home/sbc/dev/wasm/emscripten/a.out.js:497:13)
    at doRun (/usr/local/google/home/sbc/dev/wasm/emscripten/a.out.js:521:21)
    at run (/usr/local/google/home/sbc/dev/wasm/emscripten/a.out.js:533:3)
    at runCaller (/usr/local/google/home/sbc/dev/wasm/emscripten/a.out.js:488:18)
    at removeRunDependency (/usr/local/google/home/sbc/dev/wasm/emscripten/a.out.js:235:4)
    at receiveInstance (/usr/local/google/home/sbc/dev/wasm/emscripten/a.out.js:334:3)
    at receiveInstantiationResult (/usr/local/google/home/sbc/dev/wasm/emscripten/a.out.js:339:3)
angros47 commented 12 months ago

I see. I just want to rule out any possible bug in the virtual machine itself

juj commented 12 months ago

Failing wasm: (*(int*)((int)arr + 4 + -i) = 1;)

 (func $__original_main (result i32)
  (local $0 i32)
  (local $1 i32)
  (local $2 i32)
  (local $3 i32)
  (local $4 i32)
  (local $5 i32)
  (local.set $0 (i32.const 0) )
  (local.set $1 (i32.load offset=65536 (local.get $0) ) )
  (local.set $2 (i32.const 0) )
  (local.set $3 (i32.sub (local.get $2) (local.get $1) ) )
  (local.set $4 (i32.const 1) )
  (i32.store offset=65544 (local.get $3) (local.get $4) )
  (local.set $5 (i32.const 0) )
  (return (local.get $5) )
 )

Passing wasm: (*(int*)((int)arr + 4 -i) = 1;)

 (func $__original_main (result i32)
  (local $0 i32)
  (local $1 i32)
  (local $2 i32)
  (local $3 i32)
  (local $4 i32)
  (local $5 i32)
  (local $6 i32)
  (local $7 i32)
  (local.set $0 (i32.const 0) )
  (local.set $1 (i32.load offset=65536 (local.get $0) ) )
  (local.set $2 (i32.const 65540) )
  (local.set $3 (i32.const 4) )
  (local.set $4 (i32.add (local.get $2) (local.get $3) ) )
  (local.set $5 (i32.sub (local.get $4) (local.get $1) ) )
  (local.set $6 (i32.const 1) )
  (i32.store (local.get $5) (local.get $6) )
  (local.set $7 (i32.const 0) )
  (return (local.get $7) )
 )
juj commented 12 months ago

How do you get the nice one-liner wasm disassembly @sbc100 ? wasm-dis gives that verbose output that's not that nice to diff.

sbc100 commented 12 months ago

If you compile test.c to test.o you can then link with -O2 to get a much smaller failing main (thanks for wasm-opt):

$ emcc --minify=0 -O2 test.o 

This gives:

 (func $1 (param $0 i32) (param $1 i32) (result i32)
  (i32.store offset=1032
   (i32.sub
    (i32.const 0)
    (i32.load
     (i32.const 1024)
    )
   )
   (i32.const 1)
  )
  (i32.const 0)
 )
sbc100 commented 12 months ago

To get linear dis-assembly I use wasm-objdump -d .

juj commented 12 months ago

Hmm, for me it doesn't fail with -O1 or higher.

juj commented 12 months ago

The first code ends up doing a memory write to memory address -4 with offset +65544. I wonder if that is something that the access bounds validator doesn't like, and it immediately rejects the base address -4 as invalid, without realizing the offset would push it above into a valid memory area?

sbc100 commented 12 months ago

Hmm, for me it doesn't fail with -O1 or higher.

You compile test.c with -O0 and then link it with -O2.. that fails:

$ emcc --minify=0 -O2 test.o && node ./a.out.js
/usr/local/google/home/sbc/dev/wasm/emscripten/a.out.js:64
   throw ex;
   ^
RuntimeError: memory access out of bounds
    at wasm://wasm/d2db571a:wasm-function[1]:0xd4
    at Module._main (/usr/local/google/home/sbc/dev/wasm/emscripten/a.out.js:475:90)
    at callMain (/usr/local/google/home/sbc/dev/wasm/emscripten/a.out.js:497:13)
    at doRun (/usr/local/google/home/sbc/dev/wasm/emscripten/a.out.js:521:21)
    at run (/usr/local/google/home/sbc/dev/wasm/emscripten/a.out.js:533:3)
    at runCaller (/usr/local/google/home/sbc/dev/wasm/emscripten/a.out.js:488:18)
    at removeRunDependency (/usr/local/google/home/sbc/dev/wasm/emscripten/a.out.js:235:4)
    at receiveInstance (/usr/local/google/home/sbc/dev/wasm/emscripten/a.out.js:334:3)
    at receiveInstantiationResult (/usr/local/google/home/sbc/dev/wasm/emscripten/a.out.js:339:3)
juj commented 12 months ago

At least that's my takeaway from the crash..

Screen Shot 2023-09-27 at 2 59 33 AM
sbc100 commented 12 months ago

(note that I'm linking test.o .. compiled with -O0)

sbc100 commented 12 months ago

The first code ends up doing a memory write to memory address -4 with offset +65544. I wonder if that is something that the access bounds validator doesn't like, and it immediately rejects the base address -4 as invalid, without realizing the offset would push it above into a valid memory area?

I think the problem is that memory address -4 part. It looks like the engines always interpret both the address and the offset immediate as unsigned.. I wasn't aware of that but it looks like what is happening.

juj commented 12 months ago

The failure does occur equally in

(tested on macOS Catalina 10.15.7 (19H2))

So the behavior is either specced to work like that, or every VM got it wrong (seems unlikely).

I think the problem is that memory address -4 part.

Yeah, the following code does give out of bounds:

int main() { ((char*)-1)[13] = 1; }

whereas

int main() { ((char*)0)[12] = 1; }

doesn't raise an out of bounds error.

Disassembly for the failing case being

 (func $__original_main (result i32)
  (local $0 i32)
  (local $1 i32)
  (local $2 i32)
  (local.set $0 (i32.const -1) )
  (local.set $1 (i32.const 1) )
  (i32.store8 offset=13 (local.get $0) (local.get $1) ) // RuntimeError: memory access out of bounds
  (local.set $2 (i32.const 0) )
  (return (local.get $2) )
 )

It looks like the engines always interpret both the address and the offset immediate as unsigned..

Wouldn't that need a combination of i32->u32->u64 32-bit->64-bit widening to happen?

i.e. even if address was interpreted as u32, 0xFFFFFFFF, then adding 13 to it would still produce 12 as result? Only if it was interpreted as u32, then internally extended to u64 for the offset calculation (to generate 0x10000000C) would that be a problem?

Though I suppose since it generates a x86 mov instruction, the CPU will always treat both the base number and offset as u64.

juj commented 12 months ago

Even further simplified test case:

char arr[2];
char i = 1;
int main() {
  *(arr + 1 + -i) = 1;
}

which is quite reasonable code. Although for some reason, it generates very silly wasm code:

 (func $__original_main (result i32)
  (local $0 i32)
  (local $1 i32)
  (local $2 i32)
  (local $3 i32)
  (local $4 i32)
  (local $5 i32)
  (local $6 i32)
  (local $7 i32)
  (local $8 i32)
  (local.set $0 (i32.const 0) )
  (local.set $1 (i32.load8_u offset=65536 (local.get $0) ) )
  (local.set $2 (i32.const 24) )
  (local.set $3 (i32.shl (local.get $1) (local.get $2) ) )
  (local.set $4 (i32.shr_s (local.get $3) (local.get $2) ) )
  (local.set $5 (i32.const 0) )
  (local.set $6 (i32.sub (local.get $5) (local.get $4) ) )
  (local.set $7 (i32.const 1) )
  (i32.store8 offset=65541 (local.get $6) (local.get $7) )
  (local.set $8 (i32.const 0) )
  (return (local.get $8) )
 )

the

  (local.set $1 (i32.load8_u offset=65536 (local.get $0) ) )
  (local.set $2 (i32.const 24) )
  (local.set $3 (i32.shl (local.get $1) (local.get $2) ) )
  (local.set $4 (i32.shr_s (local.get $3) (local.get $2) ) )

part could be replaced by a

  (local.set $4 (i32.load8_s offset=65536 (local.get $0) ) )

seems that Wasm backend does not understand how to emit i32.load8_s? (unsigned char i = 1; would also generate noisy wasm, something I think should be fixed in LLVM side?)

juj commented 12 months ago

Looks like

char arr[2];
int i = 1;
int main() {
  *(arr + 1 + -i) = 1;
}

generates the simplest test case:

 (func $__original_main (result i32)
  (local $0 i32)
  (local $1 i32)
  (local $2 i32)
  (local $3 i32)
  (local $4 i32)
  (local $5 i32)
  (local.set $0 (i32.const 0) )                            // $0 = 0
  (local.set $1 (i32.load offset=65536 (local.get $0) ) )  // $1 = MEM8[65536] (== 1)
  (local.set $2 (i32.const 0) )                            // $2 = 0
  (local.set $3 (i32.sub (local.get $2) (local.get $1) ) ) // $3 = $2 - $1 = 0 - 1 = -1
  (local.set $4 (i32.const 1) )                            // $4 = 1
  (i32.store8 offset=65541 (local.get $3) (local.get $4) ) // MEM8[$3 + 65541] = $4; RuntimeError: memory access out of bounds
  (local.set $5 (i32.const 0) )
  (return (local.get $5) )
 )
kripken commented 12 months ago

On that last testcase, I don't know if it has undefined behavior in C, but looks like it crashes in -O0 and succeeds in -O1, so that seems possible. Or, it might be a wasm backend bug, if there isn't undefined behavior. So the main question here is first whether this has UB or not.

Looking at that -O0 output, I simplified it slightly in a safe way to this:

$ wasm-opt a.out.wasm --simplify-locals --coalesce-locals --reorder-locals --vacuum --extract-function=__original_main  --print 
extracting __original_main
(module $a.out.wasm
 (type $0 (func (result i32)))
 (memory $0 256 256)
 (data $.data (i32.const 65536) "\01\00\00\00")
 (export "__original_main" (func $__original_main))
 (func $__original_main (result i32)
  (i32.store8 offset=65541
   (i32.sub
    (i32.const 0)
    (i32.load offset=65536
     (i32.const 0)
    )
   )
   (i32.const 1)
  )
  (return
   (i32.const 0)
  )
 )
)

That code is expected to crash: the load returns 1, and then 0 - 1 is negative 1, and that is a large unsigned value. Adding the offset of 65541 does not prevent the crash, because load/store offsets work in "infinite size", that is, there is no overflow: we have u32(-1) + 65541 = 4295032836, which is larger than 32 bits, but the VM computes it in as many bits are needed (logically). It is larger than the size of memory, so it traps.

That is, i32.load offset=X (i32.const Y) is very different from i32.load (i32.add (i32.const X) (i32.const Y)). The former does not do a 32-bit overflow, while the latter does. (For that reason, it is not possible to simply replace one with the other as an optimization. Binaryen can do this only if it has additional information about overflows not being possible. That is, if no 32-bit overflow can happen then the two are identical.)

juj commented 12 months ago

On that last testcase, I don't know if it has undefined behavior in C

Are you referring to the test code at

char arr[2];
int i = 1;
int main() {
  *(arr + 1 + -i) = 1;
}

I am fairly positive that this should not have any undefined behavior. The only thing that comes to mind in the standard is that weird rule that pointer calculations should not exceed [begin,end] ranges (even if not dereferenced) - but here arr + 1 is inside the array, and the resulting arr + 1 + -i is also inside the array so not even that rule would be violated.

Looks like the following test case also generates an error:

char arr[2];
int i = -1;
int main() {
  *(arr + 1 + i) = 1;
}

(one can also remove the +1 there and still observe a crash, and while the resulting address should line up to a known ok value, that would be UDB C/C++ standards-wise)

For all intents and purposes, that application should be the same as

char arr[2];
int i = -1;
int main() {
  char *ptr = arr + 1 + i;
  *ptr = 1;
}

(which does not crash) I can't think of a reason that there should be UDB in either of these programs.

Let's invoke some more people to poll: @tlively @dschuff ?

This scenario seems like an incorrect reversal of a base_address + constant_offset memory load somewhere in codegen? In an uncommon fashion, here arr + 1 looks like it has the role of a the constant offset (since arr is global data), and i is the variable base address, so codegen incorrectly switches these two around?

because load/store offsets work in "infinite size", that is, there is no overflow

Although not sure how this could be fixed without pessimizing codegen. To me seems like load/store offsets would want to work with u64 sizes to take advantage of overflow(?)

kripken commented 12 months ago

Are you referring to the test code [..]?

Sorry, I should have said, I referred to the last testcase before my last comment.

juj commented 12 months ago

That code is expected to crash: the load returns 1, and then 0 - 1 is negative 1, and that is a large unsigned value.

I agree that wasm disassembly is expected to crash given the wasm semantics. It feels like the root issue would be that wasm code should not have gotten generated in the first place?

kripken commented 12 months ago

I agree, if there is no UB here then that looks like a wasm backend bug. I'm hoping one of the people you cc'd can confirm if there is or isn't UB here.

juj commented 11 months ago

Stumbled today on a video by Jonathan Müller (@foonathan), who is part of the standardization committee, and gave a presentation at C++ On Sea 2023 conference this June:

https://youtu.be/zGWj7Qo_POY?t=34

there he does mention that *(array+17) syntax is by definition equivalent to array[17], when array is a built-in array. I think that is the same situation that occurs in this bug.