Quuxplusone / LLVMBugzillaTest

Polybench mm benchmarks with Polly and Wasm backend lead to runtime error #49382

Open Quuxplusone opened 3 years ago

Quuxplusone commented 3 years ago
Bugzilla Link PR50413
Status NEW
Importance P enhancement
Reported by Manuel Selva (manuel.selva@inria.fr)
Reported on 2021-05-20 00:49:16 -0700
Last modified on 2021-07-27 16:56:05 -0700
Version trunk
Hardware PC Linux
CC dschuff@google.com, llvm-bugs@lists.llvm.org, llvm@meinersbur.de, manuel.selva@inria.fr
Fixed by commit(s)
Attachments
Blocks
Blocked by
See also

I am compiling the PolyBench benchmarks to WebAssembly using clang. Activating Polly in this compilation path leads to a runtime error when executing the generated wasm code for the 2mm, 3mm and gemm benchmarks.

Here is the error; basically, the code tries to access memory outside of what has been allocated:

╰─ wasmtime 2mm-wasm_polly 
Error: failed to run main module `2mm-wasm-polly`

Caused by:
    0: failed to invoke command default
    1: wasm trap: out of bounds memory access
       wasm backtrace:
           0:  0x3ec - <unknown>!polybench_alloc_data
           1:  0x491 - <unknown>!main
           2: 0x456a - <unknown>!__main_void
           3: 0x44e2 - <unknown>!__original_main
           4:  0x21a - <unknown>!_start
       note: run with `WASMTIME_BACKTRACE_DETAILS=1` environment variable to display more information

Note that the error occurs with all the WebAssembly runtimes that we tried.

Maybe the issue is not on the Polly side but in the WebAssembly backend. I didn't find a way to also attach this report to another product so that the backend developers would be notified too. Please let me know how to do that.

Finally, here is some information to reproduce the bug. Please let me know if I can provide anything else that may be useful for you to identify what is going on.

LLVM commit: 1fbb484ea45f85740b7450b175096e5fcff6ecd9

Compilation command (include and link options omitted): clang -O3 -mllvm -polly 2mm.c -DLARGE_DATASET -DPOLYBENCH_TIME --target=wasm32-wasi -o 2mm-wasm-polly

Thank you for the support,

-- Manu

Quuxplusone commented 3 years ago
I tried to compile and run 2mm as WebAssembly, but gave up because it was taking too
much time.

Judging from the backtrace, this doesn't seem to be an issue in Polly. The
crash occurs in polybench_alloc_data, which is implemented in polybench.c. It
doesn't even contain a loop, and hence is not subject to optimization by Polly.

Additional info that could help:
 * Output files of -polly-dump-before/-polly-dump-after
 * Output of -mllvm -debug-only=polly-detect,polly-scops,polly-opt-isl,polly-ast
 * Selectively optimizing specific functions, e.g. -polly-only-func=init_array, --polly-only-func=kernel_2mm or -polly-only-func=polybench_alloc_data

Quuxplusone commented 3 years ago
Hi Michael,

Thank you for your answer. As you suggest, I don't think the bug is on the
Polly side but more on the Wasm back-end one.

I'll provide the additional information you suggested tomorrow.

Also, how should I modify this bug entry so that the Wasm backend people are aware
of it?

--
Manu

Quuxplusone commented 3 years ago

You can change the Product to "libraries" and the Component to "Backend: Webassembly". It doesn't hurt to add some developers and the backend's code owner to the CC list.

Alternatively, create a new bug entry for "Backend: Webassembly". Once the faulty component is identified, one bug entry can be marked as a duplicate of the other.

Try to reduce the reproducer, e.g. put everything into one file, remove functions/statements, and generate LLVM-IR before or after optimizations.

Quuxplusone commented 3 years ago
(In reply to Michael Kruse from comment #1)
> I tried to compile&run 2mm as WebAssembly, but stopped after this taking too
> much time.
>
> Judging from the backtrace, this doesn't seem to be an issue in Polly. The
> crash occurs in polybench_alloc_data, which is implemented in polybench.c.
> It doesn't even contain a loop, hence not a subject of optimizations by
> Polly.
>
> Additional info that could help:
>  * Output files of -polly-dump-before/-polly-dump-after
>  * Output of -mllvm
> -debug-only=polly-detect,polly-scops,polly-opt-isl,polly-ast
>  * Selectively optimizing specific functions, e.g.
> -polly-only-func=init_array, --polly-only-func=kernel_2mm or
> -polly-only-func=polybench_alloc_data

Hi Michael,

I am investigating this issue further, and I am not able to run some Polly
passes because of errors such as the following:

error in backend: Option -polly-dump-before not supported with NPM

I am running Polly directly from the clang driver, using -mllvm to specify
Polly passes. I tried switching to the old pass manager, but this changes the
behavior of the programs produced with Polly. Can you confirm that I should stay
with the new pass manager? If so, how can I make passes such as
-polly-dump-before work with the new pass manager?

Thank you again for your help.

Quuxplusone commented 3 years ago

Support for -polly-dump-before with the NPM has recently been added: https://github.com/llvm/llvm-project/commit/29bef8e4e3593ab37c4d3b6289dcdec961c3fb52 Unfortunately, because of how extension points work in the NPM, it is only possible with -polly-position=early. An alternative is the NPM's -print-before option (https://reviews.llvm.org/D87216). However, it only prints to dbgs() and only the function (not the entire module), both of which make getting a reproducer difficult.

Quuxplusone commented 3 years ago

Hi Michael,

I investigated this issue further and came up with the following "minimal" example that reproduces the problem:

#include <stdlib.h>

#define ni 800
#define nj 900
#define nk 1100

int main() {

  double (*A)[nk] = malloc(sizeof(double[ni][nk]));
  double (*B)[nj] = malloc(sizeof(double[nk][nj]));
  double (*D)[nj] = malloc(sizeof(double[ni][nj]));

  int i, j, k;
  for (i = 0; i < ni; i++) {
    for (j = 0; j < nj; j++){
      for (k = 0; k < nk; ++k)
        D[i][j] += A[i][k] * B[k][j];
    }
  }

  return 0;
}

I am on LLVM 1fbb484ea45f, and compiling this example (file mm.c) with the following command produces a wasm file that I can execute:

clang-13 -O2 mm.c --sysroot wasi-sysroot --target=wasm32-wasi -o mm-wasm

However, when I add Polly, the resulting wasm file fails with a runtime error:

clang-13 -O2 -mllvm -polly mm.c --sysroot wasi-sysroot --target=wasm32-wasi -o mm-polly-wasm

Also, I have not been able to get the IR file for either of these two versions.

Would you mind trying to reproduce the issue?

Thanks,

-- Manu

Quuxplusone commented 3 years ago
I already tried to get wasm to compile and run, but it requires too many other
things (backend, libc replacement, JavaScript, WASI, ...), and even then I would
still have no idea what more I could do than you.

The minimal code looks like it would trigger the gemm optimization, where I
just fixed a bug: https://bugs.llvm.org/show_bug.cgi?id=50557
Also try switching off that optimization
(-mllvm -polly-pattern-matching-based-opts=0).

Does wasm limit the stack size? If yes, that might be the cause, because the above
optimization allocates temporary arrays on the stack.

More information could also be helpful. What is the error with the reduced code
(since there is no polybench_alloc_data)? What is the output with
WASMTIME_BACKTRACE_DETAILS=1? With -mllvm -debug-only=polly-ast? With
-mllvm -polly-codegen-add-debug-printing? With -mllvm -polly-codegen-trace-scalars
-mllvm -polly-codegen-trace-stmts? Can you reduce the matrix size as well? Can you
reproduce it with the legacy pass manager? With -polly-position=early? ...

Quuxplusone commented 3 years ago
Given that you are trapping in polybench_alloc_data, are you sure that you are
allowing memory to be allocated in the right way?
I tried this out using emscripten and node.js.

I'm not too familiar with WASI's clang and wasmtime, but in emscripten you need
to ensure you allocate a large enough wasm memory size (in emscripten the
default is 16M IIRC, which is not enough for this example). If you set
-s INITIAL_MEMORY=256MB or -s ALLOW_MEMORY_GROWTH=1 it does work, both with and
without Polly.

With bare clang, you can set the initial and max memory with the linker's
--initial-memory and --max-memory flags (e.g. -Wl,--initial-memory=16777216
-Wl,--max-memory=2147483648) and you can set the stack size with the -z stack-size
flag. Adding -v to your clang command line will probably show you what it uses
by default.