AnyDSL / thorin2

The Higher ORder INtermediate representation - next gen
https://anydsl.github.io/thorin2/
MIT License
46 stars 9 forks source link

%affine.For breaks without scalerize pass #244

Closed y-exp closed 1 year ago

y-exp commented 1 year ago

Example: bug.thorin

The Thorin function foo computes the sequence 0, 1, ..., 99 as a pack «100; %core.I32» via a simple %affine.For counting loop. Initially, the pack ‹100; 0:%core.I32› is passed to this loop and updated as an accumulator acc between iterations. Inside the function bar, said pack is finally stored at some ptr: %mem.Ptr0.

.plugin core;
.plugin affine;

.let Arr = «100; %core.I32»;

.fun foo []: Arr =
    .con body [i: %core.I32, acc: Arr, yield: .Cn Arr] =
        .let `acc = .ins (acc, %core.conv.u 100 i, i);
        yield acc;

    .let loop = %affine.For (%core.i32, 1, Arr);
    loop (0:%core.I32, 100:%core.I32, 1:%core.I32, ‹100; 0:%core.I32›, body, return);

.fun .extern bar [mem: %mem.M, ptr: %mem.Ptr0 Arr]: %mem.M =
    .ret arr  = foo $ ();
    .let `mem = %mem.store (mem, ptr, arr);
    return mem;

Example: bug.c

In addition to that, the following C program calls bar with a pointer to an array and prints its contents to stdout.

#include <stdio.h>
#include <stdint.h>

extern void bar(uint32_t* arr);

int main() {
    uint32_t arr[100] = {0};
    bar(arr);

    for (size_t i = 0; i < 100; i++)
        printf("%i ", arr[i]);
    printf("\n");

    return 0;
}

Commands

thorin bug.thorin --output-ll bug.ll
clang -o test bug.c bug.ll -Wno-override-module
./test

However, this only produces the correct output when the scalerize pass is run which flattens the pack inside the loop. In case it is skipped (e.g. %compile.scalerize_threshold is exceeded), the output contains garbage instead.

Looking at the generated LLVM IR code, it seems acc is not updated correctly when jumping between BBs that form the loop.

leissa commented 1 year ago

Problem is that the extractvalues are placed in a wrong block, thereby violating a basic SSA property. I'm really astonished that clang silently accepts this garbage and removes it entirely. Maybe a debug build has more sanity checks.

leissa commented 1 year ago

So the problem seems to be that %affine.lower_for_pass creates a couple of things, that the LLVM backend doesn't like. I think the best solution is to rewrite the pass to create a more straightforward loop. I'm on it. Could also speed up a couple of things if you use lots of nested %affine.Fors.