AnyDSL / thorin

The Higher-Order Intermediate Representation
https://anydsl.github.io
GNU Lesser General Public License v3.0
151 stars 15 forks source link

Fix broken LLVM IR and verifyModule usage #104

Closed Hugobros3 closed 3 years ago

Hugobros3 commented 3 years ago

This PR includes a fix for a subtle bug caused by how we generate code for constants (in the Thorin, def.cpp::is_const sense) in the LLVM target. The idea behind it, to quote the code itself:

we emit all Thorin constants in the entry block, since they are not part of the schedule

The issue with this is how that is accomplished: the IR for building such constants is inserted at the start of the entry basic block for the currently emitted function, on the assumption there are no dependencies for such nodes and they are not part of the schedule. Which is correct in the Thorin sense since aggregates like definite arrays are just constant values, but in LLVM terms they are not and are instead build with an alloca and emitting the code that puts data inside the alloca after the code that creates it (because it's emitted first) is a clear issue. One example of code that triggers this is presented below:

enum E {
    A,
    B
}

#[export]
fn f() = [[E::A ; 2] ; 1];

The emitted IR (before these patches!) for this uses %15 before its definition, which is a clear violation of the basic rules for SSA form. Curiously enough though, no assertions were violated even though I had Artic and Thorin both built in Debug. (but not LLVM, but that should not matter since I'm not debugging LLVM). Worse still, this broken IR was not even caught by clang which instead silently ignored the issue (!) when fed the illegal IR file. Using llc and opt on the other hand, the problem was caught correctly:

Instruction does not dominate all uses!
  %15 = load { { {} }, i8 }, { { {} }, i8 }* %tmp_alloca1
  store { { {} }, i8 } %15, { { {} }, i8 }* %7
Instruction does not dominate all uses!
  %15 = load { { {} }, i8 }, { { {} }, i8 }* %tmp_alloca1
  store { { {} }, i8 } %15, { { {} }, i8 }* %8
artic: /home/gobrosse/anydsl/thorin/src/thorin/be/llvm/llvm.cpp:531: std::unique_ptr<llvm::Module>& thorin::CodeGen::emit(int, bool): Assertion `!broken' failed.
Abandon (core dumped)

I'm not sure what that is about with clang not seeing the issue (running some cleanup on the IR before validation ?), but I did find the reason why even with THORIN_ENABLE_CHECKS we did not catch the issue. The documentation for llvm::verifyModule actually specifies that a true boolean is returned upon verification failure, and we do not check for that, rendering that call useless. We do check for that when we call verifyModule in vectorize.cpp, and we dump the module when vectorization fails, here I just made that an assertion though.

In addition to this issue I included a fix for the C interface generation (which was the root cause of the bug I had in my toy chess program), where an enum type with only unit payloads would still get a struct { ... } data; field in its C binding, causing that field to be 1-sized in practice (ISO C disallows empty structs altogether), and messing up the data layout. There are still other similar issues with types like () having inconsistent sizes between Thorin and C, so further work in that area might be necessary.

I leave this PR as a draft for now since I have to run all tests for stincilla and rodent again. Done

leissa commented 3 years ago

I'll come back to this when we've merge the llvm_rewrite branch.

Hugobros3 commented 3 years ago

Ehrm ... I took the verifyModule fix out to put it on master right away, and rewrote my PR branch to not include it since I pushed a squashed version to master ... and apparently that counts as closing the PR to github. The actual fix still needs to be pulled

madmann91 commented 3 years ago

@leissa The llvm_cleanup branch now incorporates these changes as well. A test case for this issue has been added to Impala.