JuliaHubOSS / llvm-cbe

resurrected LLVM "C Backend", with improvements

Simplification of vector handling: importance of helper functions? #159

Open hikari-no-yume opened 2 years ago

hikari-no-yume commented 2 years ago

I've been thinking about ways the C backend could be simplified. I think a lot of complexity comes from trying to handle so many details of the translation in a single pass. By splitting things into multiple passes (operating on an IR, probably LLVM IR), maybe it could be easier to work with.

Something that could be moved to a pass is the handling of vector operations. The LLVM Scalarizer pass can lower most vector operations to simple scalar operations for us, meaning we can remove the handling for vector addition, multiplication etc, leaving just things like generating structs for them, converting GEPs, and a few other things like that.

It's a pretty simple change to use the scalariser:

--- a/lib/Target/CBackend/CTargetMachine.cpp
+++ b/lib/Target/CBackend/CTargetMachine.cpp
@@ -19,6 +19,8 @@
 #include "llvm/Transforms/Utils.h"
 #endif

+#include "llvm/Transforms/Scalar/Scalarizer.h"
+
 namespace llvm {

 bool CTargetMachine::addPassesToEmitFile(PassManagerBase &PM,
@@ -53,6 +55,8 @@ bool CTargetMachine::addPassesToEmitFile(PassManagerBase &PM,
   // Lower atomic operations to libcalls
   PM.add(createAtomicExpandPass());

+  PM.add(createScalarizerPass());
+
   PM.add(new llvm_cbe::CWriter(Out));
   return false;
 }

The main difference in the generated C code is essentially that what would otherwise be the body of a helper function like llvm_fmul_f32x4 instead gets inlined at the call-site.
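To make the difference concrete, here is a rough hand-written sketch of the two shapes of output, not actual llvm-cbe output. The struct layout, the exact signature of llvm_fmul_f32x4, and the names mul_with_helper, mul_scalarized and check_same_result are assumptions made up for illustration.

```c
#include <assert.h>

/* Hypothetical stand-in for the struct the backend generates for <4 x float>. */
struct f32x4 {
  float vector[4];
};

/* Helper-function style: the per-lane work lives in one reusable function. */
static inline struct f32x4 llvm_fmul_f32x4(struct f32x4 a, struct f32x4 b) {
  struct f32x4 r;
  r.vector[0] = a.vector[0] * b.vector[0];
  r.vector[1] = a.vector[1] * b.vector[1];
  r.vector[2] = a.vector[2] * b.vector[2];
  r.vector[3] = a.vector[3] * b.vector[3];
  return r;
}

/* Without the Scalarizer: each vector fmul becomes a call to the helper. */
struct f32x4 mul_with_helper(struct f32x4 a, struct f32x4 b) {
  return llvm_fmul_f32x4(a, b);
}

/* With the Scalarizer: the same four scalar multiplies appear at the use site. */
struct f32x4 mul_scalarized(struct f32x4 a, struct f32x4 b) {
  struct f32x4 r;
  r.vector[0] = a.vector[0] * b.vector[0];
  r.vector[1] = a.vector[1] * b.vector[1];
  r.vector[2] = a.vector[2] * b.vector[2];
  r.vector[3] = a.vector[3] * b.vector[3];
  return r;
}

/* Sanity check that both shapes compute the same lanes. */
void check_same_result(struct f32x4 a, struct f32x4 b) {
  struct f32x4 h = mul_with_helper(a, b);
  struct f32x4 s = mul_scalarized(a, b);
  for (int i = 0; i < 4; i++)
    assert(h.vector[i] == s.vector[i]);
}
```

Calling check_same_result from a main function with any two vectors confirms the two versions agree lane by lane; what differs is only where the scalar multiplies are written, which is what the C compiler's vectoriser then has to recognise.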

I'm wondering whether there's a disadvantage to this approach. For a simple matrix multiplication test I wrote, clang seemed to produce similarly good code for the helper function and non-helper-function versions (i.e. it successfully re-vectorises both). But it might be the case that in a more complex program, switching to scalarisation like this would produce worse code.

Any thoughts?