Xilinx / llvm-aie

Fork of LLVM to support AMD AIEngine processors

Support for software pipelining in pragmas. #126

Open aidansander opened 3 months ago

aidansander commented 3 months ago

I'm compiling a simple kernel using Peano. Manually software pipelining the attached kernel (dut_pipelined.cc) yields a considerable speedup compared to using pipelining pragmas (dut_pragma.cc). Without manual pipelining, the produced assembly is not software pipelined and the kernel runs in ~1800 cycles. With manual pipelining, the kernel runs in ~1000 cycles. The clang loop min_iteration_count and max_iteration_count pragmas have no effect on the produced assembly.

Attachments: dut_pragma.cc, dut_pipelined.cc
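For readers unfamiliar with the term, here is a minimal sketch of what "manually software pipelining" a loop means. It is not the attached dut_pipelined.cc; `load_stage`, `compute_stage`, and both kernel functions are hypothetical stand-ins in plain C++, not AIE intrinsics. The load for iteration i+1 is issued before the compute of iteration i, with a prologue and an epilogue around the steady-state loop.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the two stages of one loop iteration.
static inline int load_stage(const int *in, std::size_t i) { return in[i]; }
static inline int compute_stage(int x) { return x * 3 + 1; }

// Straightforward loop: each iteration loads, computes, and stores.
void kernel_plain(const int *in, int *out, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i)
    out[i] = compute_stage(load_stage(in, i));
}

// Manually pipelined loop: the load of iteration i+1 is placed next to
// the compute of iteration i, so the two no longer depend on each other
// within one loop body.
void kernel_pipelined(const int *in, int *out, std::size_t n) {
  if (n == 0) return;
  int cur = load_stage(in, 0);             // prologue: first load
  for (std::size_t i = 0; i + 1 < n; ++i) {
    int next = load_stage(in, i + 1);      // stage 1 of iteration i+1
    out[i] = compute_stage(cur);           // stage 2 of iteration i
    cur = next;
  }
  out[n - 1] = compute_stage(cur);         // epilogue: last compute
}
```

Both functions produce the same output. The difference is only in how the work is arranged, which is the transformation the pipeliner would be expected to do automatically.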

konstantinschwarz commented 3 months ago

Hi @aidansander, could you also provide the lut_based_ops.h header to be able to reproduce your results?

aidansander commented 3 months ago

Sure thing. lut_based_ops.h and lut_based_ops.cpp (which holds the LUT values) are both included from here. I'm running the kernel using llvm-lit on some tests. The manually pipelined and loop pragma tests have the steps I used to run and measure the cycle count.

konstantinschwarz commented 3 months ago

Thanks! One thing to consider: LoopUnroll runs before MachinePipeliner, i.e. if you completely unroll the loop (through #pragma unroll), there is nothing to do for the pipeliner.
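To illustrate the ordering point, here is a small sketch assuming generic Clang loop hints. The function names are hypothetical, and the AIE-specific iteration count pragmas mentioned above are left out. In the first function the loop is fully unrolled, so no loop is left by the time MachinePipeliner runs. In the second, unrolling is disabled, so the loop survives as a pipelining candidate. Whether it actually gets pipelined still depends on the target and compiler version.

```cpp
// Fixed trip count plus "#pragma unroll": LoopUnroll can remove the loop
// entirely, leaving straight-line code and nothing to software pipeline.
void scale_unrolled(const int *in, int *out) {
#pragma unroll
  for (int i = 0; i < 8; ++i)
    out[i] = in[i] * 2;
}

// Unrolling disabled: the loop is still a loop when later passes run.
void scale_looped(const int *in, int *out, int n) {
#pragma clang loop unroll(disable)
  for (int i = 0; i < n; ++i)
    out[i] = in[i] * 2;
}
```

Compilers that do not recognize these pragmas typically ignore them, so the sketch only shows where the hints go, not a guaranteed outcome.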

There are a few things we are still lacking though to get this software pipelined:

With these tweaks, I could get a SWP loop with 23 cycles. It should be possible to further improve on that. FYI @gbossu @martien-de-jong @andcarminati

konstantinschwarz commented 2 months ago

Update: points 1 and 2 have been resolved. The last point, MachineMemOperands for VLDB.4x instructions, still needs to be done.