Support for software pipelining in pragmas.

aidansander commented 4 months ago

I'm compiling a simple kernel using Peano. Manually software pipelining the attached kernel (dut_pipelined.cc) yields a considerable speedup over using pipelining pragmas (dut_pragma.cc). Without manual pipelining, the generated assembly is not pipelined and the kernel runs in ~1800 cycles; with manual pipelining, it runs in ~1000 cycles. The clang loop min_iteration_count and max_iteration_count pragmas have no effect on the generated assembly.

Attachments: dut_pragma.cc, dut_pipelined.cc
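For readers unfamiliar with the technique, here is a minimal, hypothetical sketch of manual software pipelining on a LUT-lookup loop. It is not the attached dut_pipelined.cc: `compute`, `kernel_naive`, and `kernel_pipelined` are made-up names, and `compute` merely stands in for whatever the real LUT-based op does. The pipelined version issues the load for iteration i+1 before the compute/store of iteration i, with an explicit prologue and epilogue, so that on an in-order core the load latency can overlap useful work.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a LUT-based op; the real kernel and
// lut_based_ops.h are not reproduced here.
static int32_t compute(int32_t x) { return x * 3 + 1; }

// Straightforward loop: each iteration loads, computes, and stores in order,
// so the load of iteration i+1 cannot overlap the compute of iteration i
// unless the compiler pipelines the loop itself.
void kernel_naive(const int32_t* lut, const uint8_t* idx, int32_t* out,
                  std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) {
    int32_t v = lut[idx[i]];  // load
    out[i] = compute(v);      // compute + store
  }
}

// Manually software-pipelined version (prologue / steady state / epilogue):
// the load for iteration i+1 is issued before the compute/store of iteration i.
void kernel_pipelined(const int32_t* lut, const uint8_t* idx, int32_t* out,
                      std::size_t n) {
  if (n == 0) return;
  int32_t next = lut[idx[0]];  // prologue: first load
  for (std::size_t i = 0; i + 1 < n; ++i) {
    int32_t cur = next;
    next = lut[idx[i + 1]];    // stage 1 (load) of iteration i+1
    out[i] = compute(cur);     // stage 2 (compute + store) of iteration i
  }
  out[n - 1] = compute(next);  // epilogue: last compute/store
}
```

Both functions produce identical results; only the order in which loads and computes are issued differs.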

konstantinschwarz commented 4 months ago

Hi @aidansander, could you also provide the lut_based_ops.h header so we can reproduce your results?

aidansander commented 4 months ago

Sure thing. lut_based_ops.h and lut_based_ops.cpp (which holds the LUT values) can both be found here. I'm running the kernel with llvm-lit on some tests. The manually pipelined and loop-pragma tests contain the steps I used to run the kernel and measure its cycle count.

konstantinschwarz commented 4 months ago

Thanks! One thing to consider: LoopUnroll runs before the MachinePipeliner, so if you completely unroll the loop (e.g. through #pragma unroll), there is nothing left for the pipeliner to do.
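To make the pass-ordering point concrete, here is a minimal sketch contrasting a loop that `#pragma unroll` fully unrolls with one kept rolled via `#pragma clang loop unroll(disable)`. The function names and the `FULL_UNROLL` / `KEEP_LOOP_ROLLED` macros are hypothetical; the macros only wrap standard Clang loop pragmas and expand to nothing on other compilers. Whether the rolled loop actually gets pipelined on AIE still depends on the missing pieces listed next in this thread.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical macros wrapping Clang's loop pragmas; other compilers see
// empty macros and simply compile the plain loops.
#if defined(__clang__)
#define FULL_UNROLL _Pragma("unroll")
#define KEEP_LOOP_ROLLED _Pragma("clang loop unroll(disable)")
#else
#define FULL_UNROLL
#define KEEP_LOOP_ROLLED
#endif

// Fully unrolled: with a constant trip count, LoopUnroll (an IR pass) turns
// this into straight-line code, so the MachinePipeliner, which runs later on
// machine IR, has no loop left to pipeline.
void scale_unrolled(const int32_t* in, int32_t* out) {
  FULL_UNROLL
  for (std::size_t i = 0; i < 16; ++i) out[i] = in[i] * 2;
}

// Kept rolled: the loop survives to the machine level, where the pipeliner
// can try to overlap iterations.
void scale_rolled(const int32_t* in, int32_t* out, std::size_t n) {
  KEEP_LOOP_ROLLED
  for (std::size_t i = 0; i < n; ++i) out[i] = in[i] * 2;
}
```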

That said, a few things are still missing before this loop can be software pipelined:

1. Enable hardware loops by default, to turn an up-counting loop into a down-counting loop. This is currently blocked by item 2.
2. Teach the MachinePipeliner to understand our hardware loop construct (we have an open PR for this: https://github.com/Xilinx/llvm-aie/pull/125).
3. VLDB4X instructions don't carry MachineMemOperands, which pessimizes alias analysis in the MachinePipeliner and causes it to reject the loop.
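Item 1 can be pictured at the source level. The sketch below is only an analogy: the actual rewrite happens on machine IR, the AIE hardware loop construct itself is not modeled, and `sum_up` / `sum_down` are hypothetical names. It shows an up-counting loop and an equivalent down-counting form, where a separate trip counter starts at n and is decremented to zero while the data pointer advances independently.

```cpp
#include <cstddef>
#include <cstdint>

// Up-counting loop as typically written: the induction variable i counts up
// and is compared against n on every iteration.
int64_t sum_up(const int32_t* a, std::size_t n) {
  int64_t acc = 0;
  for (std::size_t i = 0; i < n; ++i) acc += a[i];
  return acc;
}

// Source-level analogy of the down-counting form: a dedicated trip counter is
// initialized to n and decremented until it reaches zero, and the address is
// tracked by a separately incremented pointer. (Illustrative only.)
int64_t sum_down(const int32_t* a, std::size_t n) {
  int64_t acc = 0;
  const int32_t* p = a;
  for (std::size_t count = n; count != 0; --count) {
    acc += *p++;
  }
  return acc;
}
```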

With these tweaks, I was able to get a software-pipelined (SWP) loop with 23 cycles. It should be possible to improve on that further. FYI @gbossu @martien-de-jong @andcarminati

konstantinschwarz commented 3 months ago

Update: items 1 and 2 have been resolved. The last point, adding MachineMemOperands to VLDB.4x instructions, still needs to be done.
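The remaining item is about what the pipeliner can prove about memory accesses. The following is a simplified, hypothetical model, not LLVM's actual `MachineInstr` / `MachineMemOperand` API: each access optionally carries base/offset/size information, and the dependence query has to answer "may alias" whenever that information is missing. That conservative answer is the pessimization that leads the MachinePipeliner to reject the loop.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical stand-in for the information a MachineMemOperand provides.
struct MemInfo {
  const void* base;     // underlying object
  std::int64_t offset;  // byte offset from base
  std::int64_t size;    // access size in bytes
};

struct MemAccess {
  bool is_store;
  std::optional<MemInfo> info;  // empty means "no memory operand attached"
};

// Conservative dependence query used to decide whether two accesses must stay
// ordered. Missing information forces "may alias", which adds a dependence
// the pipeliner cannot reorder around.
bool may_alias(const MemAccess& a, const MemAccess& b) {
  if (!a.is_store && !b.is_store) return false;  // two loads never conflict
  if (!a.info || !b.info) return true;           // unknown: assume the worst
  // Simplification: distinct bases are treated as distinct objects.
  if (a.info->base != b.info->base) return false;
  // Same object: the accesses conflict only if their byte ranges overlap.
  return a.info->offset < b.info->offset + b.info->size &&
         b.info->offset < a.info->offset + a.info->size;
}
```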

Xilinx/llvm-aie, issue #126