Open doe300 opened 6 years ago
Can you assign me to the issue? Now I am trying to implement list scheduling. The algorithm is as follows:
or a, b, b
to v8min a, b, b
if another add-op instruction can be issued, and fuse themThis scheduling can be applied for one basic block, not for across basic blocks. This implementation is relatively simple and understandable.
The problem is, the heuristics that now we adopt to choose an instruction from ready-queue. Probably, we need to care memory latency and register-file or so on. This must be constructed in trial and error...
Preparation: create DAG (Dependency Graph)
You could re-use the vc4c::Graph
class, if it suits your needs. It is already used for the basic-block dependency graph and the colored graph for register mapping. Also, the class DebugGraph
provides an easy way to print the contents of the graph into graphviz file.
Question: I am wondering when VC4C translate SSA to non-SSA before register-allocation. Come to think of it, Instruction Scheduling fits non-SSA, and equal to an assembly except that is it not register allocated.
It may be possible to implement the scheduler for SSA-form language, but a few improvable points may be remain.
The SSA from the intermediate language (LLVM-IR or SPIR-V) is relaxed very early (before the optimization steps are run) to resolve phi-instructions (function eliminatePhiNodes
). So the internal representation of the optimization steps (up to the register-mapping) is SSA-ish (most of the instructions are SSA, but it is not guaranteed that a local is only written once).
Intermediate status update:
Some statistics (based on TestVC4C --test-emulator
):
Not sure how to implement this, but here a few notes on an instruction scheduler:
Goal
Reorder instructions within a basic block to utilize the delay introduced by certain operations by inserting meaningful instructions minimizing the number of cycles spent waiting (via
nop
or on periphery registers).Target features
Implementation
See also the Wikipedia.