Open Quuxplusone opened 5 years ago
Load and store instructions are tracked by their corresponding queues from
dispatch until the "instruction executed" event.
A load's value becomes available to its users only when the load reaches the
'Executed' stage. At that point, the load no longer needs to be tracked by the
load queue.
For simplicity, we optimistically assume similar behavior for store
instructions. However, on some target processors, store operations may not
leave the store queue until they reach the 'Retired' stage.
We should investigate whether this is worth improving too.
Load and store queue entries should only be released when memory operations
retire.
The current LSUnit design allows loads and stores to leave their corresponding
queues at the issue stage. In reality, on processors that allow out-of-order
execution of memory operations, loads and stores are tracked until the
retirement stage.
I am going to send a patch to change this.
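To make the difference concrete, here is a minimal sketch of a queue whose entries can be released either at execution or at retirement. The names (MemQueue, ReleasePolicy, onDispatch, onExecuted, onRetired) are hypothetical and chosen for illustration only; this is not the LSUnit interface and not the content of the patch.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch, not llvm-mca code: a fixed-size load or store queue
// that frees entries either when an operation executes or when it retires.
enum class ReleasePolicy { AtExecute, AtRetire };

class MemQueue {
  std::size_t Capacity;
  std::size_t Used = 0;
  ReleasePolicy Policy;

public:
  MemQueue(std::size_t Cap, ReleasePolicy P) : Capacity(Cap), Policy(P) {}

  bool isFull() const { return Used == Capacity; }
  std::size_t used() const { return Used; }

  // Reserve one entry at dispatch. Returns false when the queue is full,
  // which a simulator would treat as a dispatch stall.
  bool onDispatch() {
    if (isFull())
      return false;
    ++Used;
    return true;
  }

  // Event hooks: only the hook matching the policy frees the entry.
  void onExecuted() {
    if (Policy == ReleasePolicy::AtExecute)
      release();
  }
  void onRetired() {
    if (Policy == ReleasePolicy::AtRetire)
      release();
  }

private:
  void release() {
    assert(Used > 0 && "no entry to release");
    --Used;
  }
};
```

With ReleasePolicy::AtRetire, an executed but not yet retired operation keeps its entry, so a later memory operation can stall at dispatch even though the older one has already produced its result.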
On a slightly related topic: it would be nice in the future to be able to model
the store buffer and introduce very basic support for STLF (store-to-load
forwarding).
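As a rough illustration of what "very basic" STLF support could mean, the sketch below keeps pending stores in program order and lets a load take its value from the youngest pending store to the same address. Everything here is an assumption made for the example: the StoreBuffer and PendingStore names, exact-address matching, and a single fixed access width. Real hardware also has to handle partial overlaps and differing access sizes, which this ignores.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical sketch, not llvm-mca code: a store buffer with exact-match
// store-to-load forwarding.
struct PendingStore {
  std::uint64_t Addr;
  std::uint32_t Value;
};

class StoreBuffer {
  std::vector<PendingStore> Stores; // program order, oldest first

public:
  void push(std::uint64_t Addr, std::uint32_t Value) {
    Stores.push_back({Addr, Value});
  }

  // Search from youngest to oldest. On a hit, write the forwarded value to
  // 'Value' and return true; on a miss the load must read memory instead.
  bool forward(std::uint64_t Addr, std::uint32_t &Value) const {
    for (auto It = Stores.rbegin(); It != Stores.rend(); ++It) {
      if (It->Addr == Addr) {
        Value = It->Value;
        return true;
      }
    }
    return false;
  }

  // Commit the oldest store to memory and drop it from the buffer.
  void drainOldest() {
    assert(!Stores.empty() && "nothing to drain");
    Stores.erase(Stores.begin());
  }

  std::size_t size() const { return Stores.size(); }
};
```

A load that hits in the buffer can complete without waiting for the store to reach the cache, which is the latency effect a simulator would want to capture.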
Patch uploaded for review here: https://reviews.llvm.org/D68266
Attached addps-measurements.txt
(14277 bytes, text/plain): bdver2 ADDPSrr/ADDPSrm exegesis measurements
On some processors, load/store operations are split into multiple uOps. For example, X86 AMD Jaguar natively supports 128-bit data types, but not 256-bit data types. So, a 256-bit load is effectively split into two 128-bit loads, and each split load consumes one 'LoadQueue' entry. For simplicity, this class optimistically assumes that a load instruction only consumes one entry in the LoadQueue. Similarly, store instructions only consume a single entry in the StoreQueue.
In the future, we should reassess the quality of this design, and consider alternative approaches that let instructions specify the number of load/store queue entries they consume at the dispatch stage.
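One possible shape for such an alternative, sketched under assumptions: each instruction carries a small descriptor giving how many load and store queue entries it needs, and dispatch succeeds only if both queues have room. MemOpDesc and QueueTracker are hypothetical names made up for this example, not existing llvm-mca types, and the queue sizes used below are arbitrary.

```cpp
#include <cassert>

// Hypothetical sketch, not llvm-mca code: per-instruction entry counts.
struct MemOpDesc {
  unsigned LoadEntries;  // load queue entries consumed at dispatch
  unsigned StoreEntries; // store queue entries consumed at dispatch
};

class QueueTracker {
  unsigned LQSize, SQSize;
  unsigned UsedLQ = 0, UsedSQ = 0;

public:
  QueueTracker(unsigned LQ, unsigned SQ) : LQSize(LQ), SQSize(SQ) {}

  bool canDispatch(const MemOpDesc &D) const {
    return UsedLQ + D.LoadEntries <= LQSize &&
           UsedSQ + D.StoreEntries <= SQSize;
  }

  // Reserve all entries the instruction needs, or none at all.
  bool dispatch(const MemOpDesc &D) {
    if (!canDispatch(D))
      return false;
    UsedLQ += D.LoadEntries;
    UsedSQ += D.StoreEntries;
    return true;
  }

  // Give back the entries reserved for D (for example, at retirement).
  void release(const MemOpDesc &D) {
    assert(UsedLQ >= D.LoadEntries && UsedSQ >= D.StoreEntries);
    UsedLQ -= D.LoadEntries;
    UsedSQ -= D.StoreEntries;
  }

  unsigned freeLoadEntries() const { return LQSize - UsedLQ; }
  unsigned freeStoreEntries() const { return SQSize - UsedSQ; }
};
```

Under this scheme, a 256-bit load on a Jaguar-like target would be described as MemOpDesc{2, 0}, so a load queue with three free entries could accept one such load plus one 128-bit load, but not two 256-bit loads.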