Quuxplusone / LLVMBugzillaTest

[llvm-mca] Investigate how to improve the load/store queue usage simulation in LSUnit. #38802

Open Quuxplusone opened 5 years ago

Quuxplusone commented 5 years ago
Bugzilla Link PR39830
Status NEW
Importance P enhancement
Reported by Andrea Di Biagio (andrea.dibiagio@gmail.com)
Reported on 2018-11-28 10:57:19 -0800
Last modified on 2021-08-20 04:56:52 -0700
Version trunk
Hardware PC Windows NT
CC andrea.dibiagio@gmail.com, lebedev.ri@gmail.com, llvm-bugs@lists.llvm.org, llvm-dev@redking.me.uk, matthew.davis@sony.com
Fixed by commit(s)
Attachments addps-measurements.txt (14277 bytes, text/plain)
Blocks
Blocked by
See also PR51557

On some processors, load/store operations are split into multiple uOps. For example, X86 AMD Jaguar natively supports 128-bit data types, but not 256-bit data types. So, a 256-bit load is effectively split into two 128-bit loads, and each split load consumes one 'LoadQueue' entry. For simplicity, the LSUnit class optimistically assumes that a load instruction consumes only one entry in the LoadQueue. Similarly, a store instruction is assumed to consume only a single entry in the StoreQueue.

In future, we should reassess the quality of this design, and consider alternative approaches that let instructions specify the number of load/store queue entries they consume at the dispatch stage.
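
To make the cost of the one-entry assumption concrete, here is a minimal sketch of a queue model in which each instruction declares how many entries it consumes at dispatch. Everything in it is hypothetical: `MemQueue`, `entriesFor` and `loadsBeforeStall` are illustrative names, not part of the llvm-mca LSUnit interface, and the 16-entry queue size used below is an arbitrary figure rather than a measured value for any processor.

```cpp
#include <cassert>

// Hypothetical queue model (not the llvm-mca LSUnit API): tracks how many
// entries are in use and lets each instruction reserve several at dispatch.
struct MemQueue {
  unsigned Capacity;
  unsigned Used = 0;

  explicit MemQueue(unsigned Cap) : Capacity(Cap) {}

  // True if Entries more slots are free. Used never exceeds Capacity,
  // so the subtraction cannot wrap around.
  bool canDispatch(unsigned Entries) const { return Entries <= Capacity - Used; }

  // Reserve Entries slots; returns false (a dispatch stall) if they do not fit.
  bool dispatch(unsigned Entries) {
    if (!canDispatch(Entries))
      return false;
    Used += Entries;
    return true;
  }

  // Give back Entries slots once the operation leaves the queue.
  void release(unsigned Entries) {
    assert(Entries <= Used && "releasing more entries than are in use");
    Used -= Entries;
  }
};

// Queue entries needed by an access of AccessBits on a datapath that is
// NativeBits wide: round up, and never report fewer than one entry.
unsigned entriesFor(unsigned AccessBits, unsigned NativeBits) {
  assert(NativeBits != 0 && "native width must be non-zero");
  unsigned N = (AccessBits + NativeBits - 1) / NativeBits;
  return N ? N : 1;
}

// Number of back-to-back accesses that can be dispatched into an empty
// queue before the next one would stall (nothing is released meanwhile).
unsigned loadsBeforeStall(unsigned QueueSize, unsigned AccessBits,
                          unsigned NativeBits) {
  MemQueue LQ(QueueSize);
  unsigned Count = 0;
  while (LQ.dispatch(entriesFor(AccessBits, NativeBits)))
    ++Count;
  return Count;
}
```

With the assumed 16-entry queue, `loadsBeforeStall(16, 256, 128)` gives 8, whereas counting every load as a single entry corresponds to `loadsBeforeStall(16, 128, 128)`, which gives 16. The gap between those two numbers is the optimism described above.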

Quuxplusone commented 5 years ago
Load and store instructions are tracked by their corresponding queues from
dispatch until the "instruction executed" event.
Only when a load instruction reaches the 'Executed' stage does its value
become available to its users. At that point, the load no longer needs to
be tracked by the load queue.

For simplicity, we optimistically assume a similar behavior for store
instructions.  However, on some target processors, store operations may not
leave the store queue until they reach the 'Retired' stage.

We should investigate whether it is worth improving this too.

Quuxplusone commented 4 years ago
Load and store queue entries should only be released when memory operations
retire.

The current LSUnit design allows loads and stores to leave their corresponding
queues at issue stage. In reality, on processors that allow out-of-order
execution of memory operations, loads and stores are tracked until the
retirement stage.
I am going to send a patch to change this.
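
The effect of the release point on queue pressure can be illustrated with a small sketch. This is not the code from the patch: `MemOp`, `ReleasePolicy` and `peakOccupancy` are hypothetical names, the cycle numbers in the usage note are made up, and in-order retirement is modeled in the simplest possible way (an operation retires no earlier than every older operation has executed).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical release policies for a load/store queue entry.
enum class ReleasePolicy { AtExecuted, AtRetired };

// One memory operation, listed in program order (illustrative model only).
struct MemOp {
  unsigned DispatchCycle; // cycle at which the queue entry is allocated
  unsigned ExecutedCycle; // cycle at which the operation finishes executing
};

// Peak number of queue entries held at the same time. An entry is held
// during the half-open interval [DispatchCycle, ReleaseCycle).
unsigned peakOccupancy(const std::vector<MemOp> &Ops, ReleasePolicy Policy) {
  // Compute the release cycle of every operation under the chosen policy.
  std::vector<unsigned> Release;
  unsigned LastRetire = 0;
  for (const MemOp &Op : Ops) {
    assert(Op.DispatchCycle <= Op.ExecutedCycle);
    // In-order retirement: wait for all older operations to execute.
    LastRetire = std::max(LastRetire, Op.ExecutedCycle);
    Release.push_back(Policy == ReleasePolicy::AtExecuted ? Op.ExecutedCycle
                                                          : LastRetire);
  }

  // Occupancy only grows when something is dispatched, so it is enough to
  // sample the number of live entries at each dispatch cycle.
  unsigned Peak = 0;
  for (std::size_t I = 0; I < Ops.size(); ++I) {
    unsigned Now = Ops[I].DispatchCycle;
    unsigned Live = 0;
    for (std::size_t J = 0; J < Ops.size(); ++J)
      if (Ops[J].DispatchCycle <= Now && Now < Release[J])
        ++Live;
    Peak = std::max(Peak, Live);
  }
  return Peak;
}
```

For the made-up sequence `{{0, 10}, {1, 3}, {2, 4}, {5, 7}}`, where the first operation is slow and the younger ones finish early, the peak is 3 entries when releasing at execution and 4 when releasing at retirement, because the younger operations have to wait behind the slow one before they can free their entries.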

On a slightly related topic: it would be nice in future to be able to model the
store buffer and introduce very basic support for STLF (store-to-load
forwarding).

Quuxplusone commented 4 years ago

Patch uploaded for review here: https://reviews.llvm.org/D68266

Quuxplusone commented 4 years ago

Attached addps-measurements.txt (14277 bytes, text/plain): bdver2 ADDPSrr/ADDPSrm exegesis measurements