[MCA] Multiple Int schedulers #49319

Open Quuxplusone opened 3 years ago

Quuxplusone commented 3 years ago
Bugzilla Link PR50350
Status NEW
Importance P enhancement
Reported by Roman Lebedev (lebedev.ri@gmail.com)
Reported on 2021-05-15 01:06:29 -0700
Last modified on 2021-05-15 13:36:15 -0700
Version trunk
Hardware PC Linux
CC andrea.dibiagio@gmail.com, llvm-bugs@lists.llvm.org, llvm-dev@redking.me.uk, matdzb@gmail.com, matthew.davis@sony.com
Fixed by commit(s)
Attachments
Blocks PR32325
Blocked by
See also PR50353, PR50355

Currently, the Znver3 model treats the Int unit as if it had a single scheduler encompassing all 9 pipes. This isn't quite right: there are 4 separate schedulers, each feeding its own subset of the pipes.

I would like to model it as:

diff --git a/llvm/lib/Target/X86/X86ScheduleZnver3.td b/llvm/lib/Target/X86/X86ScheduleZnver3.td
index 84aee73bad63..a7d369f55b74 100644
--- a/llvm/lib/Target/X86/X86ScheduleZnver3.td
+++ b/llvm/lib/Target/X86/X86ScheduleZnver3.td
@@ -163,15 +163,23 @@ def Zn3IntegerPRF : RegisterFile<192, [GR64, CCR], [1, 1], [1, 0],
 // The schedulers can receive up to six macro ops per cycle, with a limit of
 // two per scheduler. Each scheduler can issue one micro op per cycle into
 // each of its associated pipelines
-// FIXME: these are 4 separate schedulers, not a single big one.
-def Zn3Int : ProcResGroup<[Zn3ALU0, Zn3AGU0, Zn3BRU0, // scheduler 0
-                           Zn3ALU1, Zn3AGU1,          // scheduler 1
-                           Zn3ALU2, Zn3AGU2,          // scheduler 2
-                           Zn3ALU3,          Zn3BRU1  // scheduler 3
-                          ]> {
-  let BufferSize = !mul(4, 24);
+def Zn3IntSch0 : ProcResGroup<[Zn3ALU0, Zn3AGU0, Zn3BRU0]> { // scheduler 0
+  let BufferSize = 24;
+}
+def Zn3IntSch1 : ProcResGroup<[Zn3ALU1, Zn3AGU1         ]> { // scheduler 1
+  let BufferSize = 24;
+}
+def Zn3IntSch2 : ProcResGroup<[Zn3ALU2, Zn3AGU2         ]> { // scheduler 2
+  let BufferSize = 24;
+}
+def Zn3IntSch3 : ProcResGroup<[Zn3ALU3,          Zn3BRU1]> { // scheduler 3
+  let BufferSize = 24;
 }

+def Zn3Int : ProcResGroup<[Zn3ALU0, Zn3AGU0, Zn3BRU0,   // scheduler 0
+                           Zn3ALU1, Zn3AGU1,            // scheduler 1
+                           Zn3ALU2, Zn3AGU2,            // scheduler 2
+                           Zn3ALU3,          Zn3BRU1]>; // scheduler 3

 //===----------------------------------------------------------------------===//
 // Floating-Point Unit

The last Zn3Int is needed to silence the following error:

FAILED: lib/Target/X86/X86GenSubtargetInfo.inc 
cd /builddirs/llvm-project/build-Clang12 && /builddirs/llvm-project/build-Clang12/bin/llvm-tblgen -gen-subtarget -I /repositories/llvm-project/llvm/lib/Target/X86 -I/builddirs/llvm-project/build-Clang12/include -I/repositories/llvm-project/llvm/include -I /repositories/llvm-project/llvm/lib/Target /repositories/llvm-project/llvm/lib/Target/X86/X86.td --write-if-changed -o lib/Target/X86/X86GenSubtargetInfo.inc -d lib/Target/X86/X86GenSubtargetInfo.inc.d
Included from /repositories/llvm-project/llvm/lib/Target/X86/X86.td:562:
/repositories/llvm-project/llvm/lib/Target/X86/X86ScheduleZnver3.td:136:1: error: proc resource group overlaps with Zn3IntSch0 but no supergroup contains both.
def Zn3AGU012 : ProcResGroup<[Zn3AGU0, Zn3AGU1, Zn3AGU2]>;
^

But as a result, Zn3IntSch0/Zn3IntSch1/Zn3IntSch2/Zn3IntSch3 appear not to be used at all.

How would I approach this? The same problem exists for the FP unit.

Quuxplusone commented 3 years ago
This is unfortunately a known design limitation.

The good news is that it is possible to fix it. The bad news is that it won't
be a simple fix, and it would definitely require a non-trivial redesign of the
dispatch stage.

To fix this issue, we need to introduce the concept of a "hierarchy of
scheduler resources". That hierarchy could be designed as a DAG, or a
dominator-tree-like structure, where nodes are resource groups and each node
has an immediate dominator (i.e. an immediate super-group).

Scheduler resources will have to be fully pre-allocated at the dispatch stage.
Technically speaking, this is fine, and it potentially allows for a more
accurate simulation of the dispatch logic in hardware.
When a scheduler resource S is selected, its sub-groups (if any) are also
selected by traversing the DAG. In the presence of multiple sub-groups, a
round-robin selector is used to pick the underlying sub-group instance.
The hierarchy of group resources is fixed, and can be pre-computed by the
dispatch stage at construction time.
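
To make the idea above concrete, here is a minimal standalone C++ sketch of
round-robin sub-group selection with pre-allocation at dispatch. It is only an
illustration under assumptions: SchedulerGroup and dispatch() are hypothetical
names, not existing llvm-mca classes or functions, a BufferSize of 0 is used
here to mean "this group has no buffer of its own", and releasing entries at
issue time is not modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical node in the scheduler hierarchy (not an llvm-mca type).
struct SchedulerGroup {
  std::string Name;
  unsigned BufferSize;                     // 0: no buffer of its own (assumption)
  unsigned Used = 0;                       // entries currently reserved
  std::vector<SchedulerGroup *> SubGroups; // immediate sub-groups, fixed up front
  std::size_t NextSubGroup = 0;            // round-robin cursor

  SchedulerGroup(std::string N, unsigned B)
      : Name(std::move(N)), BufferSize(B) {}

  bool hasFreeEntry() const { return BufferSize == 0 || Used < BufferSize; }

  void reserve() {
    if (BufferSize == 0)
      return;
    assert(Used < BufferSize && "reserving from a full scheduler");
    ++Used;
  }
};

// Select group S for one micro op and walk down the hierarchy, picking one
// sub-group per level in round-robin order. An entry is reserved in every
// buffered group on the chosen path. Returns the innermost selected group,
// or nullptr if no path with free entries exists.
SchedulerGroup *dispatch(SchedulerGroup &S) {
  if (!S.hasFreeEntry())
    return nullptr;

  if (S.SubGroups.empty()) {
    S.reserve();
    return &S;
  }

  const std::size_t N = S.SubGroups.size();
  for (std::size_t I = 0; I < N; ++I) {
    const std::size_t Idx = (S.NextSubGroup + I) % N;
    if (SchedulerGroup *Leaf = dispatch(*S.SubGroups[Idx])) {
      S.NextSubGroup = (Idx + 1) % N; // resume after the chosen sub-group
      S.reserve();
      return Leaf;
    }
  }
  return nullptr; // every sub-group is full
}
```

With four 24-entry sub-groups (Zn3IntSch0 to Zn3IntSch3) placed under a
bufferless Zn3Int, successive dispatch() calls on Zn3Int rotate through the
four schedulers, and the call returns nullptr once all 96 entries are reserved.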

In conclusion, it is technically possible to fix this issue. However, it is a
big feature, and it requires a significant redesign of the dispatch logic.