Quuxplusone / LLVMBugzillaTest

0 stars 0 forks source link

MachineScheduler/PostRAScheduler have quadratic compile time complexity #49553

Open Quuxplusone opened 3 years ago

Quuxplusone commented 3 years ago
Bugzilla Link PR50584
Status NEW
Importance P enhancement
Reported by Roman Lebedev (lebedev.ri@gmail.com)
Reported on 2021-06-04 15:20:26 -0700
Last modified on 2021-08-13 01:06:38 -0700
Version trunk
Hardware PC Linux
CC andrea.dibiagio@gmail.com, david.bolvansky@gmail.com, florian_hahn@apple.com, jay.foad@gmail.com, llvm-bugs@lists.llvm.org, llvm-dev@redking.me.uk, matdzb@gmail.com, piotr.sobczak@amd.com, qshanz@cn.ibm.com
Fixed by commit(s)
Attachments
Blocks
Blocked by
See also PR11302, PR51460
A continuation of https://bugs.llvm.org/show_bug.cgi?id=50384

Relevant tests:
https://www.dropbox.com/s/3awj7pibbag1du6/test-512.ll?dl=0
https://www.dropbox.com/s/el1f3awkfzxv8qz/test-1024.ll?dl=0

where n is is LoopMicroOpBufferSize in X86ScheduleZnver3.td
(aka the desired unrolled loop size)

$ time ./bin/llc -time-passes -o /dev/null ../test-512.ll
===-------------------------------------------------------------------------===
                      ... Pass execution timing report ...
===-------------------------------------------------------------------------===
  Total Execution Time: 11.5713 seconds (11.5714 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   4.7281 ( 41.6%)   0.0011 (  0.6%)   4.7292 ( 40.9%)   4.7294 ( 40.9%)  Post RA top-down list latency scheduler
   3.1045 ( 27.3%)   0.0000 (  0.0%)   3.1045 ( 26.8%)   3.1046 ( 26.8%)  Machine Instruction Scheduler
<...>

$ time ./bin/llc -time-passes -o /dev/null ../test-1024.ll
===-------------------------------------------------------------------------===
                      ... Pass execution timing report ...
===-------------------------------------------------------------------------===
  Total Execution Time: 60.8318 seconds (60.8329 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
  31.6539 ( 52.3%)   0.0030 (  0.9%)  31.6570 ( 52.0%)  31.6576 ( 52.0%)  Post RA top-down list latency scheduler
  21.5208 ( 35.6%)   0.0080 (  2.3%)  21.5288 ( 35.4%)  21.5294 ( 35.4%)  Machine Instruction Scheduler
<...>

So doubling the amount of unrolling (and thus doubling the IR size),
results in 6x increase in compile time, for those parameters.
Quuxplusone commented 3 years ago
Does anyone familiar with MachineScheduler/PostRAScheduler have any thoughts
on the subject? Is this an inherent design problem,
or is this something that can be fixed?
Quuxplusone commented 3 years ago

I don't have much to add except that "perf record / perf report" shows llvm::SUnit::addPred pretty high up on the profile. I have seen that before. The last time I looked into it it seemed to be something to do with the way the barrier chain is handled in ScheduleDAGInstrs, but I never got to the bottom of it.

Quuxplusone commented 3 years ago

_Bug 51460 has been marked as a duplicate of this bug._