Open Quuxplusone opened 4 years ago
| | |
| --- | --- |
| Bugzilla Link | PR47380 |
| Status | NEW |
| Importance | P normal |
| Reported by | Eliana Xie (谢洁) (eliana.x@huawei.com) |
| Reported on | 2020-09-01 03:28:01 -0700 |
| Last modified on | 2021-03-04 05:29:27 -0800 |
| Version | trunk |
| Hardware | Other Linux |
| CC | andrea.dibiagio@gmail.com, lebedev.ri@gmail.com, llvm-bugs@lists.llvm.org, llvm-dev@redking.me.uk, matthew.davis@sony.com |
| Fixed by commit(s) | |
| Attachments | |
| Blocks | |
| Blocked by | |
| See also | |
The bottleneck analysis is not a "critical-path" analysis. The analysis is
conducted at simulation time; it is purely based on the observation of
so-called "pressure increase" events, usually generated by a Scheduler
component.
Pressure events are generated in only two situations:
1. Hardware pipeline utilisation could be increased if instructions weren't
subject to data dependencies.
2. There are instructions ready to execute, but the pipelines are fully
booked, and the number of instructions dispatched during that cycle was larger
than the number of instructions issued on the underlying pipes.
Essentially: point 1 is about data dependencies limiting the issue throughput.
Point 2 is instead about pipeline resources being unavailable, and the issue
rate being too low (even though instructions are free from data dependencies).
Therefore, not all data dependencies are necessarily seen as "problematic"
for the purpose of this analysis. Only those that limit the issue throughput
are problematic.
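To make the two situations above concrete, here is a toy Python sketch of how a cycle snapshot could be classified. This is not llvm-mca's actual code; the function name, its parameters, and the thresholds are all hypothetical, chosen only to mirror points 1 and 2 above.

```python
def pressure_event(dispatched, issued, ready, pipes_free):
    """Toy classifier for a single simulated cycle (hypothetical model).

    dispatched: instructions dispatched this cycle
    issued:     instructions issued to the pipes this cycle
    ready:      instructions ready to execute but not issued
    pipes_free: execution pipes left idle this cycle
    """
    # Point 1: pipes sit idle and nothing is ready to issue, so data
    # dependencies (not resources) are limiting the issue throughput.
    if pipes_free > 0 and ready == 0 and dispatched > issued:
        return "dependency-pressure"
    # Point 2: instructions are ready, but every pipe is booked and
    # dispatch outpaced issue during this cycle.
    if ready > 0 and pipes_free == 0 and dispatched > issued:
        return "resource-pressure"
    # Otherwise: no pressure increase event for this cycle.
    return None
```

In this model, a cycle where issue keeps pace with dispatch generates no event at all, which matches the idea that only throughput-limiting dependencies count as problematic.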
Back to your example: it may be that those dependencies are not problematic
during the first ten iterations of the loop; they may only introduce problems
once the number of iterations is increased.
Your timeline only shows that there are data dependencies. Nothing more. The
effects of those dependencies on the throughput may only become apparent if
you increase the number of iterations to something more than 10.
During that short simulation, the scheduler was probably still able to extract
enough ILP and feed the underlying pipes.
Over time, problematic dependencies would induce an increase in back-pressure
on the scheduler buffers, eventually leading to compulsory stalls. It may be
that 10 iterations wasn't enough to reach that critical point.
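As a rough illustration of that build-up, here is a toy Python model of back-pressure on a scheduler buffer. This is not how llvm-mca is implemented; the buffer size, dispatch width, and issue rate below are made-up numbers used only to show why a stall can take many cycles to appear.

```python
def cycles_to_first_stall(buffer_size, dispatch_per_cycle, issue_per_cycle,
                          max_cycles=1000):
    """Toy model: cycle at which a full scheduler buffer first forces a
    dispatch stall, or None if no stall occurs within max_cycles."""
    occupancy = 0
    for cycle in range(1, max_cycles + 1):
        # Back-pressure: the buffer cannot absorb this cycle's dispatch group.
        if occupancy + dispatch_per_cycle > buffer_size:
            return cycle
        occupancy += dispatch_per_cycle
        # Issue drains the buffer, limited here by the (toy) issue rate.
        occupancy -= min(issue_per_cycle, occupancy)
    return None
```

With a dispatch rate of 4 per cycle and a dependency chain limiting issue to 1 per cycle, a 60-entry buffer absorbs the imbalance for roughly 20 cycles before the first compulsory stall; a balanced issue rate never stalls at all. A simulation that ends before that point would show no stalls, even though the dependencies are genuinely problematic at steady state.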
Generally speaking, when doing bottleneck analysis (or throughput analysis in
general), it is strongly advised to use a large number of iterations. If
possible, I recommend sticking with the default (i.e. 100 iterations) unless
there are compelling reasons to do otherwise.
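For example (the input file `foo.s` and the target CPU are placeholders; substitute your own):

```shell
# Default iteration count (100) with the bottleneck report enabled:
llvm-mca -mcpu=btver2 -bottleneck-analysis foo.s

# If the report still looks inconclusive, raise the iteration count
# explicitly and compare against the timeline view:
llvm-mca -mcpu=btver2 -bottleneck-analysis -timeline -iterations=500 foo.s
```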