Quuxplusone opened 5 years ago
Bugzilla Link | PR41275 |
Status | NEW |
Importance | P enhancement |
Reported by | Roman Lebedev (lebedev.ri@gmail.com) |
Reported on | 2019-03-28 07:31:44 -0700 |
Last modified on | 2019-03-29 10:54:04 -0700 |
Version | trunk |
Hardware | PC Linux |
CC | clement.courbet@gmail.com, gchatelet@google.com, llvm-bugs@lists.llvm.org, llvm-dev@redking.me.uk |
Fixed by commit(s) | |
Attachments | |
Blocks | |
Blocked by | |
See also |
---
mode: latency
key:
instructions:
- 'BT32rr R11D R11D'
- 'RCR8rCL R11B R11B'
config: ''
register_initial_values:
- 'R11D=0x0'
- 'R11B=0x0'
- 'CL=0x0'
cpu_name: bdver2
llvm_triple: x86_64-unknown-linux-gnu
num_repetitions: 10000
measurements:
- { key: latency, value: 5.7795, per_snippet_value: 11.559 }
error: ''
info: Repeating two instructions
assembled_snippet:
41BB0000000041B300B100450FA3DB41D2DB450FA3DB41D2DB450FA3DB41D2DB450FA3DB41D2DB450FA3DB41D2DB450FA3DB41D2DB450FA3DB41D2DB450FA3DB41D2DBC3
...
---
mode: latency
key:
instructions:
- 'RCR8rCL R12B R12B'
config: ''
register_initial_values:
- 'R12B=0x0'
- 'CL=0x0'
- 'EFLAGS=0x0'
cpu_name: bdver2
llvm_triple: x86_64-unknown-linux-gnu
num_repetitions: 1000
measurements:
- { key: latency, value: 11.288, per_snippet_value: 11.288 }
error: ''
info: Repeating a single implicitly serial instruction
assembled_snippet:
415441B400B1004883EC08C7042400000000C7442404000000009D41D2DC41D2DC41D2DC41D2DC41D2DC41D2DC41D2DC41D2DC41D2DC41D2DC41D2DC41D2DC41D2DC41D2DC41D2DC41D2DC415CC3
...
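The two records are consistent with `value` being a per-instruction average, i.e. `per_snippet_value` divided by the number of instructions in the snippet: 11.559 / 2 = 5.7795 for the BT32rr/RCR8rCL pair, and 11.288 / 1 for RCR8rCL alone. That relationship is inferred from these two records only, not from the llvm-exegesis source. A quick stdlib-only check follows. The dicts are hand-copied from the YAML above, and the helper name is hypothetical.

```python
# Hand-copied subset of the two llvm-exegesis records above.
records = [
    {"instructions": ["BT32rr R11D R11D", "RCR8rCL R11B R11B"],
     "value": 5.7795, "per_snippet_value": 11.559},
    {"instructions": ["RCR8rCL R12B R12B"],
     "value": 11.288, "per_snippet_value": 11.288},
]


def value_matches_per_snippet(record, tol=1e-6):
    """Check that value * (instructions per snippet) == per_snippet_value."""
    n = len(record["instructions"])
    return abs(record["value"] * n - record["per_snippet_value"]) < tol


for r in records:
    print(r["instructions"], value_matches_per_snippet(r))
```

The averaged 5.7795 says nothing about how the 11.559 cycles split between the two instructions. That split is what the rest of this thread is about.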
Shot in the dark:
Suppose instruction A has latency x and instruction B has latency L1.
We know L1; we are looking for x.
But we also know the latency of executing instruction A and instruction B
*serially*: L12.
Given that the execution was serial, is it correct to compute the latency of
instruction A as x = L12 - L1?
I.e. the real latency of BT32rr is 11.559-11.288 = 0.271 == 1?
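Written out with the numbers from the two records above, and assuming the latencies along the serial dependency chain simply add:

```latex
L_{12} = x + L_{1}
\quad\Longrightarrow\quad
x = L_{12} - L_{1} = 11.559 - 11.288 = 0.271 \text{ cycles}
```

The subtraction itself yields 0.271. Rounding that up to 1 cycle, as the question does, is an interpretation of a sub-cycle residual rather than something the formula gives.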
> I.e. the real latency of BT32rr is 11.559-11.288 = 0.271 == 1?
That is correct. At one point Guillaume (cced) was looking into forming a hierarchy of measurements to make sure that we always had the latency for the back-to-back instruction.
Indeed. I had a version of llvm-exegesis that computed the dependency graph, but since it evaluated everything up front, it took a lot longer to run.
We ended up not offering this option, to keep things simple, and assumed it would be best to solve this as a post-processing step.
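One way to do such a post-processing step is to solve the measurements in dependency order. Single-instruction snippets give their back-to-back latency directly, and any snippet with exactly one still-unknown opcode can then be solved by subtraction. This is a minimal sketch that assumes latencies on a serial chain are additive. `resolve_latencies` and its `(opcodes, per_snippet_value)` input shape are hypothetical and not part of llvm-exegesis.

```python
from collections import Counter


def resolve_latencies(measurements):
    """Derive per-opcode latencies from per-snippet latencies (hypothetical helper).

    `measurements` is a list of (opcodes, per_snippet_value) pairs, where
    `opcodes` lists the instructions executed serially in one snippet.
    Assumes the latencies along the serial dependency chain add up.
    """
    known = {}
    pending = [(Counter(ops), total) for ops, total in measurements]
    progress = True
    while progress:
        progress = False
        still_pending = []
        for counts, total in pending:
            unknown = [op for op in counts if op not in known]
            if len(unknown) == 1:
                # Exactly one unknown opcode: subtract the known ones.
                op = unknown[0]
                rest = sum(known[o] * n for o, n in counts.items() if o != op)
                known[op] = (total - rest) / counts[op]
                progress = True
            elif unknown:
                # Two or more unknowns: retry once more opcodes are resolved.
                still_pending.append((counts, total))
            # Fully known snippets are dropped (they could serve as cross-checks).
        pending = still_pending
    return known


measurements = [
    (["BT32rr", "RCR8rCL"], 11.559),  # BT32rr + RCR8rCL chain
    (["RCR8rCL"], 11.288),            # RCR8rCL back-to-back
]
latencies = resolve_latencies(measurements)
print(latencies)
```

Unlike evaluating a dependency graph up front, this only reuses whatever measurements already exist. If an opcode is never measured in a snippet where it is the only unknown, it simply stays unresolved.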
(In reply to Clement Courbet from comment #2)
> > I.e. the real latency of BT32rr is 11.559-11.288 = 0.271 == 1?
>
> That is correct.
Aha! So the target latency of the instruction is per_snippet_value - (sum of
actual latencies of other instructions in that snippet).
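The same formula can be applied directly to a record's fields. This sketch assumes the opcode is the first whitespace-separated token of each `instructions` entry, and that `known` (a hypothetical dict) already holds back-to-back latencies for every other opcode in the snippet. The function name is also hypothetical.

```python
def target_latency(record, target, known):
    """per_snippet_value minus the known latencies of every other instruction."""
    opcodes = [ins.split()[0] for ins in record["instructions"]]
    if opcodes.count(target) != 1:
        raise ValueError("expected the target opcode exactly once")
    others = [op for op in opcodes if op != target]
    return record["per_snippet_value"] - sum(known[op] for op in others)


record = {
    "instructions": ["BT32rr R11D R11D", "RCR8rCL R11B R11B"],
    "per_snippet_value": 11.559,
}
known = {"RCR8rCL": 11.288}  # from the RCR8rCL-only snippet
bt = target_latency(record, "BT32rr", known)
print(round(bt, 3))  # 0.271
```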
> At one point Guillaume (cced) was looking into forming a
> hierarchy of measurements to make sure that we always had the latency for
> the back-to-back instruction.
Anything I should be aware of? I suspect this might be the next big issue
I have with llvm-exegesis, one that I'd like to resolve (or see resolved).