Closed by travisdowns 4 years ago
This should be fixed with the latest update.
Thanks!
In general would we expect the number on uops.info to be updated with each nanobench update? I assume you may not have access to all the machines, so I'm not sure.
I currently do have access to all the machines, so I re-ran the tests on all of them. However, there is no guarantee that this will still be the case with future updates.
Thanks, is there a place in the uops.info output we can look to see which version/build of nanobench was used?
No. However, the XML file contains the date when it was generated, which should make it possible to find the corresponding version.
Also, I should point out that nanoBench is just the tool that runs the microbenchmarks. The tool that generates them is not public yet. With "update" above I was referring to the update of the website.
Thanks Andreas.
What is the definition of latency that you want to use exactly?
In particular, consider a hypothetical operation `foo arg1, arg2, arg3` which is `3p0`. This op will have a reciprocal throughput of 3 cycles due to `p0` pressure. Can this op have any latency less than 3? I think yes.

For example, the op might only have a 1 cycle delay from `arg2` -> `arg1`, because the first two uops only use `arg3`, and then the last uop uses `arg2` and `arg3`.

However, testing back-to-back `foo` ops will never show it because of the throughput limit. I think you are probably well aware of this, since I notice lots of filler uops in tests, like:

All the `movsxd` give enough breathing room to avoid lots of problems of this type.

However, consider gathers. For 1->1 latency testing this is used:

No breathing room, so all these results just end up reporting the throughput number (5 in this case).

The following test:

also runs in 5 cycles, so we see the true 1->1 latency is 1 cycle.
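
To make the argument concrete, here is a minimal Python sketch of a simplified model of what a loop-carried latency test measures. It is only an illustration: the function names are hypothetical, the numbers come from the `foo` example (latency 1, `3p0`) rather than from any measurement, and the model assumes the filler instructions add latency to the chain without competing for `p0`.

```python
def cycles_per_iteration(latency, recip_throughput, filler_latency=0):
    """Steady-state cycles per iteration of a loop-carried dependency chain.

    Simplified model: an iteration cannot finish faster than the dependency
    chain (instruction latency plus filler latency), nor faster than the
    port-pressure limit of the instruction under test.
    """
    return max(latency + filler_latency, recip_throughput)


def latency_upper_bound(measured_cycles, filler_latency):
    """Upper bound on the latency implied by one measurement.

    The bound equals the true latency only when the chain, not the
    throughput limit, is the bottleneck.
    """
    return measured_cycles - filler_latency


if __name__ == "__main__":
    FOO_LATENCY = 1        # hypothetical arg2 -> arg1 latency
    FOO_RECIP_TP = 3       # hypothetical: 3 uops, all on p0

    for filler in range(5):
        measured = cycles_per_iteration(FOO_LATENCY, FOO_RECIP_TP, filler)
        bound = latency_upper_bound(measured, filler)
        print(f"filler={filler} measured={measured} latency<={bound}")
```

With no filler, the model reports 3 cycles, which is the throughput limit and not the latency. Once the filler latency reaches at least the reciprocal throughput minus the latency, the bound settles at the true value of 1.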