Closed: Lastique closed this 1 year ago.
@pdimov Peter, MSVC-14.0 fails with an ICE (internal compiler error) in the is_contiguous_range implementation in this CI run. Could you take a look, please?
I'll look into it.
Merging #138 (86b1ae5) into develop (9df4da9) will not change coverage. The diff coverage is n/a.
:exclamation: Current head 86b1ae5 differs from pull request most recent head c7e0a70. Consider uploading reports for the commit c7e0a70 to get more accurate results.
Ping @jeking3.
Rebased and updated the code to prefer movdqu starting with SSE4.1. It doesn't matter on CPUs not supporting AVX, but it is possible that SSE4.1 code will run on a modern CPU that does prefer movdqu to lddqu.
Why not just use movdqu everywhere?
Because lddqu is better on NetBurst CPUs. And there's also a workaround for an MSVC codegen bug.
NetBurst CPUs? Really? Have you seen one recently (as in, in the last decade)? :-)
I had a ThinkPad with a P4D that I gave away; its battery lasted about half an hour.
Well, I'm fine with dropping support for NetBurst CPUs, but as I said, there's also the MSVC bug, so the code wouldn't get much simpler anyway.
It's still a simplification even if we still pretend to care about VS 2008. Not many parts of Boost still work with it, because it's not tested. (I still test msvc-9.0 on Appveyor in old libraries as a matter of habit but that's more of an exception and is not going to last.)
The bug is only fixed in VS2015; VS2013 and before are affected.
Anyway, I've pushed a commit to use movdqu universally, except for the MSVC workaround.
Yes, but without the VS2008 path, it's a single #ifdef around _ReadWriteBarrier.
Ok, done. I don't want to remove the VS2008 workaround yet.
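For illustration, the "single #ifdef around _ReadWriteBarrier" shape discussed above might look roughly like the sketch below. It assumes, without confirmation from this thread, that the VS2013-and-earlier bug is the compiler folding an unaligned load into an aligned memory operand of a later SSE instruction, and that a compiler barrier after the load keeps it as a separate instruction. The helper name load_unaligned_si128 is hypothetical and not necessarily what Boost.UUID uses.

```cpp
#include <cstdint>
#include <emmintrin.h> // SSE2: __m128i, _mm_loadu_si128
#if defined(_MSC_VER) && _MSC_VER < 1900 && !defined(__clang__)
#include <intrin.h>    // MSVC: _ReadWriteBarrier
#endif

// Hypothetical helper: unaligned 16-byte load with an MSVC-only workaround.
inline __m128i load_unaligned_si128(const std::uint8_t* p)
{
    __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(p));
#if defined(_MSC_VER) && _MSC_VER < 1900 && !defined(__clang__)
    // Assumed workaround for VS2013 and earlier (_MSC_VER < 1900, i.e. before
    // VS2015): a compiler barrier intended to stop the unaligned load from
    // being merged into a following instruction's (aligned) memory operand.
    _ReadWriteBarrier();
#endif
    return v;
}
```

On GCC, Clang and VS2015 or later, the preprocessor drops the barrier and the function is a plain movdqu load.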
Here's some discussion on lddqu vs movdqu, for reference: https://community.intel.com/t5/Intel-ISA-Extensions/LDDQU-vs-MOVDQU-guidelines/m-p/1178965
Yeah, I forgot about that discussion; thanks for digging it up. It didn't result in a definitive answer, though: Intel reps didn't comment. In the end I was left with the opinion I had when I started it: use lddqu up until AVX and vmovdqu with AVX and later, as before AVX lddqu is not worse than movdqu and is sometimes better.
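The policy stated above (lddqu before AVX, vmovdqu with AVX) could be expressed as a compile-time dispatch along these lines. This is a minimal sketch, assuming GCC/Clang-style predefined macros (__AVX__, __SSE3__); MSVC defines __AVX__ under /arch:AVX but never __SSE3__, which this sketch does not handle. The name load_uuid_bytes is hypothetical.

```cpp
#include <cstdint>
#include <emmintrin.h> // SSE2: _mm_loadu_si128 (movdqu)
#if defined(__SSE3__) && !defined(__AVX__)
#include <pmmintrin.h> // SSE3: _mm_lddqu_si128 (lddqu)
#endif

// Hypothetical helper: pick the unaligned load instruction by target ISA.
inline __m128i load_uuid_bytes(const std::uint8_t* p)
{
    const __m128i* src = reinterpret_cast<const __m128i*>(p);
#if defined(__AVX__)
    // With AVX enabled the compiler emits the VEX-encoded form (vmovdqu),
    // or may fold the load into a following instruction.
    return _mm_loadu_si128(src);
#elif defined(__SSE3__)
    // Legacy SSE3: lddqu, never slower than movdqu before AVX.
    return _mm_lddqu_si128(src);
#else
    // Plain SSE2 fallback: movdqu.
    return _mm_loadu_si128(src);
#endif
}
```

All three branches return the same 16 bytes; only the instruction selected differs.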
@pdimov Since apparently Boost.UUID is no longer actively maintained again, maybe you could merge this? The Codecov failure does not seem to be caused by this.
I was going to wait until after the release, but I can merge it now if you insist.
No, after the release is fine. Thanks.
A gentle reminder about this PR.
Prefer vmovdqu to vlddqu on CPUs supporting AVX. vlddqu has one extra cycle of latency on Skylake and later Intel CPUs and is not folded into the following instructions as a memory operand, which makes the code slightly larger. Legacy SSE3 lddqu is still preferred because it is faster on Prescott and the same as movdqu on AMD CPUs. It also doesn't affect code size, because movdqu cannot be converted to a memory operand, as memory operands are required to be aligned in SSE.

Closes https://github.com/boostorg/uuid/issues/137.
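To make the code-size argument concrete, consider a hypothetical 16-byte equality check of the kind a UUID comparison might use (equal16 is an illustrative name, not the Boost.UUID API). With AVX enabled, a compiler may fold the second unaligned load into the memory operand of vpcmpeqb, saving an instruction; a vlddqu load would not be folded that way. With legacy SSE, pcmpeqb's memory operand must be 16-byte aligned, so an unaligned load stays a separate movdqu or lddqu either way. Whether folding actually happens depends on the compiler and flags.

```cpp
#include <cstdint>
#include <emmintrin.h> // SSE2 intrinsics

// Hypothetical helper: returns true if the 16 bytes at a and b are equal.
inline bool equal16(const std::uint8_t* a, const std::uint8_t* b)
{
    __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a));
    // Under -mavx this load is a candidate for folding into vpcmpeqb.
    __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b));
    // pcmpeqb: each byte lane becomes 0xFF where the inputs match, else 0x00.
    __m128i eq = _mm_cmpeq_epi8(va, vb);
    // movemask collects the top bit of each lane; 0xFFFF means all 16 matched.
    return _mm_movemask_epi8(eq) == 0xFFFF;
}
```

Compiling this with and without -mavx and inspecting the assembly is one way to check how a given compiler treats the second load.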
Also, reformat the test code for MSVC bug 981648; no functional changes.