easyaspi314 closed 4 years ago
Re-adding the specialization for defaultSecret brings back the performance, but I am a little confused about why it is not running at exactly the same speed as withSecret.
Additionally, just tail calling XXH3_64bits_withSecret has the correct performance.
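For readers unfamiliar with the two options being compared, here is a minimal sketch in C. It is not the xxHash source: the `toy_*` names, the 16-byte secret, and the mixing loop are all hypothetical and exist only to show the shape of "a specialization for the default secret" versus "a tail call into the withSecret entry point". It assumes a GCC- or Clang-style compiler for the inlining attributes and falls back to plain `inline` elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Inlining hints; the attribute spellings are GCC/Clang extensions. */
#if defined(__GNUC__) || defined(__clang__)
#  define TOY_FORCE_INLINE static inline __attribute__((always_inline))
#  define TOY_NO_INLINE __attribute__((noinline))
#else
#  define TOY_FORCE_INLINE static inline
#  define TOY_NO_INLINE
#endif

/* Hypothetical default secret; the real library uses a much larger one. */
static const uint8_t toy_default_secret[16] = {
    0xb8, 0xfe, 0x6c, 0x39, 0x23, 0xa4, 0x4b, 0xbe,
    0x7c, 0x01, 0x81, 0x2c, 0xf7, 0x21, 0xad, 0x1c
};

/* Generic core, force-inlined so each public entry point gets its own copy. */
TOY_FORCE_INLINE uint64_t
toy_hash_internal(const void* data, size_t len,
                  const uint8_t* secret, size_t secretSize)
{
    const uint8_t* p = (const uint8_t*)data;
    uint64_t acc = (uint64_t)len * 0x9E3779B185EBCA87ULL;
    size_t i;
    assert(secretSize > 0);
    for (i = 0; i < len; i++) {
        /* Toy mixing step: combine input and secret bytes, then multiply. */
        acc = (acc ^ (uint64_t)(p[i] ^ secret[i % secretSize])) * 0x100000001B3ULL;
    }
    return acc ^ (acc >> 29);
}

/* Entry point taking a caller-provided secret (kept out of line on purpose). */
TOY_NO_INLINE uint64_t
toy_hash_withSecret(const void* data, size_t len,
                    const void* secret, size_t secretSize)
{
    return toy_hash_internal(data, len, (const uint8_t*)secret, secretSize);
}

/* Option A: specialization. The secret and its size are compile-time
 * constants inside this copy of the core, so the compiler may fold them. */
uint64_t toy_hash_default_specialized(const void* data, size_t len)
{
    return toy_hash_internal(data, len,
                             toy_default_secret, sizeof(toy_default_secret));
}

/* Option B: forward to the withSecret entry point in tail position. */
uint64_t toy_hash_default_tailcall(const void* data, size_t len)
{
    return toy_hash_withSecret(data, len,
                               toy_default_secret, sizeof(toy_default_secret));
}
```

Both options return the same hash value; they differ only in the machine code the compiler emits, which is the kind of difference this thread is about. Which one is faster depends on the compiler and version, so the sketch makes no claim either way.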
Thanks for investigating, @easyaspi314.
Indeed, this issue doesn't show up with gcc.
Anyway, the proposed fix seems simple enough.
edit: strange, I don't see the impact, neither with my own version of clang on macosx, nor with clang v10.0 on ubuntu 20.04 ...
edit 2: also tried clang v9.0.1 on ubuntu 20.04, no impact either ...
edit 3: I can notice a ~10% impact with clang v8.0.1, which is small enough to be attributed to other causes, such as instruction alignment.
edit 4: switched to -O2 (instead of -O3) in the hope of reproducing the issue. Nope, no success. Performance issue still not observed.
I was unable to reproduce the issue on my platforms, but went ahead and produced a fix nonetheless (#398).
It's a logical fix, so I presume it should fix this performance issue for platforms suffering from it.
I'm still interested in knowing if it solves the reported issue on your system.
It seems that there has been a large performance regression on Clang for x86_64 on the non-dispatched path. This does not affect GCC, only Clang apparently.
MacBookPro8,2 (15-inch, Early 2011) 2.0GHz Intel Core i7-2635QM (Sandy Bridge)
Clang 9.0.0 (macOS uses -msse4.1 by default)
dev (dispatch disabled)
1b14f648d63ddbf66a99e043c472789575c3673e
Currently investigating.