Open ralfbiedert opened 6 years ago
Unless I'm mistaken this doesn't look like an easy fix: `std::simd` types such as `float32x4` seem to impl functions like `abs()` directly. That means we can't easily re-implement them in traits, as by default the `std::simd` variant will be picked.
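The resolution rule behind this can be reproduced without any SIMD types. The following is a minimal sketch, assuming a hypothetical `Lanes4` type standing in for a `std::simd` vector and a hypothetical `Abs` trait standing in for the trait `faster` would provide; neither name comes from either crate. It only shows that method-call syntax picks the inherent method when both exist, and that `rustc` accepts this without a warning.

```rust
// Hypothetical stand-in for a std::simd vector type; not the real API.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Lanes4(pub [f32; 4]);

impl Lanes4 {
    // Inherent method, analogous to an `abs` defined directly on the type.
    pub fn abs(self) -> &'static str {
        "inherent"
    }
}

// Hypothetical stand-in for a trait that a crate like faster might provide.
pub trait Abs {
    fn abs(self) -> &'static str;
}

impl Abs for Lanes4 {
    fn abs(self) -> &'static str {
        "trait"
    }
}

// Returns the label of whichever `abs` method-call syntax resolves to.
pub fn which_abs() -> &'static str {
    let v = Lanes4([-1.0, 2.0, -3.0, 4.0]);
    // Inherent methods are considered before trait methods for the same
    // receiver type, so the trait implementation is silently shadowed.
    v.abs()
}
```

Calling `which_abs()` returns `"inherent"` even though `Abs` is in scope, which matches the behaviour described above.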
The solutions I see:
- `vektor`. Upside: Should fix all these issues. Downside: Lots of work. Might introduce other issues.
- `faster::abs` and similar functions provided by `std::simd`. Upside: Cleaner code, let `std::simd` do the work. Downside: Currently apparently much slower than `faster`.
- `std::simd` issue and request `abs` and similar should reside in trait, similar to `faster`. Upside: Allows custom implementation and composition. Downside: if `std::simd::abs` actually works as intended might not be needed.
Will put this on hold until further discussed (but will PR some current improvements made in process).
Hm, I do want faster to seamlessly integrate with `std::simd` (as they have much more manpower than I do), but I can definitely see these name conflicts becoming a long-term problem as `std::simd` adds features - especially because I likely won't know about them until rustc starts warning me.
A good argument for moving away from those types would be that we could implement std::simd's traits on our types, but that'd be a manual process and I'd likely have more work to do to get it compiling on stable.
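To illustrate what "our types" could look like, here is a rough sketch of a crate-owned newtype. Everything in it is an assumption made for illustration: `VendorF32x4` is a hypothetical stand-in for a `std::simd` vector, and `F32s` and `PackedAbs` are invented names, not the actual `faster` or `vektor` API. Because the wrapper has no inherent `abs`, method-call syntax can only resolve to the trait implementation.

```rust
// Hypothetical stand-in for a vendor vector type with its own inherent `abs`.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct VendorF32x4(pub [f32; 4]);

impl VendorF32x4 {
    pub fn abs(self) -> VendorF32x4 {
        VendorF32x4(self.0.map(f32::abs))
    }
}

// Crate-owned wrapper (hypothetical name). It exposes no inherent `abs`,
// so nothing can shadow the trait method below.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct F32s(pub VendorF32x4);

// Hypothetical trait carrying the crate's own implementation.
pub trait PackedAbs: Sized {
    fn abs(self) -> Self;
}

impl PackedAbs for F32s {
    fn abs(self) -> Self {
        // A custom implementation goes here; this sketch applies
        // `f32::abs` lane by lane instead of calling the vendor method.
        F32s(VendorF32x4(self.0 .0.map(f32::abs)))
    }
}

// Wraps the input, calls `abs` through the wrapper, and unwraps the lanes.
pub fn wrapped_abs(values: [f32; 4]) -> [f32; 4] {
    let v = F32s(VendorF32x4(values));
    let F32s(VendorF32x4(out)) = v.abs();
    out
}
```

The cost hinted at above is visible even in this small sketch: every operation the wrapper should support has to be forwarded or re-implemented by hand.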
While working on #47 I noticed what looks like performance regressions in the `cargo bench`, in particular functions like `map_simd` and `map_scalar`, but quite a few others.
However, comparing #49 to the commit before the refactoring, the numbers are mostly unchanged.
I then assumed it's related to unfortunate default feature flags on my machine, but playing with `avx2` and `sse4.1` didn't have any effect either. I also have a first implementation of #48, and it actually looks like no fallbacks are emitted for `map_simd`. (Tried to cross check that with `radare2`, but have some problems locating the right symbol / disassembly for the benchmarks.) Lastly, the functions `map_scalar` and `map_simd` differ a bit, but even when I make them equal (e.g., `sqrt` vs. `rsqrt`) the difference remains.
- `rustc` became so good in auto-vectorization?
- `tests::map_simd` and `tests::map_scalar`?
Running on `rustc 1.29.0-nightly (9fd3d7899 2018-07-07)`, MBP 2015, i7-5557U.
Update: I linked the latest faster version from my SVM library and I don't see these problems in 'production':
Update 2 Seems to be related to some intrinsics. When I dissect the benchmark, I get
I now think that each intrinsic should have its own benchmark, e.g. `intrinsic_abs_scalar`, `intrinsic_abs_simd`, ...
Update 3 ... oh boy. I think that by "arcane magic" Rust imports and prefers `std::simd::f32x4` and friends over the `faster` types and methods.
So when you do `my_f32s.abs()`, it calls `std::simd::f32x4::abs`, not `faster::arch::current::intrin::abs`.
The reason I think that's the problem is you can now easily do `my_f32s.sqrte()`, which isn't implemented in `faster`, but in `std::simd`.
What's more annoying is that it doesn't warn about any collision, and that `std::simd` is actually slower than "vanilla" Rust.
TODO:
- `#![feature(stdsimd)]` except in `lib.rs`
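While such a collision exists, the trait implementation can still be reached by naming the trait explicitly, or by calling through a generic function that only knows the trait bound. This is a sketch of those two call forms and not something the thread settles on; `Vec4` and `FastAbs` are hypothetical names standing in for a `std::simd` vector and a `faster` intrinsic trait.

```rust
// Hypothetical stand-in for a vector type with an inherent `abs`.
#[derive(Clone, Copy)]
pub struct Vec4(pub [f32; 4]);

impl Vec4 {
    pub fn abs(self) -> &'static str {
        "inherent"
    }
}

// Hypothetical trait playing the role of the crate's own intrinsic trait.
pub trait FastAbs {
    fn abs(self) -> &'static str;
}

impl FastAbs for Vec4 {
    fn abs(self) -> &'static str {
        "trait"
    }
}

// Plain method-call syntax: resolves to the inherent method.
pub fn call_method(v: Vec4) -> &'static str {
    v.abs()
}

// Fully qualified syntax names the trait, so the inherent method
// is not a candidate.
pub fn call_qualified(v: Vec4) -> &'static str {
    <Vec4 as FastAbs>::abs(v)
}

// Inside generic code only the bound `T: FastAbs` is known, so `x.abs()`
// resolves to the trait method even when `T` is later `Vec4`.
pub fn call_generic<T: FastAbs>(x: T) -> &'static str {
    x.abs()
}
```

With `v = Vec4([0.0; 4])`, `call_method(v)` yields `"inherent"` while `call_qualified(v)` and `call_generic(v)` both yield `"trait"`. The qualified form is verbose, so it is more of a diagnostic aid than a fix for user-facing code.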
Update 4 Now one more thing makes sense ... I sometimes got `use of unstable library feature 'stdsimd'` in test cases and I didn't understand why. Probably because that's where the `std::simd` built-ins were used.