Closed joerowell closed 2 years ago
Looks good to me, but maybe @cr-marcstevens and @lducas can take a look too?
Neat trick, happy to learn about the Simd lib. Experiments look good to me, no regression noted. I tried adding an option to control the flag in rebuild.sh.
This PR adds a non-AVX2 specific BDGL bucketer to G6K.
It turns out that GCC ships with some (minimal) extensions for vector programming. This means that it's possible to write vectorised code in C++ without referring directly to any platform-specific intrinsics.
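As a rough illustration of what those extensions look like (a minimal sketch, not code from this PR): the type name `Vec16s` and the helpers `add_lanes` and `max_lanes` are hypothetical. Only the `vector_size` attribute and the lane-wise operators come from GCC itself.

```cpp
#include <cstdint>
#include <cstring>

// GCC vector extension: a 32-byte vector holding sixteen int16_t lanes.
// Vec16s is a hypothetical name chosen for this sketch.
typedef int16_t Vec16s __attribute__((vector_size(32)));

// Lane-wise addition of two arrays of sixteen int16_t values.
// The compiler picks the instructions (AVX2, SSE2, NEON, or scalar code).
inline void add_lanes(const int16_t* a, const int16_t* b, int16_t* out) {
    Vec16s va, vb;
    std::memcpy(&va, a, sizeof va);   // unaligned load
    std::memcpy(&vb, b, sizeof vb);
    Vec16s vr = va + vb;              // ordinary operator, applied per lane
    std::memcpy(out, &vr, sizeof vr); // unaligned store
}

// Lane-wise maximum built from a vector comparison.
// (va > vb) yields -1 (all bits set) in lanes where it holds, 0 elsewhere.
inline void max_lanes(const int16_t* a, const int16_t* b, int16_t* out) {
    Vec16s va, vb;
    std::memcpy(&va, a, sizeof va);
    std::memcpy(&vb, b, sizeof vb);
    Vec16s mask = va > vb;
    Vec16s vr = (va & mask) | (vb & ~mask); // select per lane
    std::memcpy(out, &vr, sizeof vr);
}
```

Nothing here names an intrinsic, so the same source should build on targets without AVX2. How well it vectorises is left to the compiler.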
This PR adds two new files:
- Simd.h. This file contains a new namespace Simd and a bunch of declarations for vector instructions. These are essentially just emulations of what the Intel intrinsics offer, plus some extra helper functions. Where applicable (i.e. if HAVE_AVX2 is set) this wrapper uses the AVX2 instructions as before; if not, we let GCC generate vectorised code for us.
- Simd.inl. This file contains all of the implementations for Simd.h.
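The dispatch described above might look roughly like the following. This is a sketch under assumptions, not the contents of Simd.h or Simd.inl: the namespace `SimdSketch` and the functions `load`, `store` and `add_epi16` are hypothetical names. `HAVE_AVX2` is the macro named in this PR, and the `_mm256_*` calls are the standard Intel intrinsics.

```cpp
#include <cstdint>
#include <cstring>
#ifdef HAVE_AVX2
#include <immintrin.h>
#endif

namespace SimdSketch { // hypothetical; not the PR's Simd namespace

#ifdef HAVE_AVX2
using Vec16s = __m256i; // native 256-bit integer vector
#else
// Fallback: GCC vector extension with sixteen int16_t lanes.
typedef int16_t Vec16s __attribute__((vector_size(32)));
#endif

// Unaligned load of sixteen int16_t values.
inline void load(Vec16s& dst, const int16_t* src) {
#ifdef HAVE_AVX2
    dst = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(src));
#else
    std::memcpy(&dst, src, sizeof dst);
#endif
}

// Unaligned store of sixteen int16_t values.
inline void store(int16_t* dst, const Vec16s& src) {
#ifdef HAVE_AVX2
    _mm256_storeu_si256(reinterpret_cast<__m256i*>(dst), src);
#else
    std::memcpy(dst, &src, sizeof src);
#endif
}

// Lane-wise 16-bit addition: the intrinsic when available,
// otherwise plain operator+ on the vector type.
inline void add_epi16(Vec16s& out, const Vec16s& a, const Vec16s& b) {
#ifdef HAVE_AVX2
    out = _mm256_add_epi16(a, b);
#else
    out = a + b;
#endif
}

} // namespace SimdSketch
```

Callers only see `SimdSketch::add_epi16` and friends, so code such as the bucketer would not need its own `#ifdef` blocks. The sketch passes vectors by reference, which can sidestep GCC's ABI warning about passing 32-byte vectors by value when AVX is not enabled.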
The PR also modifies fht_lsh.h and fht_lsh.cpp: the only changes are to do all calls through the Simd wrapper.
There does not appear to be much of a change speed-wise: the intrinsics appear to be roughly the same speed (with some exceptions: m128_broadcastsi128_si256, for example, is much slower when using the GCC extensions vs native AVX2 code). However, it's possible there's a machine somewhere where the drop-off is more noticeable than on my laptop.
Note that you can find this with full unit tests etc. at https://github.com/joerowell/gcc-bucketer: I used GoogleTest to write these, but that doesn't seem to play well with AutoTools and for now I haven't included them.