fplll / g6k

The General Sieve Kernel
GNU General Public License v2.0

Add non-AVX2 specific BDGL bucketing #101

Closed: joerowell closed this 2 years ago

joerowell commented 2 years ago

This PR adds a non-AVX2 specific BDGL bucketer to G6K.

It turns out that GCC ships with some (minimal) extensions for vector programming. This means that it's possible to write vectorised code in C++ without referring directly to any platform-specific intrinsics.
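Here is a minimal sketch of the kind of extension meant, using GCC's `vector_size` attribute (Clang also accepts it). The type and function names are hypothetical and do not come from the PR. The point is only that lane-wise arithmetic and comparisons use ordinary operators, and `<immintrin.h>` is never included.

```cpp
#include <cstdint>

// Sixteen 16-bit lanes in 32 bytes: the same shape as an AVX2 __m256i holding
// epi16 values, but declared with the GCC vector_size attribute.
// Nothing here includes <immintrin.h>. (Hypothetical names.)
typedef int16_t v16i16 __attribute__((vector_size(32)));

// Lane-wise a + b, written with a plain operator instead of _mm256_add_epi16.
inline v16i16 add16(v16i16 a, v16i16 b) { return a + b; }

// Lane-wise a > b. Vector comparisons produce -1 (all bits set) in lanes where
// the comparison holds and 0 elsewhere, like the AVX2 compare intrinsics.
inline v16i16 gt16(v16i16 a, v16i16 b) { return a > b; }

// Build a vector whose lane i holds i; vectors can be subscripted lane by lane.
inline v16i16 iota16() {
    v16i16 v = {};
    for (int i = 0; i < 16; ++i) v[i] = static_cast<int16_t>(i);
    return v;
}
```

For example, adding `iota16()` to a vector with every lane set to 8 gives 11 in lane 3, and `gt16` on the same pair gives -1 in lanes 9 to 15 and 0 in lanes 0 to 8.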

This PR adds two new files:

The PR also modifies fht_lsh.h and fht_lsh.cpp: the only change is to route all calls through the simd wrapper.
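The wrapper idea can be sketched as follows. This assumes each operation is defined once, with a native AVX2 path and a portable fallback. The namespace `simd_sketch`, the `Vec16` type, and the `butterfly` caller are hypothetical stand-ins. They are not the actual contents of the PR's simd wrapper or of fht_lsh.cpp. The AVX2 branch is only compiled when `__AVX2__` is defined, for example under `-mavx2`.

```cpp
#include <cstdint>
#if defined(__AVX2__)
#include <immintrin.h>
#endif

// Hypothetical wrapper, names invented for illustration. Callers see only
// Vec16 and the functions in this namespace, never a platform intrinsic.
namespace simd_sketch {

// Sixteen signed 16-bit lanes (32 bytes), declared with the GCC vector extension.
typedef int16_t Vec16 __attribute__((vector_size(32)));

inline Vec16 add16(Vec16 a, Vec16 b) {
#if defined(__AVX2__)
    // Native path: bit-cast to the same-sized __m256i and call the intrinsic.
    return (Vec16)_mm256_add_epi16((__m256i)a, (__m256i)b);
#else
    // Portable path: plain operator on the vector type.
    return a + b;
#endif
}

inline Vec16 sub16(Vec16 a, Vec16 b) {
#if defined(__AVX2__)
    return (Vec16)_mm256_sub_epi16((__m256i)a, (__m256i)b);
#else
    return a - b;
#endif
}

}  // namespace simd_sketch

// Result of one butterfly step.
struct Pair16 {
    simd_sketch::Vec16 sum;
    simd_sketch::Vec16 diff;
};

// Stand-in for hashing code: one Hadamard butterfly, (a, b) -> (a + b, a - b),
// written purely against the wrapper so it never names an intrinsic.
inline Pair16 butterfly(simd_sketch::Vec16 a, simd_sketch::Vec16 b) {
    return {simd_sketch::add16(a, b), simd_sketch::sub16(a, b)};
}
```

Because `butterfly` only calls wrapper functions, supporting a new target means changing the wrapper alone. The caller stays the same.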

Speed does not appear to change much: the versions built on the GCC extensions run at roughly the same speed as the native intrinsics, with some exceptions (for example, m128_broadcastsi128_si256 is much slower with the GCC extensions than with native AVX2 code). However, there may be machines where the slowdown is more noticeable than on my laptop.
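The broadcast case can be made concrete with a sketch. This assumes `m128_broadcastsi128_si256` mirrors the AVX2 intrinsic `_mm256_broadcastsi128_si256`, which copies a 128-bit value into both halves of a 256-bit register. A portable version then has to express that copy generically. The type and function names below are hypothetical, and this is only one possible formulation, not necessarily the one in the PR.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical portable stand-ins for __m128i and __m256i viewed as bytes.
typedef uint8_t v16u8 __attribute__((vector_size(16)));
typedef uint8_t v32u8 __attribute__((vector_size(32)));

// Portable stand-in for _mm256_broadcastsi128_si256: copy the 16-byte input
// into the low and high halves of a 32-byte result. Whether the compiler
// folds the two copies into a single instruction depends on the compiler
// and the target flags.
inline v32u8 broadcast_128_to_256(v16u8 x) {
    v32u8 out;
    unsigned char* bytes = reinterpret_cast<unsigned char*>(&out);
    std::memcpy(bytes, &x, sizeof x);             // low 128 bits
    std::memcpy(bytes + sizeof x, &x, sizeof x);  // high 128 bits
    return out;
}
```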

Note that a version with full unit tests is available at https://github.com/joerowell/gcc-bucketer. I wrote the tests with GoogleTest, which doesn't seem to play well with Autotools, so for now I haven't included them.

malb commented 2 years ago

Looks good to me, but maybe @cr-marcstevens and @lducas can take a look too?

lducas commented 2 years ago

Neat trick; happy to learn about the Simd lib. Experiments look good to me, no regression noted. I tried adding an option to control the flag in rebuild.sh.