Closed joerowell closed 3 years ago
I created an autoconf
-ified version of this PR here https://github.com/fplll/g6k/tree/avx2-fix-autoconf
It passes tests on strombenzin which doesn't have AVX2.
@cr-marcstevens and @lducas do you object to autotool
-ification? If not, we'd merge my branch into Joe's and thus this PR.
No objections here, I think a better build system is certainly a good idea! (and was already a bit overdue)
No objection either.
Okay, merged autoconf stuff into this PR
Is the consensus that this is ready to merge?
Note that bootstrap.sh includes building g6k using python setup.py build_ext
,
while README says to use python setup.py build
.
What's the difference between these two?
@cr-marcstevens did your push fix your question or is that still open?
That's still open.
I also found another minir issue: running configure and then setup.py doesn't actually rebuild unless you also do make yourself.
That's still open.
I also found another minir issue: running configure and then setup.py doesn't actually rebuild unless you also do make yourself.
Isn't that the typical behaviour: ./configure && make
doesn't necessarily rebuild? That is, it's not a setup.py
thing, is it?
Updated README on this, so fine to merge with me.
What's the problem?
This PR attempts to resolve both #72 and #81.
For the unawares/future reference: the problem is that the introduction of the AVX2 specific instructions hamstrings us on non-AVX2 platforms.
Interestingly, #72 has been silently fixed: in fact, G6K compiles on the machine mentioned in #72. However, G6K will not work by default on that machine, because of a subtle bug. In particular, the switch from
-march=native
to-mavx2 -maes
means that all machines will compile G6K, but some may not be able to run it. This leads to the error message seen in #81.To my understanding, the reason why #81 occurs is because of compiler optimisations: the compiler produces code before the call to BDGL that uses AVX2 instructions. This would explain why G6K crashes, even when using the triple-sieve.
What does this PR do?
__builtin_cpu_supports
intrinsic. This comes at no further incompatibility cost, since we already use gcc/clang intrinsics. In more detail, the Siever now has a const variablesupports_bdgl
that is initialised at construction. All calls to thebdgl_sieve
from the Cython layer checks this variable inside an assertion.__AVX2__
flag is not set at compile-time.In other words, this is essentially the solution sketched by @cr-marcstevens in #72.
Please note that this PR is just a draft. This is because at present the default compiler flags specify
-mavx2
-- in other words, this PR does not (by itself) fix the issues mentioned in #81. I am sure that this can be solved via a configure script, but I do not know how to implement this.We should also test this on more machines: I have tried this on the aforementioned machine (and on an older, Sandy Bridge machine), but this should be more widely tested before merging.
What else could we do?
I have a half-working implementation of a non-AVX2 specific bucketer that could in theory replace the AVX2 bucketer, but I have not yet finished it (nor do I believe that I'll have time to in the next few weeks). This might be a more preferable, long-term solution.