Closed: VVD closed this issue 1 year ago
Can you please try CryptX-0.80?
clang defines __AES__ only for -march=skylake and does not define it for westmere through broadwell (sandybridge, ivybridge, haswell):
$ for CPU in westmere sandybridge ivybridge haswell broadwell skylake; do echo $CPU; echo | clang -march=$CPU -dM -E - | grep -i aes; done
westmere
sandybridge
ivybridge
haswell
broadwell
skylake
#define __AES__ 1
On the other hand, it's possible to force AES-NI on even if the selected -march doesn't imply it:
$ echo | clang -march=core2 -maes -dM -E - | grep -i aes
#define __AES__ 1
As a simple workaround we can check #if defined(__AES__) instead of SSE4.1.
We could also add a build option to force-add -maes if the user knows their CPU supports it.
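A minimal sketch of the suggested compile-time check, assuming a gcc/clang-style compiler that defines __AES__ when -maes (or a -march that implies it) is given. DEMO_HAVE_AESNI, demo_aesni_compiled_in and demo_check_forced_aesni are hypothetical names for illustration only, not libtomcrypt's actual macros or functions:

```c
#include <assert.h>

/* DEMO_HAVE_AESNI is a hypothetical name, not the macro libtomcrypt uses. */
#if defined(__AES__)
#  define DEMO_HAVE_AESNI 1   /* compiler got -maes or a -march that implies it */
#else
#  define DEMO_HAVE_AESNI 0   /* AES-NI intrinsics are not enabled in this build */
#endif

/* Returns 1 if AES-NI code paths were compiled in, 0 otherwise. */
int demo_aesni_compiled_in(void)
{
    return DEMO_HAVE_AESNI;
}

/* Hypothetical guard for a "force AES-NI" build option: in a debug build it
 * aborts when the option is on but the compiler was not given -maes. */
void demo_check_forced_aesni(int user_forced_aesni)
{
    assert(!user_forced_aesni || DEMO_HAVE_AESNI);
}
```

Built with -march=core2 -maes this should report 1; built with -march=core2 alone it should report 0, independent of whether SSE4.1 is enabled.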
ping @sjaeckel
I see -msse4.1 -maes in the command line along with my -march=nehalem, but it builds fine without errors now (0.080).
Does the test suite of 0.080 show something like this?
# Various others: ARGTYPE=4 ADLER32 AES-NI BASE64 BASE64-URL-SAFE .....
^^^^^^
How to run it? :-)
make test
On nehalem CPU:
# Various others: ARGTYPE=4 ADLER32 AES-NI BASE64
All tests successful.
Files=137, Tests=18568, 29 wallclock secs ( 2.24 usr 0.21 sys + 26.30 cusr 1.99 csys = 30.74 CPU)
Result: PASS
It means that AES-NI support: 1/ was properly compiled 2/ was detected during runtime
In theory you can build a binary with AES-NI support but run it on a CPU lacking AES-NI support (which will be detected at runtime, falling back to the SW implementation of AES)
TBH I never tested that since I don't have any amd64 CPUs w/o AES support, but maybe it's a good time now? ;)
So it uses SW emulation of AES-NI on my Nehalem?
Yes, it should fall back to the SW implementation and therefore be a valid test of that functionality.
If you see
# Various others: ARGTYPE=4 ADLER32 AES-NI BASE64
it uses HW AES, if you see
# Various others: ARGTYPE=4 ADLER32 BASE64
it uses sw implementation of AES.
That's not completely right.
If you see AES-NI, the build supports HW AES, but it may still use SW AES at runtime if the CPU doesn't support it.
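The distinction between "compiled with AES-NI" and "CPU has AES-NI" can be sketched roughly as follows. This is only an illustration, assuming gcc or clang on x86 (their cpuid.h header provides __get_cpuid); the demo_* names are hypothetical and this is not how libtomcrypt itself implements the check:

```c
#include <assert.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* gcc/clang header providing __get_cpuid() */
#endif

/* Returns 1 if the running CPU reports AES-NI (CPUID leaf 1, ECX bit 25),
 * 0 otherwise or when not built for x86. */
int demo_cpu_has_aesni(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;                      /* leaf 1 not available */
    return (int)((ecx >> 25) & 1u);
#else
    return 0;
#endif
}

/* Returns 1 to use the HW path, 0 to fall back to the SW implementation.
 * HW AES is used only when it was compiled in AND the CPU supports it. */
int demo_use_hw_aes(int compiled_in, int cpu_has_aesni)
{
    assert(compiled_in == 0 || compiled_in == 1);
    return compiled_in && cpu_has_aesni;
}
```

On a Nehalem CPU without AES-NI, demo_use_hw_aes(1, demo_cpu_has_aesni()) would be expected to return 0, matching the SW fallback described above even though the build banner lists AES-NI.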
@sjaeckel I am adding compiler options -msse4.1 -maes for gcc/clang on x86_64. Are both necessary or should I use only -maes?
I am afraid that without -msse4.1 the __SSE4_1__ macro will not be defined and AES-NI detection in tomcrypt_cfg.h will not work.
IIRC they're both required since they enable different parts of the API
OK, I will keep both. Just FYI:
$ echo | gcc -dM -E - | grep -e AES -e SSE
#define __SSE2_MATH__ 1
#define __SSE_MATH__ 1
#define __SSE2__ 1
#define __SSE__ 1
$ echo | gcc -maes -dM -E - | grep -e AES -e SSE
#define __SSE2_MATH__ 1
#define __SSE_MATH__ 1
#define __SSE2__ 1
#define __SSE__ 1
#define __AES__ 1
$ echo | gcc -maes -msse4.1 -dM -E - | grep -e AES -e SSE
#define __SSE4_1__ 1
#define __SSE2_MATH__ 1
#define __SSE_MATH__ 1
#define __SSE2__ 1
#define __SSSE3__ 1
#define __SSE__ 1
#define __AES__ 1
#define __SSE3__ 1
@VVD to sum it up for you:
If you see in the test suite output this
# Various others: ARGTYPE=4 ADLER32 AES-NI BASE64
it is the maximum you can achieve during building the CryptX module. You have built a binary that is ready to use HW AES-NI (or fall back to SW implementation when the CPU running this binary does not support AES-NI).
Thanks! 0.080 builds fine and tests passed on both CPUs with SSE4.1, with and without AES-NI support.
FreeBSD 13.2 amd64, CPU Bloomfield and Lynnfield (1st generation of Nehalem 45nm, examples: Core i7 920/930/etc, Core i5 750/760, Core i7 860/870, Xeon X34xx, Xeon E55xx), LLVM 14.0.5, -march=nehalem.
Lynnfield Xeon X3430 with SSE4.1 and without AES-NI:
Bloomfield Core i7 930 with SSE4.1 and without AES-NI:
Westmere Xeon E5620 with AES-NI support:
src/ltc/headers/tomcrypt_private.h turns on AES_NI whenever SSE4.1 is detected, but that is incorrect:
And I get a build error: