DCIT / perl-CryptX

https://metacpan.org/pod/CryptX
Other
35 stars 23 forks source link

Incorrect decected AES-NI: LTC_AMD64_SSE4_1 => LTC_HAS_AES_NI: "ltc/ciphers/aes/aesni.c:74:26: error: '__builtin_ia32_aeskeygenassist128' needs target feature aes" #97

Closed VVD closed 1 year ago

VVD commented 1 year ago

FreeBSD 13.2 amd64, CPU Bloomfield and Lynnfield (1st generation of Nehalem 45nm, examples: Core i7 920/930/etc, Core i5 750/760, Core i7 860/870, Xeon X34xx, Xeon E55xx), LLVM 14.0.5, -march=nehalem.

Lynnfield Xeon X3430 with SSE4.1 and without AES-NI:

# cpuid-etallen | grep -iE 'aes|xeon|sse4.1'|head -n5
      SSE4.1 extensions                       = true
      AES instruction                         = false
      VAES instructions                        = false
   brand = "Intel(R) Xeon(R) CPU           X3430  @ 2.40GHz"
   (synth) = Intel Xeon Processor 3400 (Lynnfield B1) [Nehalem] {Nehalem}, 45nm

Bloomfiled Core i7 930 with SSE4.1 and without AES-NI:

# cpuid-etallen | grep -iE 'aes|i7|sse4.1'|head -n5
      SSE4.1 extensions                       = true
      AES instruction                         = false
      VAES instructions                        = false
   brand = "Intel(R) Core(TM) i7 CPU         930  @ 2.80GHz"
   (synth) = Intel Core i7-900 (Bloomfield D0) [Nehalem] {Nehalem}, 45nm

Westmere Xeon E5620 with support of the AES-NI:

# cpuid-etallen | grep -iE 'aes|xeon|sse4.1'|head -n5
      SSE4.1 extensions                       = true
      AES instruction                         = true
      VAES instructions                        = false
   brand = "Intel(R) Xeon(R) CPU           E5620  @ 2.40GHz"
   (synth) = Intel Xeon Processor 3600 / 5600 (Westmere-EP B1) [Westmere] {Nehalem}, 32nm

In src/ltc/headers/tomcrypt_private.h turn on AES_NI if detected SSE4.1, but it's incorrect:

#if defined(LTC_AES_NI) && defined(LTC_AMD64_SSE4_1)
#define LTC_HAS_AES_NI
#endif

And have build error:

--- config ---
--- src/liballinone.a ---
--- ltc/ciphers/aes/aesni.o ---
cc -Iltm -Iltc/headers -DLTC_SOURCE -DLTC_NO_TEST -DLTC_NO_PROTOTYPES -DLTM_DESC -DHAS_FPSETMASK -DHAS_FLOATINGPOINT_H -DUSE_THREAD_SAFE_LOCALE -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include -DPIC -fPIC -O2 -pipe -march=nehalem -fstack-protector-strong -fno-strict-aliasing   -DARGTYPE=4 -c ltc/ciphers/aes/aesni.c -o ltc/ciphers/aes/aesni.o
--- ltc/ciphers/anubis.o ---
cc -Iltm -Iltc/headers -DLTC_SOURCE -DLTC_NO_TEST -DLTC_NO_PROTOTYPES -DLTM_DESC -DHAS_FPSETMASK -DHAS_FLOATINGPOINT_H -DUSE_THREAD_SAFE_LOCALE -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include -DPIC -fPIC -O2 -pipe -march=nehalem -fstack-protector-strong -fno-strict-aliasing   -DARGTYPE=4 -c ltc/ciphers/anubis.c -o ltc/ciphers/anubis.o
--- ltc/ciphers/aes/aesni.o ---
ltc/ciphers/aes/aesni.c:74:26: error: '__builtin_ia32_aeskeygenassist128' needs target feature aes
         rk[4] = rk[0] ^ setup_mix(temp, 3) ^ rcon[i];
                         ^
ltc/ciphers/aes/aesni.c:27:43: note: expanded from macro 'setup_mix'
#define setup_mix(t, c) _mm_extract_epi32(_mm_aeskeygenassist_si128(t, 0), c)
                                          ^
/usr/lib/clang/14.0.5/include/__wmmintrin_aes.h:136:13: note: expanded from macro '_mm_aeskeygenassist_si128'
  ((__m128i)__builtin_ia32_aeskeygenassist128((__v2di)(__m128i)(C), (int)(R)))
            ^
ltc/ciphers/aes/aesni.c:89:26: error: '__builtin_ia32_aeskeygenassist128' needs target feature aes
         rk[6] = rk[0] ^ setup_mix(temp, 3) ^ rcon[i];
                         ^
ltc/ciphers/aes/aesni.c:27:43: note: expanded from macro 'setup_mix'
#define setup_mix(t, c) _mm_extract_epi32(_mm_aeskeygenassist_si128(t, 0), c)
                                          ^
/usr/lib/clang/14.0.5/include/__wmmintrin_aes.h:136:13: note: expanded from macro '_mm_aeskeygenassist_si128'
  ((__m128i)__builtin_ia32_aeskeygenassist128((__v2di)(__m128i)(C), (int)(R)))
            ^
ltc/ciphers/aes/aesni.c:108:26: error: '__builtin_ia32_aeskeygenassist128' needs target feature aes
         rk[8] = rk[0] ^ setup_mix(temp, 3) ^ rcon[i];
                         ^
ltc/ciphers/aes/aesni.c:27:43: note: expanded from macro 'setup_mix'
#define setup_mix(t, c) _mm_extract_epi32(_mm_aeskeygenassist_si128(t, 0), c)
                                          ^
/usr/lib/clang/14.0.5/include/__wmmintrin_aes.h:136:13: note: expanded from macro '_mm_aeskeygenassist_si128'
  ((__m128i)__builtin_ia32_aeskeygenassist128((__v2di)(__m128i)(C), (int)(R)))
            ^
ltc/ciphers/aes/aesni.c:116:27: error: '__builtin_ia32_aeskeygenassist128' needs target feature aes
         rk[12] = rk[4] ^ setup_mix(temp, 2);
                          ^
ltc/ciphers/aes/aesni.c:27:43: note: expanded from macro 'setup_mix'
#define setup_mix(t, c) _mm_extract_epi32(_mm_aeskeygenassist_si128(t, 0), c)
                                          ^
/usr/lib/clang/14.0.5/include/__wmmintrin_aes.h:136:13: note: expanded from macro '_mm_aeskeygenassist_si128'
  ((__m128i)__builtin_ia32_aeskeygenassist128((__v2di)(__m128i)(C), (int)(R)))
            ^
ltc/ciphers/aes/aesni.c:145:14: error: always_inline function '_mm_aesimc_si128' requires target feature 'aes', but would be inlined into function 'aesni_setup' that is compiled without support for 'aes'
      temp = temp_invert(rk);
             ^
ltc/ciphers/aes/aesni.c:30:24: note: expanded from macro 'temp_invert'
#define temp_invert(k) _mm_aesimc_si128(*((__m128i*)(k)))
                       ^
ltc/ciphers/aes/aesni.c:146:26: error: always_inline function '_mm_aesimc_si128' requires target feature 'aes', but would be inlined into function 'aesni_setup' that is compiled without support for 'aes'
      *((__m128i*) rk) = temp_invert(rrk);
                         ^
ltc/ciphers/aes/aesni.c:30:24: note: expanded from macro 'temp_invert'
#define temp_invert(k) _mm_aesimc_si128(*((__m128i*)(k)))
                       ^
6 errors generated.
*** [ltc/ciphers/aes/aesni.o] Error code 1

make[2]: stopped in /tmp/work/usr/ports/security/p5-CryptX/work/CryptX-0.079/src
1 error

make[2]: stopped in /tmp/work/usr/ports/security/p5-CryptX/work/CryptX-0.079/src
*** [src/liballinone.a] Error code 2

make[1]: stopped in /tmp/work/usr/ports/security/p5-CryptX/work/CryptX-0.079
--- CryptX.c ---
mv CryptX.xsc CryptX.c
karel-m commented 1 year ago

Can you please try CryptX-0.80?

VVD commented 1 year ago

clang detects AES-NI from skylake only and doesn't detect it from westmere to broadwell (sandybridge, ivybridge, haswell):

$ for CPU in westmere sandybridge ivybridge haswell broadwell skylake; do echo $CPU; echo | clang -march=$CPU -dM -E - | grep -i aes; done
westmere
sandybridge
ivybridge
haswell
broadwell
skylake
#define __AES__ 1

On other side it's possible to force turn on AES-NI even if march doesn't support it:

$ echo | clang -march=core2 -maes -dM -E - | grep -i aes
#define __AES__ 1

As a simple workaround we can check #if defined(__AES__1) instead of the SSE4.1. And also add build option to force add -maes if user know his CPU support it.

karel-m commented 1 year ago

ping @sjaeckel

VVD commented 1 year ago

I see -msse4.1 -maes in command line and my -march=nehalem, but it build fine without errors now (0.080).

karel-m commented 1 year ago

Does the test suite of 0.080 show something like this?

# Various others:  ARGTYPE=4  ADLER32  AES-NI  BASE64  BASE64-URL-SAFE .....
                                       ^^^^^^
VVD commented 1 year ago

Does the test suite of 0.080 show something like this?

# Various others:  ARGTYPE=4  ADLER32  AES-NI  BASE64  BASE64-URL-SAFE .....
                                       ^^^^^^

How to run it? :-)

karel-m commented 1 year ago

make test

VVD commented 1 year ago

On nehalem CPU:

# Various others:  ARGTYPE=4  ADLER32  AES-NI  BASE64
All tests successful.
Files=137, Tests=18568, 29 wallclock secs ( 2.24 usr  0.21 sys + 26.30 cusr  1.99 csys = 30.74 CPU)
Result: PASS
karel-m commented 1 year ago

It means that AES-SNI support: 1/ was properly compiled 2/ was detected during runtime

In theory you can build a binary with AES-NI support but run it on a CPU lacking AES-SNI support (which will be detected in runtime and fall back to SW implementation of AES)

sjaeckel commented 1 year ago

In theory you can build a binary with AES-NI support but run it on a CPU lacking AES-SNI support (which will be detected in runtime and fall back to SW implementation of AES)

TBH I never tested that since I don't have any amd64 CPUs w/o AES support, but maybe it's a good time now? ;)

VVD commented 1 year ago

So it use SW emulation of the AES-NI on my Nehalem?

sjaeckel commented 1 year ago

Yes, it should fall back to the SW implementation and therefore be a valid test of that functionality.

karel-m commented 1 year ago

If you see

# Various others:  ARGTYPE=4  ADLER32  AES-NI  BASE64

it uses HW AES, if you see

# Various others:  ARGTYPE=4  ADLER32  BASE64

it uses sw implementation of AES.

sjaeckel commented 1 year ago

If you see

# Various others:  ARGTYPE=4  ADLER32  AES-NI  BASE64

it uses HW AES, if you see

not completely right.

If you see AES-NI it supports HW AES, but maybe uses SW AES if the HW doesn't support it.

karel-m commented 1 year ago

@sjaeckel I am adding compiler options -msse4.1 -maes for gcc/clang on x86_64. Are both necessary or should I use only -aes?

I am afraid that without -msse4.1 the __SSE4_1__ will not be defined and AES-NI detection in tomcrypt_cfg.h will not work.

sjaeckel commented 1 year ago

IIRC they're both required since they enable different parts of the API

karel-m commented 1 year ago

OK, I will keep both. Just FYI

$ echo | gcc -dM -E - | grep -e AES -e SSE
#define __SSE2_MATH__ 1
#define __SSE_MATH__ 1
#define __SSE2__ 1
#define __SSE__ 1

$ echo | gcc -maes -dM -E - | grep -e AES -e SSE
#define __SSE2_MATH__ 1
#define __SSE_MATH__ 1
#define __SSE2__ 1
#define __SSE__ 1
#define __AES__ 1

$ echo | gcc -maes -msse4.1 -dM -E - | grep -e AES -e SSE
#define __SSE4_1__ 1
#define __SSE2_MATH__ 1
#define __SSE_MATH__ 1
#define __SSE2__ 1
#define __SSSE3__ 1
#define __SSE__ 1
#define __AES__ 1
#define __SSE3__ 1
karel-m commented 1 year ago

@VVD to sum it up for you:

If you see in the test suite output this

# Various others:  ARGTYPE=4  ADLER32  AES-NI  BASE64

it is the maximum you can achieve during building the CryptX module. You have built a binary that is ready to use HW AES-NI (or fall back to SW implementation when the CPU running this binary does not support AES-NI).

VVD commented 1 year ago

Thanks! 0.080 build fine and tests passed on both CPUs with SSE4.1 - with and without AES-NI support.