Open DD-L opened 8 years ago
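Further testing shows:
In the start function of session_local.cpp, RSA encryption works fine both before and after the statement
this->resolver_right.async_resolve({server_name, server_port},
boost::bind(&session::resolve_handler, shared_from_this(), _1, _2));
but it fails when run inside the bound callback, session::resolve_handler.
resolver_right.async_resolve() is the first boost.asio asynchronous callback binding on the local side.
Given this, I'm fairly sure the problem lies in the cryptopp library rather than in my own wrapper code.
The cryptopp code where it fails: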
https://github.com/weidai11/cryptopp/blob/master/rijndael.cpp#L231
void Rijndael::Base::UncheckedSetKey(const byte *userKey, unsigned int keylen, const NameValuePairs &)
{
    AssertValidKeyLength(keylen);
    m_rounds = keylen/4 + 6;
    m_key.New(4*(m_rounds+1));
    word32 *rk = m_key;
#if (CRYPTOPP_BOOL_AESNI_INTRINSICS_AVAILABLE && (!defined(_MSC_VER) || _MSC_VER >= 1600 || CRYPTOPP_BOOL_X86 || CRYPTOPP_BOOL_X32))
    // MSVC 2008 SP1 generates bad code for _mm_extract_epi32() when compiling for X64
    if (HasAESNI())
    {
        static const word32 rcLE[] = {
            0x01, 0x02, 0x04, 0x08,
            0x10, 0x20, 0x40, 0x80,
            0x1B, 0x36, /* for 128-bit blocks, Rijndael never uses more than 10 rcon values */
        };
        const word32 *rc = rcLE;
        __m128i temp = _mm_loadu_si128((__m128i *)(void *)(userKey+keylen-16)); // <--- the program crashes here
        memcpy(rk, userKey, keylen);
        while (true)
        {
            rk[keylen/4] = rk[0] ^ _mm_extract_epi32(_mm_aeskeygenassist_si128(temp, 0), 3) ^ *(rc++);
            rk[keylen/4+1] = rk[1] ^ rk[keylen/4];
            rk[keylen/4+2] = rk[2] ^ rk[keylen/4+1];
            rk[keylen/4+3] = rk[3] ^ rk[keylen/4+2];
            if (rk + keylen/4 + 4 == m_key.end())
                break;
            if (keylen == 24)
            {
                rk[10] = rk[ 4] ^ rk[ 9];
                rk[11] = rk[ 5] ^ rk[10];
                temp = _mm_insert_epi32(temp, rk[11], 3);
            }
            else if (keylen == 32)
            {
                temp = _mm_insert_epi32(temp, rk[11], 3);
                rk[12] = rk[ 4] ^ _mm_extract_epi32(_mm_aeskeygenassist_si128(temp, 0), 2);
                rk[13] = rk[ 5] ^ rk[12];
                rk[14] = rk[ 6] ^ rk[13];
                rk[15] = rk[ 7] ^ rk[14];
                temp = _mm_insert_epi32(temp, rk[15], 3);
            }
            else
                temp = _mm_insert_epi32(temp, rk[7], 3);
            rk += keylen/4;
        }
        if (!IsForwardTransformation())
        {
            rk = m_key;
            unsigned int i, j;
            std::swap(*(__m128i *)(void *)(rk), *(__m128i *)(void *)(rk+4*m_rounds));
            for (i = 4, j = 4*m_rounds-4; i < j; i += 4, j -= 4)
            {
                temp = _mm_aesimc_si128(*(__m128i *)(void *)(rk+i));
                *(__m128i *)(void *)(rk+i) = _mm_aesimc_si128(*(__m128i *)(void *)(rk+j));
                *(__m128i *)(void *)(rk+j) = temp;
            }
            *(__m128i *)(void *)(rk+i) = _mm_aesimc_si128(*(__m128i *)(void *)(rk+i));
        }
        return;
    }
#endif
...
This looks like something processor-related.
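To check the processor angle, a minimal sketch along these lines could ask the running CPU which instruction sets it reports. It assumes GCC or Clang on x86 and relies on the __builtin_cpu_supports builtin; CpuFeatures, query_cpu_features and describe are names made up for this illustration, and older GCC releases may not accept the "aes" and "pclmul" feature names. On any other compiler or architecture the sketch simply reports everything as absent.

```cpp
#include <string>

// Illustrative container for the CPU features discussed in this issue.
struct CpuFeatures {
    bool sse42;
    bool avx;
    bool aes;     // AES-NI
    bool pclmul;  // PCLMULQDQ
};

// Query the CPU the binary is currently running on.
// Assumption: GCC/Clang on x86; elsewhere every flag stays false.
inline CpuFeatures query_cpu_features() {
    CpuFeatures f = {false, false, false, false};
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__i386__) || defined(__x86_64__))
    __builtin_cpu_init();  // harmless if the runtime already initialised it
    f.sse42  = __builtin_cpu_supports("sse4.2") != 0;
    f.avx    = __builtin_cpu_supports("avx") != 0;
    f.aes    = __builtin_cpu_supports("aes") != 0;
    f.pclmul = __builtin_cpu_supports("pclmul") != 0;
#endif
    return f;
}

// Render the flags as a single line, e.g. for a log message.
inline std::string describe(const CpuFeatures& f) {
    std::string out;
    out += "sse4.2=";  out += (f.sse42  ? "1" : "0");
    out += " avx=";    out += (f.avx    ? "1" : "0");
    out += " aes=";    out += (f.aes    ? "1" : "0");
    out += " pclmul="; out += (f.pclmul ? "1" : "0");
    return out;
}
```

Printing describe(query_cpu_features()) on both the build machine and the machine where the crash occurs would show whether the two actually differ in AES-NI support.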
I found a useful link:
The relevant excerpt:
We get the error when compiling on the E5-2680, and copying to the X5690.
Oh, that's interesting. Try adding -mtune=pentium4 to CXXFLAGS. The Intel Xeon E5-2680 has the AVX instruction set; while the Intel Xeon X5690 only has SSE 4.2. Crypto++ uses the double quadword multiply (PCLMULQDQ) and AES-NI instructions from the AVX instruction set (which the other processor lacks).
If -mtune=pentium4 does not work, then you are going to have to disable via CRYPTOPP_BOOL_AESNI_INTRINSICS_AVAILABLE. Now that I think about it, that's what you are probably going to have to do since this is a compile time feature selection, and not a runtime feature selection.
re: had to just stick with CRYPTOPP_BOOL_AESNI_INTRINSICS_AVAILABLE......
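The "compile time feature selection" point in the quote can be illustrated with a small sketch. It is not Crypto++'s actual configuration logic: DEMO_AESNI_AVAILABLE and DEMO_DISABLE_AESNI are hypothetical macros standing in for a switch like CRYPTOPP_BOOL_AESNI_INTRINSICS_AVAILABLE, and compiled_key_schedule is an invented helper. The only real compiler behaviour assumed is that GCC and Clang predefine __AES__ when AES instructions are enabled for the build (for example with -maes).

```cpp
#include <string>

// Hypothetical stand-in for a library feature macro. The decision is made
// once, by the preprocessor, from the flags used to build the library.
#if defined(__AES__) && !defined(DEMO_DISABLE_AESNI)
#  define DEMO_AESNI_AVAILABLE 1
#else
#  define DEMO_AESNI_AVAILABLE 0
#endif

// Reports which code path was compiled into this binary.
// Only one of the two branches exists in the object code.
inline const char* compiled_key_schedule() {
#if DEMO_AESNI_AVAILABLE
    return "aesni";     // intrinsic-based path was compiled in
#else
    return "portable";  // plain C++ path was compiled in
#endif
}
```

Building with -DDEMO_DISABLE_AESNI forces the portable branch no matter which CPU the binary later runs on, which is the sense in which the choice is fixed at compile time.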
So I built the cryptopp static library with
mingw32-make MAKE=mingw32-make CXXFLAGS="-DNDEBUG -g2 -O2 -mtune=pentium4"
and that worked.
However, the same code runs without any problem in the test case: