Closed luoming17 closed 3 years ago
Hi luoming17,
For performance reasons, a lot of NSIMD code is not compiled into the .so file. So when you compile your code with NSIMD, you must specify which SIMD extension to use. The CMake SIMD variable is only used when testing the library and does not affect the compilation of your own code.
Moreover, because of #26, NSIMD does not try to guess the SIMD extensions on its own. So in the CMake file responsible for compiling your own code, you should add something like add_compile_options(-DSSE2 -DFMA -msse2 -mfma).
Hi gquintin,
Thank you for your answer. I added this compile option and it works well. But when testing the load2a function, I found that it behaves differently for int8_t and int16_t.
I have some raw aligned data generated by this code.
for (int i = 0; i < bufLen; i++) {
    buffer[i] = i % 100;
}
My test code is as follows.
template<typename BaseType>
static void testFn(char* originBuf, char* destBuf) {
    using PackType = nsimd::packx2<BaseType>;
    auto pack = nsimd::load2a<PackType>(reinterpret_cast<BaseType*>(originBuf));
    nsimd::storea(reinterpret_cast<BaseType*>(destBuf), pack.v0);
}
int main(void) {
    // some code that initializes buffer and destBuf
    testFn<int8_t>(buffer, destBuf);
    dump_int8(destBuf); // 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28...
    testFn<int16_t>(buffer, destBuf);
    dump_int8(destBuf); // 0, 1, 4, 5, 8, 9, 12, 13, 32, 33, 36, 37, 40, 41, 44, 45, 16, 17, 20, 21, 24, 25, 28, 29,
                        // 48, 49, 52, 53, 56, 57, 60, 61, 64, 65, 68, 69, 72, 73, 76, 77, 96, 97...
}
The int8_t result met my expectation, but the int16_t result didn't.
// my expectation
// 0, 1, 4, 5, 8, 9, 12, 13, 16, 17, 20, 21, 24, 25, 28, 29, 32, 33, 36, 37, 40, 41, 44, 45, 48, 49,
// 52, 53, 56, 57, 60, 61, 64, 65, 68, 69, 72, 73, 76, 77, 80, 81...
I think load3a and load4a also have this behavior. Does nsimd have any APIs that can make the results meet my expectations?
Hey! I'm using this repository to optimize my code, but I don't know how to use 128-bit registers in my program. I'm compiling and running on Linux, and my CPU is an "Intel(R) Xeon(R) Gold 6161 CPU @ 2.20GHz". I'm sure it supports SSE2. Here is what I did.
I use this command to generate files first.
My CMakeLists.txt is like this.
My project code is like this.
The output is "8". Does this mean the pack only supports 64-bit data? How can I get a pack that supports 128 bits or more? Thank you.