WojciechMula / sse4-strstr

SIMD (SWAR/SSE/SSE4/AVX2/AVX512F/ARM Neon) of Karp-Rabin algorithm's modification
http://0x80.pl/articles/simd-strfind.html
BSD 2-Clause "Simplified" License

clang produces different results than gcc-compiled program #8

Closed WojciechMula closed 6 years ago

WojciechMula commented 6 years ago

GCC compilation - all procedures (except AVX2-wide) report the same reference result:

naive scalar                            ... reference result = 8108076510, time =   6.299609 s
std::strstr                             ... reference result = 8108076510, time =   0.659882 s
SWAR 64-bit (generic)                   ... reference result = 8108076510, time =   1.446615 s
SWAR 32-bit (generic)                   ... reference result = 8108076510, time =   2.529733 s
SSE2 (generic)                          ... reference result = 8108076510, time =   0.498816 s
SSE4.1 (MPSADBW)                        ... reference result = 8108076510, time =   0.640781 s
SSE4.1 (MPSADBW unrolled)               ... reference result = 8108076510, time =   0.961995 s
SSE4.2 (PCMPESTRM)                      ... reference result = 8108076510, time =   1.373412 s
SSE (naive)                             ... reference result = 8108076510, time =   1.960058 s
AVX2 (MPSADBW)                          ... reference result = 8108076510, time =   0.578520 s
AVX2 (generic)                          ... reference result = 8108076510, time =   0.374598 s
AVX2 (naive)                            ... reference result = 8108076510, time =   1.147053 s
AVX2 (naive unrolled)                   ... reference result = 8108076510, time =   0.795070 s
AVX2-wide (naive)                       ... reference result = 8107771150, time =   0.541654 s

In the clang compilation, the SSE4.1 MPSADBW variants report different values (the AVX2 MPSADBW variant still matches the reference):

naive scalar                            ... reference result = 8108076510, time =   6.293796 s
std::strstr                             ... reference result = 8108076510, time =   0.660113 s
SWAR 64-bit (generic)                   ... reference result = 8108076510, time =   1.334720 s
SWAR 32-bit (generic)                   ... reference result = 8108076510, time =   2.518706 s
SSE2 (generic)                          ... reference result = 8108076510, time =   0.489896 s
SSE4.1 (MPSADBW)                        ... reference result = 5713208130, time =   1.787850 s
SSE4.1 (MPSADBW unrolled)               ... reference result = 7962617290, time =   0.985689 s
SSE4.2 (PCMPESTRM)                      ... reference result = 8108076510, time =   1.448608 s
SSE (naive)                             ... reference result = 8108076510, time =   1.946516 s
AVX2 (MPSADBW)                          ... reference result = 8108076510, time =   0.694087 s
AVX2 (generic)                          ... reference result = 8108076510, time =   0.353279 s
AVX2 (naive)                            ... reference result = 8108076510, time =   1.054814 s
AVX2 (naive unrolled)                   ... reference result = 8108076510, time =   0.795445 s
AVX2-wide (naive)                       ... reference result = 8107771150, time =   0.577752 s

WojciechMula commented 6 years ago

The problem disappeared in the recent version.