Demo produces different results in DEBUG and non-DEBUG mode

zzxx-husky commented 4 years ago

Hi, this work is very interesting to me and I wanna try this in one of our project, where I need to frequently answer whether a vertex (i.e., an integer) exists in the adjacency list (i.e., a sorted integer array) of another vertex.

However, I find a problem when running the demo example in my personal laptop. If I compile in non-DEBUG mode, the demo compains about incorrect result, while if I compile in DEBUG mode, the demo runs successfully and correctly. The problem seems bin_searcher.scalar(...) does not return correct results under non-DEBUG mode.

zzxx@DESKTOP-75K7HNQ:~/repos/FastBinarySearch/source/demo | master
$ git log -n 1
commit ce8e7fe1801968170b866430cfef1939ee99ae23 (HEAD -> master, origin/master, origin/HEAD)
Author: fabiocannizzo <fabio_cannizzo@yahoo.com>
Date:   Thu Aug 8 13:17:33 2019 +0800

    update links in readme
zzxx@DESKTOP-75K7HNQ:~/repos/FastBinarySearch/source/demo | master
$ git status
On branch master
Your branch is up to date with 'origin/master'.

nothing to commit, working tree clean
zzxx@DESKTOP-75K7HNQ:~/repos/FastBinarySearch/source/demo | master
$ make all
platform=Linux
compiler=g++
g++ -c -std=c++0x -I../include -mfma -mavx2 -O3 demo.cpp -o bin/demo.o
g++  bin/demo.o -o bin/demo
rm bin/demo.o
zzxx@DESKTOP-75K7HNQ:~/repos/FastBinarySearch/source/demo | master
$ ./bin/demo
scalar results:
4 <= 1 < 5
incorrect result!zzxx@DESKTOP-75K7HNQ:~/repos/FastBinarySearch/source/demo | master
$ DEBUG=1 make all
platform=Linux
compiler=g++
make: Nothing to be done for 'all'.
zzxx@DESKTOP-75K7HNQ:~/repos/FastBinarySearch/source/demo | master
$ touch Makefile
zzxx@DESKTOP-75K7HNQ:~/repos/FastBinarySearch/source/demo | master
$ DEBUG=1 make all
platform=Linux
compiler=g++
g++ -c -std=c++0x -I../include -mfma -mavx2 -g -DDEBUG demo.cpp -o bin/demo.o
g++  bin/demo.o -o bin/demo
rm bin/demo.o
zzxx@DESKTOP-75K7HNQ:~/repos/FastBinarySearch/source/demo | master
$ ./bin/demo
scalar results:
1 <= 1 < 2
2 <= 2 < 4
4 <= 4 < 5
5 <= 5 < 9
1 <= 1.5 < 2
2 <= 2.5 < 4
4 <= 4.8 < 5
5 <= 8.2 < 9
vectorial results:
1 <= 1 < 2
2 <= 2 < 4
4 <= 4 < 5
5 <= 5 < 9
1 <= 1.5 < 2
2 <= 2.5 < 4
4 <= 4.8 < 5
5 <= 8.2 < 9
zzxx@DESKTOP-75K7HNQ:~/repos/FastBinarySearch/source/demo | master
$

fabiocannizzo commented 4 years ago

Thanks for reporting this issue. Note that the library is specialized for floating point numbers, not integers. Said that, it should be possible to adapt most of the methods to work with integers.

Could you please post the output of: cat /proc/cpuinfo

Could you please also try if the error persist setting SIMDLEVEL to AVX or SSE2?

Thanks

zzxx-husky commented 4 years ago

Could you please post the output of: cat /proc/cpuinfo

Yes. I tried the demo example on two different machines and demo behaved in the same way on the two machines, i.e., incorrect results in non-DEBUG mode and correct results in DEBUG mode. Here is the /proc/cpuinfo (I select cpuinfo of one of the processors as all the processors on the same machine share the same cpuinfo):

One machine (WSL1 on Win10):

processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model       : 158
model name  : Intel(R) Core(TM) i5-9400 CPU @ 2.90GHz
stepping    : 10
microcode   : 0xffffffff
cpu MHz     : 2904.000
cache size  : 256 KB
physical id : 0
siblings    : 6
core id     : 0
cpu cores   : 6
apicid      : 0
initial apicid  : 0
fpu     : yes
fpu_exception   : yes
cpuid level : 6
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave osxsave avx f16c rdrand lahf_lm abm 3dnowprefetch fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt ibrs ibpb stibp ssbd
bogomips    : 5808.00
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

The other machine (Debian):

processor       : 47
vendor_id       : GenuineIntel
cpu family      : 6
model           : 85
model name      : Intel(R) Xeon(R) Gold 6126 CPU @ 2.60GHz
stepping        : 4
microcode       : 0x2000050
cpu MHz         : 2600.000
cache size      : 19712 KB
physical id     : 1
siblings        : 24
core id         : 11
cpu cores       : 12
apicid          : 55
initial apicid  : 55
fpu             : yes
fpu_exception   : yes
cpuid level     : 22
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb invpcid_single ssbd ibrs ibpb stibp kaiser tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts pku ospke flush_l1d
bugs            : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf
bogomips        : 5201.36
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

Could you please also try if the error persist setting SIMDLEVEL to AVX or SSE2?

I tried setting SIMDLEVEL to AVX or SSE2 but that does not solve the problem:

zzxx@DESKTOP-75K7HNQ:~/repos/FastBinarySearch/source/demo | master
$ git log -n 1
commit ce8e7fe1801968170b866430cfef1939ee99ae23 (HEAD -> master, origin/master, origin/HEAD)
Author: fabiocannizzo <fabio_cannizzo@yahoo.com>
Date:   Thu Aug 8 13:17:33 2019 +0800

    update links in readme
zzxx@DESKTOP-75K7HNQ:~/repos/FastBinarySearch/source/demo | master
$ git diff
diff --git a/source/demo/Makefile b/source/demo/Makefile
index 192b4aa..95f69fb 100644
--- a/source/demo/Makefile
+++ b/source/demo/Makefile
@@ -38,6 +38,7 @@ ifeq ($(CC),g++)
 endif

+$(info ***** SIMDLEVEL: $(SIMDLEVEL) *****)
 ifndef SIMDLEVEL
    CFLAGS += -mfma -mavx2
 else ifeq ($(SIMDLEVEL), FMA)
zzxx@DESKTOP-75K7HNQ:~/repos/FastBinarySearch/source/demo | master
$ SIMDLEVEL=AVX make all
platform=Linux
compiler=g++
***** SIMDLEVEL: AVX *****
g++ -c -std=c++0x -I../include -mavx -O3 demo.cpp -o bin/demo.o
g++  bin/demo.o -o bin/demo
rm bin/demo.o
zzxx@DESKTOP-75K7HNQ:~/repos/FastBinarySearch/source/demo | master
$ ./bin/demo
scalar results:
4 <= 1 < 5
incorrect result!zzxx@DESKTOP-75K7HNQ:~/repos/FastBinarySearch/source/demo | master
$ touch Makefile && SIMDLEVEL=SSE2 make all
platform=Linux
compiler=g++
***** SIMDLEVEL: SSE2 *****
g++ -c -std=c++0x -I../include -msse2 -O3 demo.cpp -o bin/demo.o
g++  bin/demo.o -o bin/demo
rm bin/demo.o
zzxx@DESKTOP-75K7HNQ:~/repos/FastBinarySearch/source/demo | master
$ ./bin/demo
scalar results:
4 <= 1 < 5
incorrect result!zzxx@DESKTOP-75K7HNQ:~/repos/FastBinarySearch/source/demo | master
$ touch Makefile && SIMDLEVEL=SSE2 DEBUG=1 make all
platform=Linux
compiler=g++
***** SIMDLEVEL: SSE2 *****
g++ -c -std=c++0x -I../include -msse2 -g -DDEBUG demo.cpp -o bin/demo.o
g++  bin/demo.o -o bin/demo
rm bin/demo.o
zzxx@DESKTOP-75K7HNQ:~/repos/FastBinarySearch/source/demo | master
$ ./bin/demo
scalar results:
1 <= 1 < 2
2 <= 2 < 4
4 <= 4 < 5
5 <= 5 < 9
1 <= 1.5 < 2
2 <= 2.5 < 4
4 <= 4.8 < 5
5 <= 8.2 < 9
vectorial results:
1 <= 1 < 2
2 <= 2 < 4
4 <= 4 < 5
5 <= 5 < 9
1 <= 1.5 < 2
2 <= 2.5 < 4
4 <= 4.8 < 5
5 <= 8.2 < 9

zzxx-husky commented 4 years ago

Oh ... I just found that by explicitly setting DEBUG=0 works. With further investigation, I found that the Makefile checks only whether the variable DEBUG is defined or not but does not check the value of DEBUG, so both DEBUG=0 and DEBUG=1 go into DEBUG mode.

zzxx@DESKTOP-75K7HNQ:~/repos/FastBinarySearch/source/demo | master
$ touch Makefile && SIMDLEVEL=SSE2 DEBUG=0 make all
platform=Linux
compiler=g++
***** SIMDLEVEL: SSE2 *****
g++ -c -std=c++0x -I../include -msse2 -g -DDEBUG demo.cpp -o bin/demo.o
g++  bin/demo.o -o bin/demo
rm bin/demo.o
zzxx@DESKTOP-75K7HNQ:~/repos/FastBinarySearch/source/demo | master
$ ./bin/demo
scalar results:
1 <= 1 < 2
2 <= 2 < 4
4 <= 4 < 5
5 <= 5 < 9
1 <= 1.5 < 2
2 <= 2.5 < 4
4 <= 4.8 < 5
5 <= 8.2 < 9
vectorial results:
1 <= 1 < 2
2 <= 2 < 4
4 <= 4 < 5
5 <= 5 < 9
1 <= 1.5 < 2
2 <= 2.5 < 4
4 <= 4.8 < 5
5 <= 8.2 < 9

But for AVX I met a problem during compilation. That implies that SIMDLEVEL=AVX can compile under non-DEBUG mode but cannot under DEBUG mode.

zzxx@DESKTOP-75K7HNQ:~/repos/FastBinarySearch/source/demo | master
$ touch Makefile && SIMDLEVEL=AVX DEBUG=0 make all
platform=Linux
compiler=g++
***** SIMDLEVEL: AVX *****
g++ -c -std=c++0x -I../include -mavx -g -DDEBUG demo.cpp -o bin/demo.o
In file included from /usr/lib/gcc/x86_64-linux-gnu/7/include/immintrin.h:43:0,
                 from ../include/SIMD.h:30,
                 from ../include/Algo-ClassicOffset.h:3,
                 from ../include/BinSearch.h:3,
                 from demo.cpp:1:
/usr/lib/gcc/x86_64-linux-gnu/7/include/avx2intrin.h: In function 'BinSearch::Details::IVec<(BinSearch::InstrSet)2, float> BinSearch::Details::operator+(const BinSearch::Details::IVec<(BinSearch::InstrSet)2, float>&, const BinSearch::Details::IVec<(BinSearch::InstrSet)2, float>&)':
/usr/lib/gcc/x86_64-linux-gnu/7/include/avx2intrin.h:119:1: error: inlining failed in call to always_inline '__m256i _mm256_add_epi32(__m256i, __m256i)': target specific option mismatch
 _mm256_add_epi32 (__m256i __A, __m256i __B)
 ^~~~~~~~~~~~~~~~
In file included from ../include/Algo-ClassicOffset.h:3:0,
                 from ../include/BinSearch.h:3,
                 from demo.cpp:1:
../include/SIMD.h:456:120: note: called from here
 FORCE_INLINE IVec<AVX,float> operator+  (const IVec<AVX,float>& a, const IVec<AVX,float>& b ) { return _mm256_add_epi32( a, b ); }
                                                                                                        ~~~~~~~~~~~~~~~~^~~~~~~~
Makefile:71: recipe for target 'bin/demo.o' failed
make: *** [bin/demo.o] Error 1

fabiocannizzo commented 4 years ago

I can reproduce the problem. Thanks

fabiocannizzo commented 4 years ago

This is a difficult debug, as any minor change to the source code, introduction of traces, change of compiler or compiler flags make the error disappear. Static analysis tools and valgrind do not detect any issue.

Just analysing the code I identified a possible issue on memory alignment and fixed it. The error disappeared, although this could be just a red herring and I cannot conclude with certainty the problem is actually fixed.

zzxx-husky commented 4 years ago

I just tried the lastest master branch. The problem seems to be solved on my side. Demo now outputs correct answers. Thanks!

fabiocannizzo / FastBinarySearch

Demo produces different results in DEBUG and non-DEBUG mode #1