WojciechMula / simd-sort

AVX512F and AVX2 versions of quick sort
BSD 2-Clause "Simplified" License
105 stars 13 forks source link

Segment fault when running speed on server with avx512f/avx2 instruction sets #4

Open fengyuleidian0615 opened 7 years ago

fengyuleidian0615 commented 7 years ago

Hi

I run into segment fault when benchmark my server with avx512f support. Here is how I build/run/debug the simd-sort:

  1. Fresh cloned simd-sort, some compile tweaks to build ok. root@00e191a33132:/ws/simd-sort# make clean rm -f test speed test_avx2 speed_avx2 speed_stats speed_avx2_stats root@00e191a33132:/ws/simd-sort# make g++ -std=c++11 -mbmi2 -Wall -pedantic -Wextra --static -g -mavx512f -DHAVE_AVX512F_INSTRUCTIONS -DHAVE_AVX2_INSTRUCTIONS test.cpp -o test

    g++ -std=c++11 -mbmi2 -Wall -pedantic -Wextra --static -g -mavx512f -DHAVE_AVX512F_INSTRUCTIONS -DHAVE_AVX2_INSTRUCTIONS -fsanitize=address test.cpp -o test

    g++ -std=c++11 -mbmi2 -Wall -pedantic -Wextra --static -g -mavx512f -DHAVE_AVX512F_INSTRUCTIONS -DHAVE_AVX2_INSTRUCTIONS -O3 -DNDEBUG speed.cpp -o speed

    g++ -std=c++11 -mbmi2 -Wall -pedantic -Wextra --static -g -mavx2 -DHAVE_AVX2_INSTRUCTIONS -fsanitize=address test.cpp -o test_avx2

    g++ -std=c++11 -mbmi2 -Wall -pedantic -Wextra --static -g -mavx2 -DHAVE_AVX2_INSTRUCTIONS test.cpp -o test_avx2 g++ -std=c++11 -mbmi2 -Wall -pedantic -Wextra --static -g -mavx2 -DHAVE_AVX2_INSTRUCTIONS -O3 -DNDEBUG speed.cpp -o speed_avx2 g++ -std=c++11 -mbmi2 -Wall -pedantic -Wextra --static -g -mavx512f -DHAVE_AVX512F_INSTRUCTIONS -DHAVE_AVX2_INSTRUCTIONS -O3 -DNDEBUG -DWITH_RUNTIME_STATS speed.cpp -o speed_stats g++ -std=c++11 -mbmi2 -Wall -pedantic -Wextra --static -g -mavx2 -DHAVE_AVX2_INSTRUCTIONS -O3 -DNDEBUG -DWITH_RUNTIME_STATS speed.cpp -o speed_avx2_stats

  2. Run the benchmark speed root@00e191a33132:/ws/simd-sort# ./speed 1000000 3 asc rdtsc_overhead set to 58 items count: 1000000 (4000000 bytes), input ascending std::sort ... 23245998 cycles std::qsort ... 49867062 cycles (0.47) std::stable_sort ... 52988608 cycles (0.44) quick sort ... 19142662 cycles (1.21) AVX2 quick sort ... 17550176 cycles (1.32) AVX2 nate nodutch ... 77384680 cycles (0.30) AVX2 alt quicksort ... 14032778 cycles (1.66) AVX2 Nate's variant ... Segmentation fault (core dumped)

  3. debug with GDB, it seems rh exceeds array length [root@purley-2s simd-sort]# gdb ./speed ./core.292269 GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-94.el7 Copyright (C) 2013 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-redhat-linux-gnu". For bug reporting instructions, please see: http://www.gnu.org/software/gdb/bugs/... Reading symbols from /root/dufan/ali/avx512/simd-sort/speed...done. [New LWP 112] Core was generated by `./speed 1000000 3 asc'. Program terminated with signal 11, Segmentation fault.

    0 avx_pivot_on_last_value (length=1000000, array=0x7f2cbe600010) at avx2-nate-quicksort.cpp:340

    340 const int32_t v = array[rh++]; (gdb) bt

    0 avx_pivot_on_last_value (length=1000000, array=0x7f2cbe600010) at avx2-nate-quicksort.cpp:340

    1 nate::avx2_pivotonlast_sort (array=, length=) at avx2-nate-quicksort.cpp:416

    2 0x000000000040cb9f in wrapped_avx2_pivotonlast_sort (right=, left=0, array=) at avx2-nate-quicksort.cpp:433

    3 run<void ()(unsigned int, int, int)> (sort=, this=) at speed.cpp:61

    4 measure<void ()(unsigned int, int, int)> (sort=0x406f30 <nate::wrapped_avx2_pivotonlast_sort(unsigned int*, int, int)>, name=0x54c073 "AVX2 Nate's variant",

    this=0x7ffc4526ea70) at speed.cpp:346

    5 Test::run (this=0x7ffc4526ea70) at speed.cpp:313

    6 0x0000000000400828 in main (argc=4, argv=0x7ffc4526ec68) at speed.cpp:449

    (gdb)

  4. gcc version I used to compile root@00e191a33132:/ws/simd-sort# gcc -v Using built-in specs. COLLECT_GCC=gcc COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/5/lto-wrapper Target: x86_64-linux-gnu Configured with: ../src/configure -v --with-pkgversion='Ubuntu 5.4.0-6ubuntu1 16.04.4' --with-bugurl=file:///usr/share/doc/gcc-5/README.Bugs --enable-languages=c,ada,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-5 --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --with-system-zlib --disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-5-amd64/jre --enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-5-amd64 --with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-5-amd64 --with-arch-directory=amd64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --enable-objc-gc --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu Thread model: posix gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.4)

I commented avx2-nat out in the speed.cpp, and the benchmark seems stucks at avx512-quiksort.cpp. Didn't look into it deeper, verified this on two types of avx512 server, both same issue. Any suggestions?

Thanks!

WojciechMula commented 7 years ago

@fengyuleidian0615 Sorry for delay answer. Be aware that the current state of repository is experimental and some algorithms may segfault. Mostly due to wrong partitioning, it degenerates to case where in each call the range [0..n] is split into [0] and [1..n]. Then we're running out of stack when n is big enough.