dshawul / Scorpio

Scorpio chess engine

Segmentation fault in SEARCHER::probe_hash #19

Closed Theowoll closed 2 years ago

Theowoll commented 2 years ago

Scorpio compiled from source on openSUSE Tumbleweed (without DEFINES += -DHAS_POPCNT and CXXFLAGS += -mavx2 in src/Makefile) crashes with a segmentation fault as of commit 76515ca (NUMA aware pawn/eval hashtable allocation). Building and running the binary with ./build.sh && bin/scorpio go quit gives the following output:

feature done=0
scorpio.ini not found!
# rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1
# [st = 8185ms, mt = 29220ms , hply = 0 , moves_left 10]
Segmentation fault (core dumped)

After compiling with ./build.sh DEBUG=2 and running the binary with gdb, I get this backtrace:

Program received signal SIGSEGV, Segmentation fault.
0x000000000041299a in SEARCHER::probe_hash (this=0x7ffff783b010, col=1, hash_key=@0x7ffff783b0b8: 2650436591297225499, depth=0, ply=1, eval=@0x7ffff7856f34: -20000, score=@0x7ffff7856f30: 0, move=@0x7ffff7856f24: 0, alpha=-20000, beta=20000, mate_threat=@0x7ffff7856f40: 0, singular=@0x7ffff7856f44: 0, h_depth=@0x7ffff7856f2c: 0, exclusiveP=false) at hash.cpp:153
153             slot = *pslot;
(gdb) bt
#0  0x000000000041299a in SEARCHER::probe_hash (this=0x7ffff783b010, col=1, hash_key=@0x7ffff783b0b8: 2650436591297225499, depth=0, ply=1, eval=@0x7ffff7856f34: -20000, score=@0x7ffff7856f30: 0, move=@0x7ffff7856f24: 0, alpha=-20000, beta=20000, mate_threat=@0x7ffff7856f40: 0, singular=@0x7ffff7856f44: 0, h_depth=@0x7ffff7856f2c: 0, exclusiveP=false) at hash.cpp:153
#1  0x0000000000420091 in SEARCHER::PROBE_HASH (this=0x7ffff783b010, col=1, hash_key=@0x7ffff783b0b8: 2650436591297225499, depth=0, ply=1, eval=@0x7ffff7856f34: -20000, score=@0x7ffff7856f30: 0, move=@0x7ffff7856f24: 0, alpha=-20000, beta=20000, mate_threat=@0x7ffff7856f40: 0, singular=@0x7ffff7856f44: 0, h_depth=@0x7ffff7856f2c: 0, exclusiveP=false) at parallel.cpp:538
#2  0x0000000000427f3c in SEARCHER::hash_cutoff (this=0x7ffff783b010) at search.cpp:66
#3  0x000000000042b84b in SEARCHER::on_qnode_entry (this=0x7ffff783b010) at search.cpp:321
#4  SEARCHER::qsearch (this=0x7ffff783b010) at search.cpp:1175
#5  0x000000000042bd66 in SEARCHER::qsearch_nn (this=0x7ffff783b010) at search.cpp:1221
#6  0x00000000004297c9 in SEARCHER::on_node_entry (this=0x7ffff783b010) at search.cpp:128
#7  search (proc=0x7ffff783b010, single=true) at search.cpp:791
#8  0x000000000042c0c5 in SEARCHER::search_ab (this=0x7ffff783b010) at search.cpp:1310
#9  0x000000000042c43f in SEARCHER::evaluate_moves (this=0x7ffff783b010, depth=0) at search.cpp:1386
#10 0x0000000000435eac in SEARCHER::generate_and_score_moves (this=0x7ffff783b010, alpha=-20000, beta=20000) at mcts.cpp:2030
#11 0x000000000042dd27 in SEARCHER::find_best (this=0x7ffff783b010) at search.cpp:1930
#12 0x000000000040a10f in parse_commands (commands=0x7fffffff8540) at scorpio.cpp:1425
#13 0x00000000004053b4 in main (argc=3, argv=0x7fffffffd658) at scorpio.cpp:225

Valgrind also outputs "Invalid read of size" messages, which first appear after the earlier commit 4a73976 (Support uci hashfull):

$ valgrind bin/scorpio go
==2402== Memcheck, a memory error detector
==2402== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==2402== Using Valgrind-3.17.0 and LibVEX; rerun with -h for copyright info
==2402== Command: bin/scorpio go
==2402== 
feature done=0
scorpio.ini not found!
# rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1
# [st = 8185ms, mt = 29220ms , hply = 0 , moves_left 10]
==2402== Invalid read of size 8
==2402==    at 0x41299A: SEARCHER::probe_hash(int, unsigned long const&, int, int, int&, int&, unsigned int&, int, int, int&, int&, int&, bool) (hash.cpp:153)
==2402==    by 0x420090: SEARCHER::PROBE_HASH(int, unsigned long const&, int, int, int&, int&, unsigned int&, int, int, int&, int&, int&, bool) (parallel.cpp:538)
==2402==    by 0x427F3B: SEARCHER::hash_cutoff() (search.cpp:66)
==2402==    by 0x42B84A: on_qnode_entry (search.cpp:321)
==2402==    by 0x42B84A: SEARCHER::qsearch() (search.cpp:1175)
==2402==    by 0x42BD65: SEARCHER::qsearch_nn() (search.cpp:1221)
==2402==    by 0x4297C8: on_node_entry (search.cpp:128)
==2402==    by 0x4297C8: search(PROCESSOR*, bool) (search.cpp:791)
==2402==    by 0x42C0C4: SEARCHER::search_ab() (search.cpp:1310)
==2402==    by 0x42C43E: SEARCHER::evaluate_moves(int) (search.cpp:1386)
==2402==    by 0x435EAB: SEARCHER::generate_and_score_moves(int, int) (mcts.cpp:2030)
==2402==    by 0x42DD26: SEARCHER::find_best() (search.cpp:1930)
==2402==    by 0x40A10E: parse_commands(char**) (scorpio.cpp:1425)
==2402==    by 0x4053B3: main (scorpio.cpp:225)
==2402==  Address 0x1b8 is not stack'd, malloc'd or (recently) free'd
==2402== 
==2402== 
==2402== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==2402==  Access not within mapped region at address 0x1B8
==2402==    at 0x41299A: SEARCHER::probe_hash(int, unsigned long const&, int, int, int&, int&, unsigned int&, int, int, int&, int&, int&, bool) (hash.cpp:153)
==2402==    by 0x420090: SEARCHER::PROBE_HASH(int, unsigned long const&, int, int, int&, int&, unsigned int&, int, int, int&, int&, int&, bool) (parallel.cpp:538)
==2402==    by 0x427F3B: SEARCHER::hash_cutoff() (search.cpp:66)
==2402==    by 0x42B84A: on_qnode_entry (search.cpp:321)
==2402==    by 0x42B84A: SEARCHER::qsearch() (search.cpp:1175)
==2402==    by 0x42BD65: SEARCHER::qsearch_nn() (search.cpp:1221)
==2402==    by 0x4297C8: on_node_entry (search.cpp:128)
==2402==    by 0x4297C8: search(PROCESSOR*, bool) (search.cpp:791)
==2402==    by 0x42C0C4: SEARCHER::search_ab() (search.cpp:1310)
==2402==    by 0x42C43E: SEARCHER::evaluate_moves(int) (search.cpp:1386)
==2402==    by 0x435EAB: SEARCHER::generate_and_score_moves(int, int) (mcts.cpp:2030)
==2402==    by 0x42DD26: SEARCHER::find_best() (search.cpp:1930)
==2402==    by 0x40A10E: parse_commands(char**) (scorpio.cpp:1425)
==2402==    by 0x4053B3: main (scorpio.cpp:225)
==2402==  If you believe this happened as a result of a stack
==2402==  overflow in your program's main thread (unlikely but
==2402==  possible), you can try to increase the size of the
==2402==  main thread stack using the --main-stacksize= flag.
==2402==  The main thread stack size used in this run was 8388608.
==2402== 
==2402== HEAP SUMMARY:
==2402==     in use at exit: 6,375,722 bytes in 5 blocks
==2402==   total heap usage: 10 allocs, 5 frees, 6,450,314 bytes allocated
==2402== 
==2402== LEAK SUMMARY:
==2402==    definitely lost: 0 bytes in 0 blocks
==2402==    indirectly lost: 0 bytes in 0 blocks
==2402==      possibly lost: 0 bytes in 0 blocks
==2402==    still reachable: 6,375,722 bytes in 5 blocks
==2402==         suppressed: 0 bytes in 0 blocks
==2402== Rerun with --leak-check=full to see details of leaked memory
==2402== 
==2402== For lists of detected and suppressed errors, rerun with: -s
==2402== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
Segmentation fault (core dumped)
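
The faulting address 0x1B8 is very close to zero, which is consistent with (though not proof of) a read through a table pointer that was never allocated, offset by a small index. The sketch below only illustrates that arithmetic and a defensive check. Slot, Table, slot_address and probe are hypothetical names invented for this example, the 8-byte entry size is an assumption taken from the "Invalid read of size 8" line, and none of it is Scorpio's actual hash table code or the fix that was applied.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical 8-byte table entry; Scorpio's real hash entry layout may differ.
struct Slot {
    std::uint64_t data;
};

// Hypothetical table: `base` stays null if the allocation was skipped or failed.
struct Table {
    Slot* base = nullptr;
    std::size_t size = 0;  // number of slots
};

// Address that an unchecked `base + index` would point at, computed with
// integers so that nothing is dereferenced.
std::uintptr_t slot_address(const Table& t, std::size_t index) {
    return reinterpret_cast<std::uintptr_t>(t.base) + index * sizeof(Slot);
}

// Guarded probe: returns false instead of reading through a null base
// pointer or past the end of the table.
bool probe(const Table& t, std::size_t index, Slot& out) {
    if (t.base == nullptr || index >= t.size) {
        return false;
    }
    out = t.base[index];
    return true;
}
```

With a default-constructed Table, slot_address(t, 55) is 55 * 8 = 440 = 0x1b8 on typical platforms where a null pointer converts to integer 0, and probe(t, 55, s) returns false rather than crashing. Whether the real crash comes from an unallocated table is an assumption; checking the table pointers in gdb at hash.cpp:153 would confirm or rule it out.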

dshawul commented 2 years ago

Thank you for reporting the issue! I fixed that problem yesterday. It seems you have an additional issue: Scorpio is not finding scorpio.ini, which should be in the same directory as the scorpio binary. I generally do not recommend compiling Scorpio from source; instead, install it by following the steps in INSTALL.md. Scorpio has many dependencies: egbbdll, libnnprobe, libnnueprobe, libnncpuprobe and many others.

If you have to compile from source and test it, the first thing I would do is disable many of these options and run it like ./scorpio use_nn 0 use_nnue 0 montecarlo 0 go quit. Alternatively, run the run_tests.sh script under the tests/ directory, which I use for GitHub Actions. It uses the bare minimum Scorpio configuration to run a couple of tests.