Closed benwh1 closed 6 years ago
@benwh1 Thanks for reporting. It could be that it has to do with hash collisions, but it is hard to debug this if it is not reproducible in a reasonable amount of time. With move times of a couple of seconds I could not find an issue, so I am not sure how to proceed.
It should also be noted that a few months ago, I ran searches in KOTH, 3 check, horde, racing kings, and regular chess on 20 processors each for about 2 days, with each one reaching around depth 45 or higher. None of them crashed, but atomic segfaults quite regularly past depth 40 or so. However, this is the first time I have seen a nonexistent mate being reported.
Thanks @benwh1.
I am curious: does an atomic segfault occur if Threads is 1 (other parameters being anything of your choosing)?
@ddugovic Not sure, haven't tested it before. I just started a process running on 1 thread for 2 days, so I'll let you know if anything happens.
OK, I found a segmentation fault with 1 thread that occurred after only 42 minutes.
setoption name UCI_Variant value atomic
info string variant atomic startpos rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1
setoption name Hash value 7600
position startpos moves g1f3 f7f6 e2e3 e7e6 f3d4 c7c6 d4b5 c6b5 d1h5 g7g6 h5b5 b8c6 b5b6 a7b6 f1b5 e8f7 b5d7 a8a2 b2b4
go infinite
...
info depth 38 seldepth 62 multipv 1 score cp -110 lowerbound nodes 3297192656 nps 1310538 hashfull 999 tbhits 0 time 2515906 pv f7e8
info depth 38 seldepth 62 multipv 1 score cp -61 lowerbound nodes 3298654316 nps 1310667 hashfull 999 tbhits 0 time 2516774 pv f7e8
Segmentation fault
Edit: I ran this position again and it crashed in the same place, so this is reproducible.
Strange, I can't seem to duplicate it:
info depth 33 seldepth 47 multipv 1 score cp -244 lowerbound nodes 558913207 nps 1141382 hashfull 735 tbhits 0 time 489681 pv f8d6
info depth 33 seldepth 47 multipv 1 score cp -237 lowerbound nodes 562866623 nps 1141853 hashfull 740 tbhits 0 time 492941 pv f8d6
info depth 33 seldepth 47 multipv 1 score cp -225 lowerbound nodes 564158614 nps 1142044 hashfull 742 tbhits 0 time 493990 pv f8d6
info depth 33 seldepth 53 multipv 1 score cp -252 nodes 609501265 nps 1141398 hashfull 769 tbhits 0 time 533995 pv f8d6 f2f4 f7e8 c1b2 g6g5 h1f1 g5f4 c2c4 h7h5 d2d4 d6g3 e1d1 g3e1 b4b5 g8h6 c4c5 h6g4 c5c6 b7c6 b5b6 e8d7 b6b7 h8b8 d1c2 e6e5 b2a3 e1b4 a3b4 g4f2 g2g4 h5h4 g4g5 f6g5 c2b3 h4h3 b3b4 f2d3 b4b5 d3f2 d4d5 f2d3 d5d6
info depth 34 seldepth 57 multipv 1 score cp -244 lowerbound nodes 631489842 nps 1143961 hashfull 784 tbhits 0 time 552020 pv f8d6
info depth 34 seldepth 57 multipv 1 score cp -237 lowerbound nodes 634069199 nps 1144430 hashfull 787 tbhits 0 time 554048 pv f8d6
info depth 34 seldepth 57 multipv 1 score cp -259 upperbound nodes 778273487 nps 1149262 hashfull 870 tbhits 0 time 677194 pv f8d6 f2f4
info depth 34 seldepth 57 multipv 1 score cp -250 nodes 890901718 nps 1135783 hashfull 909 tbhits 0 time 784394 pv h7h5 c1b2 f7e8 e1e2 g8h6 h2h3 h8h7 h1a1 h7c7 e3e4 c7c2 d2d4 b7b5 g2g4 h5g4 f2f4 e8d8 h3h4 h6f5 e4f5 f6f5 a1c1 d8e8 c1c7 e8d8 e2f2 f8g7 c7c8 d8d7 c8c6 d7d8 f2e2 g7f8 c6c7 f8g7 e2f2 g7f8 c7c8 d8d7
info depth 35 seldepth 44 multipv 1 score cp -257 upperbound nodes 1016571742 nps 1140833 hashfull 941 tbhits 0 time 891078 pv h7h5 c1b2
info depth 35 seldepth 44 multipv 1 score cp -265 upperbound nodes 1222145422 nps 1145685 hashfull 974 tbhits 0 time 1066737 pv h7h5 c1b2
info depth 35 seldepth 47 multipv 1 score cp -257 lowerbound nodes 1313018196 nps 1148867 hashfull 980 tbhits 0 time 1142880 pv h7h5
info depth 35 seldepth 53 multipv 1 score cp -267 nodes 1469566381 nps 1147267 hashfull 987 tbhits 0 time 1280927 pv g8h6 c1b2 f6f5 h2h3 h8g8 f2f4 f8e7 b2f6 e7d6 d2d4 d6c7 e1g1 g6g5 c2c4 g5f4 g2g3 g8a8 d4d5 e6d5 f1d1 c7d6 c4c5 a8a3 f6c3 a3b3 g3g4 f5g4 c5c6 b3b1 e3e4 b1c1 d1c1 d6h2 g1f1 b7c6 b4b5 h2g1 b5b6 g1b6 f1g2 h6f5 e4f5
info depth 36 seldepth 56 multipv 1 score cp -260 lowerbound nodes 1609256417 nps 1149690 hashfull 992 tbhits 0 time 1399730 pv h7h5
info depth 36 seldepth 56 multipv 1 score cp -275 upperbound nodes 1760226972 nps 1146945 hashfull 994 tbhits 0 time 1534708 pv h7h5 c1b2
info depth 36 seldepth 56 multipv 1 score cp -263 lowerbound nodes 1850960564 nps 1149816 hashfull 995 tbhits 0 time 1609788 pv h7h5
info depth 36 seldepth 56 multipv 1 score cp -264 nodes 1882216175 nps 1146215 hashfull 995 tbhits 0 time 1642113 pv h7h5 c1b2 f7e8 e1e2 g8h6 h2h3 h8h7 h1a1 f8d6 c2c4 h7d7 d2d4 h6f5 e3e4 f5e3 f2f4 e3c2 c4c5 d7d8 a1a2 d6b8 b4b5 c2d4 b2d4 e6e5 a2d2 e8f7 c5c6 e5d4 d2d4 f6f5 d4d8 b7c6 e4e5 f7e6 b5b6 g6g5 b6b7 g5f4 e2f3 f5f4 f3e4 f4f3 e4d5
info depth 37 seldepth 59 multipv 1 score cp -271 nodes 2034467576 nps 1124542 hashfull 995 tbhits 0 time 1809151 pv g8h6 c1b2 f6f5 h2h3 h8g8 f2f4 f8e7 b2f6 e7d6 d2d4 d6c7 e1g1 g6g5 c2c4 g5f4 g2g3 g8a8 d4d5 e6d5 f1d1 c7d6 c4c5 a8a3 f6c3 f5f4 c5d6 a3b3 d1d7 f7f8 d7f7 f8e8 c3b2 f4f3 f7e7 e8d8 e7e8 d8d7 e8d8 d7e6 d8d6 e6f7 d6d7 f7e8 d7d8 e8f7 d8f8 f7e7 b2f6 e7d6 f8d8 d6e6 d8d6 e6f7 d6d7 f7e8 d7e7 e8d8 e7e8 d8d7
info depth 38 seldepth 56 multipv 1 score cp -278 upperbound nodes 2514949681 nps 1108091 hashfull 998 tbhits 0 time 2269622 pv g8h6 c1b2
info depth 38 seldepth 56 multipv 1 score cp -286 upperbound nodes 3143460500 nps 1115141 hashfull 999 tbhits 0 time 2818889 pv g8h6 c1b2
info depth 38 seldepth 60 multipv 1 score cp -278 lowerbound nodes 3591459032 nps 1121909 hashfull 999 tbhits 0 time 3201202 pv h7h5
info depth 38 seldepth 60 multipv 1 score cp -280 nodes 3673411017 nps 1122340 hashfull 999 tbhits 0 time 3272993 pv h7h5 c1b2 f7e8 e1e2 g8h6 f2f3 h8h7 h1a1 h7d7 d2d4 d7d8 c2c4 f6f5 a1a7 h6f7 c4c5 f8c5 f3f4 e8f8 b2a3 d8e8 b4b5 f8g8 e3e4 f5e4 e2d2 h5h4 f4f5 g6f5 a3e7 h4h3 g2g4 f7g5 d2d3 g5f3 d4d5 e6d5 d3c2
info depth 39 seldepth 57 multipv 1 score cp -272 lowerbound nodes 4056779528 nps 1125459 hashfull 999 tbhits 0 time 3604555 pv h7h5
info depth 39 seldepth 57 multipv 1 score cp -287 upperbound nodes 4380093074 nps 1127627 hashfull 999 tbhits 0 time 3884345 pv h7h5 c1b2
info depth 39 seldepth 58 multipv 1 score cp -299 upperbound nodes 5509951999 nps 1131862 hashfull 999 tbhits 0 time 4868039 pv h7h5 c1b2
info depth 39 seldepth 64 multipv 1 score cp -287 lowerbound nodes 6116913703 nps 1128292 hashfull 999 tbhits 0 time 5421388 pv h7h5
info depth 39 seldepth 64 multipv 1 score cp -293 nodes 6152202304 nps 1128731 hashfull 999 tbhits 0 time 5450546 pv h7h5 c1b2 f7e8 e1e2 h8h7 h1a1 h7c7 c2c4 b7b5 a1a8 c7c8 f2f4 g8h6 h2h3 b5c4 e3e4 c8a8 b4b5 f8b4 d2d3 b4a5 b5b6 a5b6 g2g4 h5g4 d3d4 f6f5 e4e5 h6f7 d4d5 f7d8 b2c3 d8b7 c3b4 b7d8 b4e7 d8b7 h3h4 e6d5 h4h5 g6h5 e5e6 b7c5
On a later attempt (I forgot I was trying Hash=1):
info depth 39 seldepth 49 multipv 1 score cp -203 upperbound nodes 7355815419 nps 1224674 hashfull 999 tbhits 0 time 6006344 pv f6f5 c1b2
info depth 39 seldepth 49 multipv 1 score cp -210 upperbound nodes 7572731709 nps 1223619 hashfull 999 tbhits 0 time 6188798 pv f6f5 c1b2
info depth 39 seldepth 50 multipv 1 score cp -222 upperbound nodes 7934360135 nps 1223445 hashfull 999 tbhits 0 time 6485256 pv f6f5 c1b2
info depth 39 seldepth 55 multipv 1 score cp -212 lowerbound nodes 7976014681 nps 1224255 hashfull 999 tbhits 0 time 6514994 pv f6f5
info depth 39 seldepth 55 multipv 1 score cp -190 lowerbound nodes 8122770640 nps 1224802 hashfull 999 tbhits 0 time 6631901 pv f6f5
info depth 39 seldepth 58 multipv 1 score cp -209 nodes 8358476623 nps 1226304 hashfull 999 tbhits 0 time 6815988 pv f6f5 c1b2 g8h6 g2g4 f5g4 f2f4 h8g8 h1f1 h6f5 f1g1 f7e8 b2g7 f8e7 g7f6 e7c5 b4c5 e8d7 e3e4 f5d4 e1f2 g6g5 f6d4 g8a8 g1a1 a8f8 a1a8 g5f4 a8f8 e6e5 d2d4 e5d4 e4e5 d7e6 f2f3 b7b5 f3e4 h7h6 e4d5 h6h5 c2c3 h5h4 d5d6 b5b4 c3b4 h4h3
info depth 40 seldepth 54 multipv 1 score cp -216 upperbound nodes 8888141484 nps 1228393 hashfull 999 tbhits 0 time 7235580 pv f6f5 c1b2
info depth 40 seldepth 54 multipv 1 score cp -224 upperbound nodes 9781659388 nps 1227308 hashfull 999 tbhits 0 time 7970010 pv f6f5 c1b2
info depth 40 seldepth 54 multipv 1 score cp -216 lowerbound nodes 10736108190 nps 1227592 hashfull 999 tbhits 0 time 8745663 pv f6f5
info depth 40 seldepth 54 multipv 1 score cp -235 upperbound nodes 13745883580 nps 1220786 hashfull 999 tbhits 0 time 11259863 pv f6f5 c1b2
info depth 40 seldepth 60 multipv 1 score cp -217 lowerbound nodes 16451029991 nps 1221339 hashfull 999 tbhits 0 time 13469665 pv f6f5
info depth 40 seldepth 60 multipv 1 score cp -224 nodes 16786210178 nps 1221524 hashfull 999 tbhits 0 time 13742016 pv f6f5 c1b2 g8h6 g2g4 h8g8 f2f4 f5g4 h1f1 h6f5 c2c4 f7e8 c4c5 f8e7 b2f6 e7c5 e3e4 f5d4 e1f2 e8d7 f1c1 d4c2 f6e7 b7b5 e4e5 g8a8 f2g3 a8a3 d2d3 d7c8 f4f5 g6f5 c1f1 a3a7 g3h4 c2e1 h4h5 a7c7 d3d4 e1f3 f1c1 c7c2 d4d5
info depth 41 seldepth 48 multipv 1 score cp -217 lowerbound nodes 17771044700 nps 1222951 hashfull 999 tbhits 0 time 14531281 pv f6f5
info depth 41 seldepth 48 multipv 1 score cp -232 upperbound nodes 18271094214 nps 1223608 hashfull 999 tbhits 0 time 14932142 pv f6f5 c1b2
info depth 41 seldepth 53 multipv 1 score cp -220 lowerbound nodes 18322904015 nps 1223983 hashfull 999 tbhits 0 time 14969894 pv f6f5
info depth 41 seldepth 53 multipv 1 score cp -205 lowerbound nodes 18416673075 nps 1223746 hashfull 999 tbhits 0 time 15049424 pv f6f5
info depth 41 seldepth 53 multipv 1 score cp -204 nodes 18823721050 nps 1224077 hashfull 999 tbhits 0 time 15377880 pv f6f5 c1b2 g8h6 g2g4 h8g8 f2f4 f5g4 h1f1 h6f5 c2c4 f8e7 b2f6 e7d6 f1g1 f5g3 h2g3 g6g5 c4c5 h7h5 c5c6 h5h4 d2d4 e6e5 d4d5 e5e4 f6g7 h4h3 c6c7 d6c7 g7f8 f7e8 e1f2 g5f4 g1h1 g8g1 h1h2 e8d7 f8d6 g1g5 b4b5 g5d5 f2g3
info depth 42 seldepth 49 multipv 1 score cp -211 upperbound nodes 20239754483 nps 1225719 hashfull 999 tbhits 0 time 16512552 pv f6f5 c1b2
info depth 42 seldepth 49 multipv 1 score cp -219 upperbound nodes 21493287978 nps 1226035 hashfull 999 tbhits 0 time 17530726 pv f6f5 c1b2
info depth 42 seldepth 49 multipv 1 score cp -211 lowerbound nodes 21690643688 nps 1226460 hashfull 999 tbhits 0 time 17685563 pv f6f5
info depth 42 seldepth 52 multipv 1 score cp -230 upperbound nodes 22800559394 nps 1226778 hashfull 999 tbhits 0 time 18585714 pv f6f5 c1b2
info depth 42 seldepth 52 multipv 1 score cp -212 lowerbound nodes 23071424953 nps 1227835 hashfull 999 tbhits 0 time 18790321 pv f6f5
I am also failing to reproduce. Could you please provide more information about your OS, architecture and perhaps the entire log of the single-threaded search?
Note that I used Stockfish 140617 as indicated in the first post. Is the latest version (or at least release 071017) also crashing for you?
Could it be because of the hash table? You seem to be getting very different evaluations and node counts to me. I'm using 7600MB for the hash table on a HPC cluster running CentOS. I just downloaded the latest version, 211017 and I'm running it under the same configuration now. I'm getting different evaluations with this version so I guess it probably won't crash in the same place.
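To compare two runs like the ones above line by line (depth, score, node count), the `info` lines can be parsed into dictionaries. This is only a minimal sketch: `parse_info` is a hypothetical helper name, and it handles just the fields that appear in the logs in this thread, not the full UCI protocol.

```python
# Minimal sketch: parse the UCI "info" lines shown in this thread.
# parse_info is a hypothetical helper, not part of Stockfish.
INT_FIELDS = ("depth", "seldepth", "multipv", "nodes", "nps",
              "hashfull", "tbhits", "time")

def parse_info(line):
    """Return a dict of fields for a search 'info' line, else None."""
    tokens = line.split()
    # Skip non-info output and free-form "info string ..." lines.
    if len(tokens) < 2 or tokens[0] != "info" or tokens[1] == "string":
        return None
    fields = {}
    i = 1
    while i < len(tokens):
        tok = tokens[i]
        if tok in INT_FIELDS:
            fields[tok] = int(tokens[i + 1])
            i += 2
        elif tok == "score":
            # "score cp <n>" or "score mate <n>", optionally followed by a bound
            fields["score_type"] = tokens[i + 1]
            fields["score"] = int(tokens[i + 2])
            i += 3
            if i < len(tokens) and tokens[i] in ("lowerbound", "upperbound"):
                fields["bound"] = tokens[i]
                i += 1
        elif tok == "pv":
            # Everything after "pv" is the principal variation.
            fields["pv"] = tokens[i + 1:]
            break
        else:
            i += 1  # ignore fields this sketch does not know about
    return fields

crash_run = parse_info(
    "info depth 38 seldepth 62 multipv 1 score cp -110 lowerbound "
    "nodes 3297192656 nps 1310538 hashfull 999 tbhits 0 time 2515906 pv f7e8")
print(crash_run["depth"], crash_run["score"], crash_run["nodes"])
```

Grouping the parsed lines by `depth` makes it easy to see where two logs first diverge in node count, which is a hint that the binaries or hash settings differ.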
Sorry, I couldn't find a 140617 download and was unaware the version mattered (but certainly am not ruling it out!).
Thanks a lot @benwh1 for your efforts in finding a position where the segfault is reproducible in a relatively short time.
Using the second position and current master, I found a failed assertion. I only have 8GB of RAM, so to be safe I set the hash size to 4096MB, but as far as I know Stockfish only allocates powers of 2 as the actual hash size, so it should result in the same as when using 7600MB. I found that the same failed assertion can also be reproduced with the default hash size of 16MB, but it takes about twice as many nodes.
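As a rough illustration of the power-of-two remark: assuming the table size is rounded down to the largest power of two that fits (the real allocation works on cluster counts, so treat this as an approximation), 7600 and 4096 end up with the same effective size. `effective_hash_mb` is a hypothetical name used only for this sketch.

```python
# Sketch of the assumed rounding rule: largest power of two <= requested size.
# effective_hash_mb is a hypothetical helper, not a Stockfish function.
def effective_hash_mb(requested_mb):
    if requested_mb < 1:
        raise ValueError("hash size must be at least 1 MB")
    # bit_length() - 1 is the index of the highest set bit.
    return 1 << (requested_mb.bit_length() - 1)

for mb in (16, 4096, 7600):
    print(mb, "->", effective_hash_mb(mb))
```

Under that assumption, a machine with 8GB of RAM can reproduce the 7600MB run by requesting 4096MB.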
The input was:
setoption name UCI_Variant value atomic
setoption name Hash value 4096
position startpos moves g1f3 f7f6 e2e3 e7e6 f3d4 c7c6 d4b5 c6b5 d1h5 g7g6 h5b5 b8c6 b5b6 a7b6 f1b5 e8f7 b5d7 a8a2 b2b4
go infinite
And the last part of the output:
info depth 35 seldepth 44 multipv 1 score cp -257 upperbound nodes 1016571742 nps 757156 hashfull 941 tbhits 0 time 1342617 pv h7h5 c1b2
stockfish: search.cpp:1441: Value {anonymous}::qsearch(Position&, Search::Stack*, Value, Value, Depth) [with {anonymous}::NodeType NT = (<unnamed>::NodeType)0u; bool InCheck = true]: Assertion `InCheck == !!pos.checkers()' failed.
The meaningful part (the rest mainly is dozens of search calls) of the stacktrace is:
#2 0x00007ffff6f7cbd7 in __assert_fail_base (fmt=<optimized out>,
assertion=assertion@entry=0x49c6d5 "InCheck == !!pos.checkers()",
file=file@entry=0x49c64c "search.cpp", line=line@entry=1441,
function=function@entry=0x49cf80 <Value (anonymous namespace)::qsearch<((anonymous namespace)::NodeType)0, true>(Position&, Search::Stack*, Value, Value, Depth)::__PRETTY_FUNCTION__> "Value {anonymous}::qsearch(Position&, Search::Stack*, Value, Value, Depth) [with {anonymous}::NodeType NT = (<unnamed>::NodeType)0u; bool InCheck = true]") at assert.c:92
#3 0x00007ffff6f7cc82 in __GI___assert_fail (
assertion=0x49c6d5 "InCheck == !!pos.checkers()",
file=0x49c64c "search.cpp", line=1441,
function=0x49cf80 <Value (anonymous namespace)::qsearch<((anonymous namespace)::NodeType)0, true>(Position&, Search::Stack*, Value, Value, Depth)::__PRETTY_FUNCTION__> "Value {anonymous}::qsearch(Position&, Search::Stack*, Value, Value, Depth) [with {anonymous}::NodeType NT = (<unnamed>::NodeType)0u; bool InCheck = true]") at assert.c:101
#4 0x0000000000461814 in (anonymous namespace)::qsearch<(<unnamed>::NodeType)0u, true>(Position &, Search::Stack *, Value, Value, Depth) (pos=...,
ss=ss@entry=0x7ffff5f4b9f0, alpha=alpha@entry=-637, beta=beta@entry=-636,
depth=depth@entry=DEPTH_QS_NO_CHECKS) at search.cpp:1441
#5 0x00000000004622cb in (anonymous namespace)::qsearch<(<unnamed>::NodeType)0u, false>(Position &, Search::Stack *, Value, Value, Depth) (pos=..., ss=0x7ffff5f4b9c0, alpha=636, beta=637,
depth=DEPTH_ZERO) at search.cpp:1619
InCheck == !!pos.checkers() suggests that the bug is either in gives_check or in the checkersBB.
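For readers unfamiliar with the `!!` idiom in that assertion: it collapses the checkers bitboard to a boolean, so the assertion only says that the InCheck flag passed into qsearch must agree with whether any piece is actually giving check. A toy model of that comparison, where `assertion_holds` is a hypothetical name and the bitboard is a plain integer:

```python
# Toy model of the failed assertion: InCheck == !!pos.checkers().
# assertion_holds is hypothetical; checkers_bb stands in for the bitboard
# of checking pieces (0 means no piece gives check).
def assertion_holds(in_check_flag, checkers_bb):
    return in_check_flag == (checkers_bb != 0)

# The trace shows qsearch instantiated with InCheck = true; if the position
# is not actually in check (empty checkers bitboard), the assertion fails.
print(assertion_holds(True, 0))
```

The two sides are computed independently, one predicted before the move is made and one derived from the resulting position, which is why a mismatch points at one of those two computations.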
The bug has to do with castling from 5b1r/1p5p/4ppp1/4Bn2/1PPP1PP1/4P2P/3k4/4K2R w K - 1 18, which is not check.
Sorry for the indecisive commits listed above; I'm trying to author something both concise and performant.
@ddugovic Nice to see that you already identified the bug. IMO, there is no need to hurry with a fix, take your time to find a good solution. For the sake of completeness, I add the commands to reproduce the issue at depth 1 (when compiling with debug=yes):
setoption name UCI_Variant value atomic
position fen 5b1r/1p5p/4ppp1/4Bn2/1PPP1PP1/4P2P/3k4/4K2R w K - 1 1
go depth 1
stockfish: search.cpp:1441: Value {anonymous}::qsearch(Position&, Search::Stack*, Value, Value, Depth) [with {anonymous}::NodeType NT = (<unnamed>::NodeType)0u; bool InCheck = true]: Assertion `InCheck == !!pos.checkers()' failed.
@ianfab I think I finally have c94606f9025cbfb008b9db5c6c543e27ed6a505f which is readable and performs well. Feel free (you & anyone else) to code review, I'll submit it to STC in case it makes sense to do so...
@ddugovic Unfortunately, I currently do not have the time to review your code or look into why the test failed, but I think there must be a functional difference apart from the fix, because a slowdown far below 1% is unlikely to make the test fail. For now, I scheduled a low-priority test with a fixed number of games to measure the Elo difference, as the queue was empty anyway.
I've since rebased my changes upon latest master and further simplified the code (mostly by using attackers_to), retaining the same bench atomic: a5d8cc595f0dc42f4f2dc6abdbc589357d3e3878. It's < 0.5% slower than the parent commit with ~1% less variation in bench times and several branches pruned/combined.
(The diff looks messy due to a code indentation of if (capture(m)) inside the default block.)
I am running some deep atomic calculations and ran into this in my latest search. This was not the first search after starting the program, so I am not sure whether it will be reproduced if the search is run again.