AndyGrant / OpenBench

OpenBench is a Distributed SPRT Testing Framework for Chess Engines
GNU General Public License v3.0
149 stars 141 forks source link

Client discards stderr on engine crash #201

Open Algorhythm-sxv opened 7 months ago

Algorhythm-sxv commented 7 months ago

I currently have an issue with my engine that causes rare crashes in 50-move draws (~1/10000 games). OB reports the crash and the PGN of the offending game, but it completely discards the stderr output of the crashed engine, which is a very important diagnostic tool that can significantly speed up the tracking down of bugs.

Currently the only option is to ensure your engine is perfectly accurate in both the reporting of the node count and the playing of the node count with go nodes, and even then a separate script must be made that calls go nodes repeatedly and leaves the engine in the state where the crash occurred.

A lot of hassle in this regard could be avoided by saving stderr with cutechess-clis built-in stderr option and showing that in the Errors view of the dashboard.

AndyGrant commented 7 months ago

So there are some technical limitations to this, in that I don't see an immediate way to tie together a PGN with any particular stderr output. So at best I would be uploading the entire contents of stderr.

What exactly is the issue with the go nodes approach, other than the fact that you don't have such a script readily made? I can't see where having a reproducible crash can not be better than just having stderr which may or may not have anything useful in it.

Algorhythm-sxv commented 7 months ago

My initial thought was indeed to include the entire stderr output, since that's where any language-specific crash errors will be sent. However if pumping GBs of data over stderr and messing with the server is of concern then a more limited amount of stderr could be saved. cutechess-cli streams data into the file as it comes, so the data can then be manipulated with normal file system utilities/code.

For me currently, the primary issue with the go nodes approach is that the requirement for go nodes and reported nodes per move to align exactly is a quite a strong one, with some subtleties around what counts as a node, abort/timeout reports and off-by-1 errors. I have not invested effort into this previously so my engine will certainly not be able to use this approach as-is, and I'm sure I'm not alone in that regard.

My engine is written in Rust, which like many other languages outputs detailed messages over stderr when a panic/exception occurs. This alone has been enough to solve most crashes I have had, only rarely needing to resort to the debugger.

Of course perfect reproducibility is more powerful than a potentially unhelpful error message, but there is nothing to be lost from the programmer's perspective from having both. Including stderr as a PGN comment could even have everything in one file.

If implementation effort is of a concern I can try it myself and make a PR, but my general lack of Python/Django experience can't promise a high quality of implementation.

AndyGrant commented 7 months ago

I'm mildly concerned about engines printing a ton of information to stderr. But I don't know if that is the case. And frankly it would probably be an issue if it were.

What I can do is launch cutechess, and redirect stderr to cutechess.stderr.dev.log and cutechess.stderr.base.log. Currently, when a workload finishes, the PGN is parsed. If any crashes, time losses, stalls, disconnects, or illegal moves appear in the PGN, the game is pulled out, and uploaded to the server as an error. I can attach the entire stderr log to that as well.

This will ultimately lead to additional information. IE if you managed 10 crashes on a single workload, you'll be getting 10 crashes worth of stderr in every pgn of every error report made to the server. But presumably errors are rare enough, that this is not a concern.

AndyGrant commented 7 months ago

I would be curious though to see the 1 in 10000 PGN you have, and the branch version of your engine that it came from. I would be surprised if I could not reproduce it, even if I can't type "go nodes".

Algorhythm-sxv commented 7 months ago

In my case the crashes are rare enough that I can't reproduce them locally, even running 20000 games so far. In my experience crashes have either been extremely rare or extremely common. The common ones are of course easy to reproduce locally but this particular one has evaded me.

AndyGrant commented 7 months ago

Well I mean I would like to try to reproduce the game, directly from the PGN, if you have it. Will also need the branch name, network ( if any ), and the engine options like hash/thread.

Algorhythm-sxv commented 7 months ago

Here is the PGN of the most recent crash:

PGN ``` [Event "?"] [Site "?"] [Date "2024.03.11"] [Round "1"] [White "Cheers-dev"] [Black "Cheers-base"] [Result "0-1"] [FEN "r2qkbnr/pppbpp1p/2np2p1/8/8/P1P2N2/1PQPPPPP/RNB1KB1R w KQkq - 0 1"] [GameDuration "00:00:27"] [GameEndTime "2024-03-11T11:59:13.796 UTC"] [GameStartTime "2024-03-11T11:58:46.370 UTC"] [PlyCount "152"] [SetUp "1"] [Termination "stalled connection"] [TimeControl "3.49+0.03"] 1. d4 {+0.09 17/32 272 690731} d5 {-0.09 17/28 202 510243} 2. h3 {+0.15 18/34 570 1415291} Nf6 {-0.04 17/24 187 473102} 3. Bf4 {+0.12 18/38 287 743406} Bf5 {-0.20 16/30 232 573786} 4. Qd1 {+0.16 17/29 142 360227} Bg7 {-0.13 14/33 220 553310} 5. Nbd2 {+0.14 16/30 212 515133} O-O {-0.11 16/33 287 713526} 6. g4 {+0.14 14/35 234 575106} Bc8 {+0.01 17/33 272 685057} 7. e3 {+0.09 13/29 182 440727} h5 {-0.04 16/25 142 365429} 8. gxh5 {0.00 16/29 128 330038} Nxh5 {-0.01 17/28 184 423866} 9. Bh2 {-0.04 14/30 122 259618} Re8 {-0.04 14/29 166 393446} 10. Ne5 {-0.01 14/26 114 286344} a5 {-0.05 15/33 316 798777} 11. Nxc6 {0.00 14/27 151 348090} bxc6 {-0.05 18/29 99 257483} 12. Be2 {+0.04 14/24 91 224685} Nf6 {-0.08 18/32 144 400933} 13. h4 {+0.10 17/30 163 441843} Ba6 {-0.15 17/32 293 777302} 14. Nf3 {+0.11 15/27 108 287685} Qd7 {-0.01 13/24 85 230433} 15. Ne5 {+0.12 16/25 106 281053} Qc8 {-0.16 13/24 97 257387} 16. Bxa6 {+0.22 16/25 134 361565} Qxa6 {-0.20 14/25 108 280959} 17. Qe2 {+0.22 15/27 78 221916} Ne4 {-0.21 14/22 94 228708} 18. h5 {+0.58 14/21 82 251579} g5 {-0.46 13/21 62 188068} 19. Qxa6 {+0.65 16/22 62 193292} Rxa6 {-0.73 14/25 73 231098} 20. Nd3 {+0.68 14/21 77 184097} Kh7 {-0.65 14/22 82 262560} 21. Rg1 {+0.75 15/23 56 173674} Rc8 {-0.55 16/22 63 199232} 22. f3 {+0.47 17/28 141 447889} Nd6 {-0.52 15/24 80 223200} 23. f4 {+0.49 13/24 79 211734} gxf4 {-0.45 16/22 61 191177} 24. Bxf4 {+0.43 16/23 49 159020} Bh6 {-0.40 17/27 111 373983} 25. Bxd6 {+0.39 16/21 55 181407} cxd6 {-0.19 15/27 59 200118} 26. Ke2 {+0.41 15/22 63 203057} c5 {-0.18 15/25 60 204492} 27. Raf1 {+0.21 14/22 68 216194} f6 {-0.23 16/22 261 147997} 28. Rb1 {+0.24 13/21 42 130507} cxd4 {+0.01 14/24 63 210031} 29. exd4 {-0.02 14/23 96 288911} Rb6 {+0.04 12/23 32 96465} 30. Rgf1 {-0.02 14/21 42 135887} Rg8 {+0.02 12/22 37 104377} 31. Rf5 {0.00 13/25 41 127521} Rg3 {-0.08 14/26 63 205869} 32. Rf3 {-0.02 11/21 39 119947} Rg2+ {-0.01 15/17 108 27649} 33. Rf2 {-0.02 14/21 39 127706} Rg3 {+0.03 17/21 69 59415} 34. Rf3 {+0.02 16/26 58 173487} Rg5 {+0.25 12/19 19 60231} 35. Rh3 {-0.22 12/23 53 168109} e5 {+0.34 12/20 29 85867} 36. Nf2 {-0.26 13/24 63 195267} Rb8 {+0.28 11/21 42 71620} 37. b4 {-0.08 12/22 64 178171} exd4 {+0.30 11/18 50 56103} 38. cxd4 {-0.03 13/22 32 101153} Rf5 {-0.01 11/20 21 63672} 39. Ng4 {0.00 14/24 54 168759} Rf4 {-0.13 12/20 22 73396} 40. Nxh6 {+0.25 15/22 37 115297} Kxh6 {-0.21 15/21 30 85372} 41. b5 {+0.21 14/23 31 109222} Rb6 {-0.30 14/21 48 39262} 42. Ke3 {+0.46 16/25 30 105499} Rf5 {-0.28 15/24 27 84565} 43. a4 {+0.71 16/25 40 116533} Rg5 {-0.74 14/22 38 118957} 44. Rf1 {+0.82 17/25 52 111501} f5 {-0.82 14/22 37 66597} 45. Rhf3 {+0.79 14/24 35 134315} Kxh5 {-0.89 13/22 25 96372} 46. Rxf5 {+0.90 15/27 37 137496} Rb7 {-0.95 13/20 19 79691} 47. Kd2 {+0.85 14/25 35 87087} Rb8 {-0.92 14/24 33 136908} 48. Rxg5+ {+1.24 18/26 28 127033} Kxg5 {-1.10 15/25 22 103044} 49. Rf7 {+1.18 18/31 36 175290} Rh8 {-1.18 16/24 23 100161} 50. b6 {+1.11 18/34 75 358263} Rh2+ {-1.09 16/29 30 154720} 51. Kc1 {+1.14 19/27 27 137234} Rh1+ {-1.22 18/30 31 82889} 52. Kc2 {+1.16 18/28 70 82794} Rh2+ {-1.21 18/33 41 188745} 53. Kc1 {+1.14 19/31 34 182466} Rh1+ {-1.16 17/29 26 130647} 54. Kd2 {+1.10 18/28 22 116732} Rh2+ {-1.23 18/31 43 79909} 55. Ke3 {+1.10 18/34 25 125714} Rh3+ {-1.09 16/30 19 88040} 56. Ke2 {+1.08 19/25 32 147360} Rh2+ {-1.15 16/25 25 115704} 57. Ke3 {+1.10 19/26 24 91582} Rh3+ {-1.15 20/23 20 100597} 58. Rf3 {+1.09 20/25 25 99350} Rh7 {-0.79 21/27 46 191050} 59. Kd2 {+0.95 18/25 49 221820} Rb7 {-0.79 20/26 23 106447} 60. Rb3 {+0.95 18/25 20 89663} Kf5 {-0.75 19/27 34 124547} 61. Kc3 {+0.75 19/27 36 168106} Ke6 {-0.95 20/31 30 122848} 62. Rb5 {+0.78 19/27 32 139899} Kd7 {-0.89 21/30 27 120170} 63. Rxd5 {+0.89 15/23 31 131712} Rxb6 {-0.87 20/23 30 127733} 64. Rxa5 {+0.86 17/22 27 94479} Rb1 {-0.97 19/25 36 142223} 65. Rb5 {+0.93 17/23 29 112429} Ra1 {-0.84 19/25 26 118894} 66. Kb4 {+0.94 17/27 43 167858} Kc6 {-0.83 21/28 27 139908} 67. d5+ {+0.94 21/30 32 143593} Kc7 {-0.79 20/30 22 98470} 68. a5 {+0.93 19/23 20 80500} Ra2 {-0.79 23/29 22 80328} 69. Kb3 {+0.86 21/31 30 132997} Ra1 {-0.78 23/30 34 108397} 70. Kb4 {+0.86 21/29 27 86915} Ra2 {-0.82 23/33 25 115490} 71. Kc3 {+0.87 21/29 35 132744} Ra3+ {-0.77 24/33 42 193628} 72. Kd4 {+0.87 22/36 31 172247} Ra4+ {-0.77 25/33 48 77470} 73. Kd3 {+0.86 21/27 30 118424} Kd8 {-0.77 24/27 33 93323} 74. Kc2 {+0.76 23/31 40 163532} Kc7 {-0.77 26/35 61 114408} 75. Kb2 {+0.74 21/28 30 123954} Kd8 {-0.77 27/30 40 88928} 76. Kb3 {+0.73 20/27 26 115093} Ra1 {-0.77 27/29 69 91733, White's connection stalls} 0-1 ```

This is from this test running this commit

With previous errors I have tried using pgn-extract to set the engine up with the final position, but didn't see any errors searching there.

AndyGrant commented 7 months ago

Based on that pgn, its nmp_tweaks ee672c3944c738ab8686c5d3f71945a96f96083c that had the error.

Algorhythm-sxv commented 7 months ago

Yes you're right, I copied the wrong hash

AndyGrant commented 7 months ago

Managed to reproduce the first 53 moves. After that the lack of fixed-node play is rough. After adding all the needed stops to get actual correct output, I stills struggled because some N nodes is actually needed instead of x, to account for excess counting and the process of unwinding your search.

So while I could reproduce a future one, this current one I'm not going to invest. But you can brute force your way to the answer if you really cared.

Algorhythm-sxv commented 7 months ago

I appreciate the effort to try. Correcting fixed-node play is now definitely something I will invest time in, but that discussion is separate to this issue about stderr.

Algorhythm-sxv commented 6 months ago

I have now updated my engine to accurately play to node counts, and caught another crash on this test.

using a python-chess script I can perfectly recreate the game from the PGN using node counts, but get no crash (on my system) once I search the final position.

I am fairly confident that my implementation of accurate node counting lines up between time-based and node-based aborts, as time aborts are checked when nodes % 2048 == 2047 and go movetime always results in a node count that is one less than a multiple of 2048.

I am completely at a loss as to what might be causing this crash, and access to the stderr output would help immensely.

Here is the python script that I am using to recreate the game:

import chess
import chess.engine
import chess.pgn

import logging
logging.basicConfig(level=logging.DEBUG)

pgn = open("error_base.pgn")
game = chess.pgn.read_game(pgn)
dev = chess.engine.SimpleEngine.popen_uci("dev.exe", debug=True)
dev.configure({"Threads": 1, "Hash": 8})
base = chess.engine.SimpleEngine.popen_uci("base.exe", debug=True)
base.configure({"Threads": 1, "Hash": 8})

board = game.board()
for node in game.mainline():
    comment = node.comment
    node_count = int(comment.split(' ')[3].rstrip(','))
    color = not node.turn()
    engine_to_play = dev
    if color == chess.WHITE:
        engine_to_play = base
    result = engine_to_play.play(board, chess.engine.Limit(nodes=node_count))
    print(node_count, node.move, result.move)
    assert(result.move == node.move)
    board.push(result.move)

result = base.play(board, chess.engine.Limit(time=0.01))

print(result)

base.quit()
dev.quit()