djChess2019 / position-tester

prepare a set of positions and see how well your chess engine of choice performs.

Exceptions occur more often when running high nodecounts #2

Open MelleKoning opened 4 years ago

MelleKoning commented 4 years ago

Got the tester running after some fiddling with settings.

The position tester sometimes fails with the exception below. On my hardware the exception becomes more frequent when I increase either the node count or the time control.

C:\code\position-tester>python position-tester.py nets.txt postestsettings.json outputsummary.txt outputlog.log

 1000  problems...  C:\lc0\lc0.exe
  nodes:1 seconds
  weight:32930
 {
     "logFile": "C:\\lc0\\logs.txt",
     "threads": 1,
     "minibatchsize": 32,
     "SmartPruningFactor": 1,
     "Threads": 1,
     "VerboseMoveStats": true,
     "HistoryFill": "always",
     "WeightsFile": "C:\\lc0\\nets\\32930"
}
*** if you need to pause all instances runing just create a file named 'pause-Position-tester.txt'

Run 1 of 2: 32930, 1 seconds
Traceback (most recent call last):
  File "position-tester.py", line 281, in runOnePosition
Exception in thread Thread-1:
Traceback (most recent call last):
  File "C:\Users\usrname\AppData\Local\Programs\Python\Python38-32\lib\threading.py", line 932, in _bootstrap_inner
    for info in analysis:
  File "C:\Users\usrname\AppData\Local\Programs\Python\Python38-32\lib\site-packages\chess\engine.py", line 2472, in __next__
    self.run()
  File "C:\Users\usrname\AppData\Local\Programs\Python\Python38-32\lib\threading.py", line 870, in run
    return future.result()
  File "C:\Users\usrname\AppData\Local\Programs\Python\Python38-32\lib\concurrent\futures\_base.py", line 439, in result
    self._target(*self._args, **self._kwargs)
    return self.__get_result()
  File "C:\Users\usrname\AppData\Local\Programs\Python\Python38-32\lib\concurrent\futures\_base.py", line 388, in __get_result
  File "C:\Users\usrname\AppData\Local\Programs\Python\Python38-32\lib\site-packages\chess\engine.py", line 183, in background
    raise self._exception
  File "C:\Users\usrname\AppData\Local\Programs\Python\Python38-32\lib\site-packages\chess\engine.py", line 2185, in __anext__
    loop.close()
  File "C:\Users\usrname\AppData\Local\Programs\Python\Python38-32\lib\asyncio\proactor_events.py", line 679, in close
    return await self.get()
    signal.set_wakeup_fd(-1)
ValueError: set_wakeup_fd only works in main thread
  File "C:\Users\usrname\AppData\Local\Programs\Python\Python38-32\lib\site-packages\chess\engine.py", line 2157, in get
    await self._finished
chess.engine.EngineTerminatedError: engine process died unexpectedly (exit code: 3221225620)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "position-tester.py", line 485, in <module>
    main()
  File "position-tester.py", line 465, in main
    agreed, total, nodesUsedList = runOnePositionSet()
  File "position-tester.py", line 401, in runOnePositionSet
    positionResult: LogOutput = runOnePosition(positionLine, engine)
  File "position-tester.py", line 288, in runOnePosition
    agree, nodesUsed = fillAgreeList(board, info, iccf_moves, agreeList)
  File "C:\Users\usrname\AppData\Local\Programs\Python\Python38-32\lib\site-packages\chess\engine.py", line 2480, in __exit__
    self.stop()
  File "C:\Users\usrname\AppData\Local\Programs\Python\Python38-32\lib\site-packages\chess\engine.py", line 2437, in stop
    with self.simple_engine._not_shut_down():
  File "C:\Users\usrname\AppData\Local\Programs\Python\Python38-32\lib\contextlib.py", line 113, in __enter__
    return next(self.gen)
  File "C:\Users\usrname\AppData\Local\Programs\Python\Python38-32\lib\site-packages\chess\engine.py", line 2283, in _not_shut_down
    raise EngineTerminatedError("engine event loop dead")
chess.engine.EngineTerminatedError: engine event loop dead

The lc0 logs usually show several test positions already finished at a point where position-tester has not caught up. Could the reason for the exception be the time spent parsing the large number of UCI info strings?
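One detail in the trace may be worth decoding before looking at timeouts: the exit code 3221225620 in the first EngineTerminatedError. The output above is two tracebacks printed at the same time (the main thread and Thread-1), which is why the frames appear interleaved. The exit code belongs to the main-thread one. On Windows, exit codes in this range are normally NTSTATUS values, and written in hex this one is 0xC0000094, which Windows defines as STATUS_INTEGER_DIVIDE_BY_ZERO. If that reading applies here, the lc0 process itself crashed and python-chess only reported it. The helper below is a minimal sketch: `describe_exit_code` and the small lookup table are made up for illustration and are not part of position-tester or python-chess.

```python
# Hand-picked subset of Windows NTSTATUS codes; this table is not exhaustive.
NTSTATUS_NAMES = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC000001D: "STATUS_ILLEGAL_INSTRUCTION",
    0xC0000094: "STATUS_INTEGER_DIVIDE_BY_ZERO",
    0xC00000FD: "STATUS_STACK_OVERFLOW",
}


def describe_exit_code(code: int) -> str:
    """Hypothetical helper: render a process exit code as a hex NTSTATUS."""
    # Some tools report the same value as a negative 32-bit integer,
    # so normalise to the unsigned 32-bit form first.
    unsigned = code & 0xFFFFFFFF
    name = NTSTATUS_NAMES.get(unsigned, "unknown status")
    return f"0x{unsigned:08X} ({name})"


# Exit code copied from the traceback above.
print(describe_exit_code(3221225620))
```

This prints `0xC0000094 (STATUS_INTEGER_DIVIDE_BY_ZERO)`. It does not say where in lc0 the division happens, only that the engine process probably ended with a hardware exception rather than a normal exit.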

MelleKoning commented 4 years ago

I tried setting VerboseMoveStats to False on this line:

params["VerboseMoveStats"] = False # False to prevent errors, True to have a good log.
But it turns out this does not help at all.

The problem seems to be a timing error on slower hardware. When I decrease the number of nodes from 800 to 100, results start showing. When I increase the node count again, the exception is thrown again.

Since the lc0 logs themselves show several positions fully analysed at 800 nodes, I suspect a timeout that is either not detected inside the python-chess package or not picked up by the position-tester Python code (see the stack trace above). I have looked up the line in engine.py that reads raise EngineTerminatedError("engine event loop dead"), as well as the Limit that is set for the chess engine, but I see nothing wrong with a limit of 800 nodes. I do not yet see why a limit of 800 nodes causes issues while a limit of 100 nodes so far does not. On my hardware, running 800 nodes takes more than 4 seconds, which is long, but I still have no clue how to prevent the exception.
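Until the root cause is known, one possible workaround is to treat an engine death as a per-position failure: drop the dead engine, start a fresh one, and retry or skip that position instead of letting the whole run abort. The sketch below shows only that control flow and is not how position-tester currently works. `analyse_with_restart`, `start_engine` and `analyse_position` are hypothetical names. In the real script they would presumably wrap the engine start-up and the analysis loop in runOnePosition, with chess.engine.EngineTerminatedError passed as `crash_error`. Stand-ins are used here so the example runs without lc0 installed.

```python
def analyse_with_restart(start_engine, analyse_position, position,
                         crash_error, max_attempts=3, engine=None):
    """Try one position up to max_attempts times, restarting the engine
    after each crash. Returns (result, engine); result is None when every
    attempt crashed."""
    for _ in range(max_attempts):
        if engine is None:
            engine = start_engine()
        try:
            return analyse_position(engine, position), engine
        except crash_error:
            # The engine process is gone; forget it so the next attempt
            # starts a new one.
            engine = None
    return None, engine


# --- Stand-ins for demonstration only (assumptions, not real lc0 calls) ---
class FakeCrash(Exception):
    """Plays the role of chess.engine.EngineTerminatedError."""


state = {"crashes_left": 2, "starts": 0}


def start_engine():
    state["starts"] += 1
    return object()  # placeholder for a real engine handle


def analyse_position(engine, position):
    if state["crashes_left"] > 0:
        state["crashes_left"] -= 1
        raise FakeCrash("engine process died unexpectedly")
    return f"bestmove for {position}"


result, engine = analyse_with_restart(
    start_engine, analyse_position, "startpos", FakeCrash
)
print(result, state["starts"])
```

With two simulated crashes the third attempt succeeds, so the engine is started three times. Whether retrying helps with the real failure depends on whether the crash is deterministic for a given position and node count, which the thread has not established.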