dje-dev / Ceres

Ceres - an MCTS chess engine for research and recreation
GNU General Public License v3.0

Ceres errors on long time control. System.ArgumentException: Allocation overflow #66

Open therealkingoftheuniverse opened 2 years ago

therealkingoftheuniverse commented 2 years ago

Hi, I tried to run the engine at very long time controls (over 4 hours per side in cutechess) on a 3060 mobile (6 GB VRAM, 16 GB system RAM). I got the following error after the engine had been thinking for a while (see log.txt). This does not happen every time, but it happens very often.

Do I just need a GPU with more VRAM to play at these time controls? Maybe get more RAM? Use reduced memory mode? Thanks in advance.

Logs: (from engine debug on cutechess) log.txt

dje-dev commented 2 years ago

Yes, the error indicates that the engine exceeded the maximum number of allowable nodes. Ceres auto-configures this to a reasonable maximum based on the RAM in the computer (it has nothing to do with GPU memory).

Allocation overflow, requested 70975488 but maximum was set as 70952533

Ceres is somewhat conservative in computing this maximum. You could tell Ceres to allow a somewhat higher number (say, 10% to 30% higher) and see if it works OK (or if it slows the computer down due to low memory). To do this, add a line to Ceres.json such as:

  "MaxTreeNodes": 90000000,
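
As a rough illustration of the "10% to 30% higher" suggestion, the sketch below computes candidate limits from the maximum reported in the error message and writes one into a settings file. It assumes Ceres.json is a plain JSON object with `MaxTreeNodes` as a top-level key; the helper names `raised_limit` and `set_max_tree_nodes` are hypothetical and not part of Ceres.

```python
import json
from pathlib import Path

REPORTED_MAX = 70_952_533  # "maximum was set as" value from the error message
REQUESTED = 70_975_488     # "requested" value from the same message


def raised_limit(current_max: int, percent: int) -> int:
    """Return current_max increased by a whole percentage, rounded down."""
    return current_max * (100 + percent) // 100


def set_max_tree_nodes(config_path: Path, max_nodes: int) -> None:
    """Set MaxTreeNodes in a JSON settings file, keeping any other keys.

    Assumption: the file is a flat JSON object (layout not confirmed here).
    """
    settings = json.loads(config_path.read_text()) if config_path.exists() else {}
    settings["MaxTreeNodes"] = max_nodes
    config_path.write_text(json.dumps(settings, indent=2))


if __name__ == "__main__":
    print("overflow:", REQUESTED - REPORTED_MAX, "nodes")
    for pct in (10, 20, 30):
        print(f"+{pct}%: {raised_limit(REPORTED_MAX, pct)}")
```

For comparison, the 90000000 in the example line is about 27% above the auto-configured 70952533, so it sits near the top of the suggested range.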

therealkingoftheuniverse commented 2 years ago

Thank you. Does the "70952533" mean the maximum number of nodes allowed, and the search went beyond that, which caused the error? Is that correct?

Also, if Ceres auto-configures this to some reasonable maximum, won't it just play a move when it reaches that maximum? Why would it crash?

dje-dev commented 2 years ago

  1. Correct.

  2. Actually, you have a very good point. It is intended that Ceres will just "stop thinking" when it reaches the configured limit, and I have tested that this works in general. However, looking at the provided log, your overflow happens during a different phase (rewriting the tree when switching moves). I had not considered that situation, and the failure to recover gracefully is unfortunate. Thank you for pointing this out. I will try to improve this situation.

It is still the case that you can increase the maximum number of nodes to try to reduce the frequency of this happening, but there will be no guarantees until I can address the above issue.
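
To make the two phases concrete, here is a toy sketch of the difference between a search loop that stops at the node limit and a tree-rewrite step that needs extra room. It is not Ceres code: the `NodeStore` class, the `scratch` parameter, and the fallback of restarting from a single root node are all assumptions made for illustration, and the real fix may look quite different.

```python
class NodeStore:
    """Toy fixed-capacity node pool (hypothetical, not the Ceres data structure)."""

    def __init__(self, max_nodes: int) -> None:
        self.max_nodes = max_nodes
        self.used = 0

    def can_allocate(self, count: int) -> bool:
        return self.used + count <= self.max_nodes

    def allocate(self, count: int) -> None:
        if not self.can_allocate(count):
            raise MemoryError(
                f"Allocation overflow, requested {self.used + count} "
                f"but maximum was set as {self.max_nodes}"
            )
        self.used += count


def search(store: NodeStore, batch_size: int, max_batches: int) -> int:
    """Expand in batches; stop thinking (rather than fail) when the pool is full."""
    batches = 0
    while batches < max_batches and store.can_allocate(batch_size):
        store.allocate(batch_size)
        batches += 1
    return batches


def rebuild_for_new_root(store: NodeStore, retained: int, scratch: int) -> bool:
    """Rewrite the tree after a move, assuming it needs `scratch` temporary nodes.

    Returns True if the retained subtree was kept, False if the sketch fell
    back to a fresh tree because the temporary space did not fit.
    """
    if not store.can_allocate(scratch):
        # Graceful fallback (assumed): drop the old tree, keep only a root node.
        store.used = 1
        return False
    store.allocate(scratch)  # temporary space used while rewriting
    store.used = retained    # only the retained subtree survives
    return True
```

In this toy model, raising the limit only makes the fallback branch rarer. The check before the rewrite is what prevents the exception.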