Torom / BotLi

Lichess Bot
GNU Affero General Public License v3.0

Serious issue or bug causing thread exception... #158

Closed King-Juggernaut closed 6 months ago

King-Juggernaut commented 9 months ago

[Image: 20240205_020648, screenshot of the console errors]

Greetings! My name is Adam and I love computer chess. I am experiencing a rather annoying issue when using BotLi. When my bot first starts playing through the console, everything works fine for about 3 to 4 games. Then, when the engine is loading for the next match, I get the errors that you can see in the picture above. Interestingly, as soon as I get the error, the engine loads and full CPU usage begins. However, the game online has already been aborted, and I am then forced to close the console. I have no idea what is going on. I know that I have correctly installed Python, the packages, and everything else required. This bug is really unfortunate because it prevents me from running matches for more than a few minutes.

For your information, I am running the latest development version of Stockfish. I have 80 threads (40 cores across dual-socket Intel 6148 CPUs) and run the hash at 16 gigabytes. I am using Windows 11 Pro.

I do wonder if this issue is somehow connected to such a high core count?

The issue also seems to be related to the constant loading and unloading of the engine, as opposed to it just staying loaded in RAM. I have had this same error three times. Each time it occurred, the error took place in a different thread.

Do you have any ideas as to what is going on? Something must be causing this...

I hope to hear from you.

Thank-you.

Torom commented 9 months ago

The problem seems to lie in the communication with the engine. The error comes from python-chess, the library that BotLi uses to communicate with the engine. Normally this is very stable; some bots run for weeks without problems. However, a large number of threads combined with a multi-CPU system is probably rather untested in BotLi, as not many people have such hardware.

I notice that you are not using the latest version of BotLi, so the first step would be to update it. As a workaround (we should rather fix the problem), you can switch off pondering. I think the error is caused by BotLi's manual pondering.

King-Juggernaut commented 9 months ago

Greetings,

Thank you so much for your prompt response to my bug report—I greatly appreciate it. The information you've shared is quite enlightening; however, I do have a few questions.

Firstly, I'm curious as to why the engine must load and unload after each game. In GUIs such as Winboard or Chessbase, the engine loads once and remains in RAM until one opts to exit. On running Stockfish via the console and entering the game, the performance is stellar. It appears that the error occurs when the engine is being loaded into RAM. Could we possibly devise a method to keep the engine loaded in the RAM during matchmaking or while awaiting challenges? I also have concerns that the frequent loading and unloading might be detrimental when large pages are enabled. Nonetheless, it's possible that this loading behavior is just an observation and not the actual cause of the problem.

Secondly, I'm keen to understand your reasoning behind pondering being the culprit.

I am planning to update BotLi—thank you for bringing that to my attention. Additionally, I'm considering running the engine without hyperthreading to decrease the thread count from 80 to 40 actual cores. Do you think this might alleviate the issue?

I am somewhat reluctant to disable pondering during blitz games, as I presume it could negatively impact the ELO rating.

It's also worth noting that I encountered errors when loading the engine with 64 GB of hash. BotLi would time out, prompting multiple attempts to load. At times, it would load with 64 GB of RAM, yet it didn't seem stable. These errors surfaced during the preliminary self-test, before handling challenges and issuing matchmaking challenges. When I reduced the hash to 16 GB, the errors ceased, and the initial self-test completed swiftly, leading me to believe the issue had been resolved—but clearly, that wasn't the case. Haha.

Lastly, might you have any suggestions for a possible fix?

Thank you immensely for your time.

Best regards,

Adam

Torom commented 9 months ago

> Firstly, I'm curious as to why the engine must load and unload after each game. In GUIs such as Winboard or Chessbase, the engine loads once and remains in RAM until one opts to exit. On running Stockfish via the console and entering the game, the performance is stellar. It appears that the error occurs when the engine is being loaded into RAM. Could we possibly devise a method to keep the engine loaded in the RAM during matchmaking or while awaiting challenges? I also have concerns that the frequent loading and unloading might be detrimental when large pages are enabled. Nonetheless, it's possible that this loading behavior is just an observation and not the actual cause of the problem.

This is not possible because BotLi supports an arbitrary number of engines at the same time. The most commonly used example here is one engine for standard chess and 960 and another engine for the Lichess variants. However, you could also configure a separate engine for each variant and each colour. BotLi would therefore have to run any number of engines simultaneously, which is simply not possible or sensible.

In addition, no negative effect is to be expected from the constant starting and stopping, which is the usual procedure in engine vs. engine tournaments and also in Fishtest.

> Secondly, I'm keen to understand your reasoning behind pondering being the culprit.

I'm not sure about that, but in the one case you posted, the error occurred when starting the manual pondering.

> I am planning to update BotLi; thank you for bringing that to my attention. Additionally, I'm considering running the engine without hyperthreading to decrease the thread count from 80 to 40 actual cores. Do you think this might alleviate the issue?

Yes, that is quite possible. Perhaps the error really is caused by the system being fully loaded, and across NUMA nodes on top of that. You are also using Windows, which is less tested (I use Linux).

> It's also worth noting that I encountered errors when loading the engine with 64 GB of hash. BotLi would time out, prompting multiple attempts to load. At times, it would load with 64 GB of RAM, yet it didn't seem stable. These errors surfaced during the preliminary self-test, before handling challenges and issuing matchmaking challenges. When I reduced the hash to 16 GB, the errors ceased, and the initial self-test completed swiftly, leading me to believe the issue had been resolved, but clearly, that wasn't the case. Haha.

Hmm, are you sure that Stockfish works reliably without BotLi? Even with 80 Threads and 64 GB Hash? Over a longer period of time? In most cases, the cause of such problems has turned out to lie outside BotLi. I have no experience of how Windows handles two CPUs, but it would be important to know that the engine works without BotLi.

> Lastly, might you have any suggestions for a possible fix?

We will have to tackle this bit by bit. First, update BotLi to the latest version, and then check that the engine runs without problems outside BotLi using the same settings.
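One way to check the engine outside BotLi is to drive it directly over UCI and repeat the start, search, quit cycle many times, since that cycle is what happens once per game. The following is only a minimal sketch using the Python standard library: the function names `run_once` and `stress` are made up for this example, the engine command is a placeholder you have to fill in, and there is no read timeout, so a hung engine will hang the script as well. It is not how BotLi or python-chess talk to the engine.

```python
import subprocess
import time


def _wait_for(proc, token):
    """Read engine output until a line starts with `token`; return that line."""
    for line in proc.stdout:
        line = line.strip()
        if line.startswith(token):
            return line
    raise RuntimeError(f"engine exited before sending {token!r}")


def run_once(command, options, movetime_ms=1000):
    """Start the engine, set UCI options, search once, quit. Returns the best move."""
    with subprocess.Popen(
        command,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
        bufsize=1,
    ) as proc:

        def send(text):
            proc.stdin.write(text + "\n")
            proc.stdin.flush()

        try:
            send("uci")
            _wait_for(proc, "uciok")
            for name, value in options.items():
                send(f"setoption name {name} value {value}")
            # "isready" only returns once the options (e.g. Hash) are applied.
            send("isready")
            _wait_for(proc, "readyok")
            send("position startpos")
            send(f"go movetime {movetime_ms}")
            best = _wait_for(proc, "bestmove")
            send("quit")
            proc.wait(timeout=10)
        except BaseException:
            # Do not leave a stray engine process behind on any failure.
            proc.kill()
            raise
    return best.split()[1]


def stress(command, options, cycles=20, movetime_ms=1000):
    """Repeat the start/search/quit cycle; returns (best move, seconds) per cycle."""
    results = []
    for _ in range(cycles):
        start = time.monotonic()
        move = run_once(command, options, movetime_ms)
        results.append((move, time.monotonic() - start))
    return results
```

For the setup in this thread, a call could look like `stress(["path/to/stockfish.exe"], {"Threads": 80, "Hash": 16384}, cycles=50)`, where the path is a placeholder and `Hash` is given in megabytes. If a cycle raises an error or the per-cycle time jumps sharply, that would point to a problem outside BotLi.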

> Thank you immensely for your time.

No problem, thank you for investing the time to report the problem and maybe even solve it.

EmptikBest commented 9 months ago

Hi,

It might solve your problem if you leave 2-4 threads for the OS. For me, on a 5950X with 32 threads, my system used to lag immensely when I had BotLi configured with 32 Threads and 32 GB Hash (I have 32 GB of RAM in total). You should also try leaving around 5-10 GB of RAM for the OS.

Additionally, beyond a certain point, more Hash actually slows down the engine, so for blitz in your case 12-16 GB should be the sweet spot.
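The rule of thumb above can be written down as a small calculation. This is just a sketch of the arithmetic: the function name `suggest_engine_settings` and its default values (4 threads and 8 GB reserved, Hash capped at 16 GB) are assumptions picked from within the ranges mentioned here, not values taken from BotLi or Stockfish.

```python
def suggest_engine_settings(
    logical_cpus,
    total_ram_gb,
    reserve_threads=4,   # assumption: upper end of "2-4 threads for the OS"
    reserve_ram_gb=8,    # assumption: within "5-10 GB of RAM for the OS"
    hash_cap_gb=16,      # assumption: upper end of the "12-16 GB" range
):
    """Return (Threads, Hash in MB) that leave some headroom for the OS."""
    threads = max(1, logical_cpus - reserve_threads)
    usable_gb = max(0, total_ram_gb - reserve_ram_gb)
    hash_mb = int(min(usable_gb, hash_cap_gb) * 1024)
    return threads, max(1, hash_mb)
```

For the 5950X example (32 threads, 32 GB RAM), `suggest_engine_settings(32, 32)` gives 28 Threads and 16384 MB of Hash instead of 32 Threads and 32 GB.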

Torom commented 6 months ago

I am closing the issue as there is no further information and it does not seem to happen frequently.