bagaturchess / Bagatur

Java Chess Engine (UCI compatible)
http://bagaturchess.github.io/Bagatur/
Eclipse Public License 2.0

Huge CPU usage spikes at startup #5

Closed: tpoppins closed this issue 5 years ago

tpoppins commented 5 years ago

All Bagatur single-core execs I tested, going back to v1.5e, use considerably more CPU than just one core at startup.

At the start of a game for up to a minute Bagatur_64_1_core.exe uses anywhere from two to four cores, according to Task Manager. E.g., on a 20-core system a single-core engine is expected to use 5% of the CPU; Bagatur's usage spikes up to 20%. Later on it settles down but still occasionally jumps to 10% or higher.
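The Task Manager percentages above can be converted into an approximate number of busy cores. The following is only an illustrative sketch: the class and method names are hypothetical and not part of Bagatur, and it assumes Task Manager reports total CPU usage averaged over all logical cores.

```java
public class CoreUsage {

    /** Share of total CPU (in percent) that one fully busy core represents. */
    static double singleCoreSharePercent(int cores) {
        return 100.0 / cores;
    }

    /** Approximate number of fully busy cores for a given total-CPU percentage. */
    static double coreEquivalents(double totalCpuPercent, int cores) {
        return totalCpuPercent * cores / 100.0;
    }

    public static void main(String[] args) {
        int cores = 20; // the 20-core system from the report
        System.out.println("expected share: " + singleCoreSharePercent(cores) + "%");
        System.out.println("cores busy at 20%: " + coreEquivalents(20.0, cores));
        System.out.println("cores busy at 10%: " + coreEquivalents(10.0, cores));
    }
}
```

Under that assumption, the 20% spike corresponds to about four busy cores and the later 10% jumps to about two.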

Such behavior from an engine purporting to be single-core is unacceptable. Not only does it overload the test system, it also steals CPU time from other engines running concurrently.

Tested under Cute Chess GUI v1.1.0 on three Dual Xeons - a 12-core Westmere-EX, a 16-core Sandy Bridge and a 20-core Ivy Bridge, all running Win 7 x64 SP1. Same behavior under JRE 8 update 191 64-bit and the latest JDK 11.0.1 64-bit.

bagaturchess commented 5 years ago

Hello tpoppins,

Thanks for using Bagatur and for reporting the issue.

I have tried this on my laptop and the CPU load is higher only for a few seconds when the engine starts. During the first search the engine allocates the transposition table and the evaluation caches. I assume that the higher CPU load is caused by these caches and that it ends once the caches are fully filled with entries.

The Java Virtual Machine does work in parallel in system threads, which is outside the control of the Java programmer. Examples of such work are memory allocation and garbage collection, which in our case happen exactly at engine startup.
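As a rough way to see how much of this background work a given JVM has done, the standard java.lang.management API exposes garbage collector and JIT compiler statistics. This is a minimal sketch, not code from Bagatur; the class name is hypothetical, and the thread count shown covers only live Java threads, not every native JVM thread.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class JvmBackgroundWork {

    /** Returns one human-readable line per JVM-managed activity. */
    static List<String> report() {
        List<String> lines = new ArrayList<>();
        lines.add("available processors: "
                + Runtime.getRuntime().availableProcessors());
        lines.add("live Java threads: "
                + ManagementFactory.getThreadMXBean().getThreadCount());

        // One bean per collector; counts and times are cumulative since JVM start.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add("GC " + gc.getName() + ": " + gc.getCollectionCount()
                    + " collections, " + gc.getCollectionTime() + " ms");
        }

        // The compilation bean is null when the JVM has no JIT compiler.
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        if (jit != null && jit.isCompilationTimeMonitoringSupported()) {
            lines.add("JIT " + jit.getName() + ": "
                    + jit.getTotalCompilationTime() + " ms spent compiling");
        }
        return lines;
    }

    public static void main(String[] args) {
        report().forEach(System.out::println);
    }
}
```

Calling report() shortly after startup and again a minute later would give a hint as to whether garbage collection or compilation accounts for most of the extra load.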

I am afraid that the issue is caused by the Java specifics, although I will keep it open in order to consider it further and investigate the root cause better.

Is it an option for you to configure the tournaments so that the engine keeps running and is not restarted for each game?

Best Regards, Krasimir

tpoppins commented 5 years ago

Hi Krasimir,

thank you for the prompt reply.

I'm aware that using Java leaves some things out of the programmer's control. Indeed, the problem I described plagues many old Java engines; yet some of the newer ones, like chess22k and Pirarucu, manage to minimize the extent of the problem. I tested quite a few versions of each, all under Cute Chess GUI, and the CPU usage spikes are rather minor (about 1.5-2x the single-core percentage) and brief compared to those of Bagatur.

Since both of the above-mentioned engines are open source perhaps a look at how they do it might help? I must admit that I know very little of C/C++ programming and next to nothing about Java, so I must apologize if this idea is a gross oversimplification.

Not restarting an engine between games is not an option under Cute Chess; IIRC it is under Arena but is not recommended. Besides, Arena doesn't support running multiple games concurrently, so using it on a high-core-count system like those I use would be very slow (just one game at a time) or extremely awkward and error-prone (running 10+ instances of Arena).

I can continue testing Bagatur the way it is but I have to leave at least four cores idle to leave headroom for the CPU usage spikes, compared to just one or two for most other engines. An attempt to run even 18 games featuring Bagatur on the above-mentioned 20-core E5-2690v2 actually makes Cute Chess GUI crash with an access violation almost 100% of the time; clearly, the GUI is not without its faults, yet it works for some 90% of the engines I test, and I have over 300 of them installed (not counting various versions).

Thank you for your time, Tirsa@CCRL

bagaturchess commented 5 years ago

Hi Tirsa, Thanks for the answer! Pirarucu is written in Kotlin and I don't have experience with it. As for Chess22k, I see it is quite new; I was not aware that there was such a strong engine written in Java. Things are evolving. I will have a look ... I have some ideas to test. Could I send you a development version of Bagatur to test on your multi-CPU machine? I don't have your email, so please write me a short message.

Best Regards, Krasimir

bagaturchess commented 5 years ago

Hi Tirsa, I am not able to send you the development version, because the mailbox is blocking the attachment. Could you please add the following line in the Bagatur_64_1_core.ini file (assuming you are using Bagatur_64_1_core.exe), just after the line 'splash.image=Bagatur.ico': vmarg.1=-Djava.compiler=NONE

This will disable the JIT compiler, which compiles byte code to native code at runtime, mostly at the beginning of program execution. For example, according to this article https://aboullaite.me/understanding-jit-compiler-just-in-time-compiler/ "The higher the degree of optimization done by a JIT compiler, the more time it spends in the execution stage."

I think that is the reason, because Bagatur has many more Java classes than other Java engines, which are simpler from a source code perspective.

The bad news is that with this option set, Bagatur runs about 30 times slower, so it is not a real option for a public release. But let's try it to find out whether this is the cause.
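One possible way to confirm that the setting took effect is to inspect the JVM's own system properties from inside the process. This is a hedged sketch with a hypothetical class name, not part of Bagatur: it assumes a HotSpot JVM, where the java.vm.info property typically reads "mixed mode" when the JIT is active and "interpreted mode" when it is off; other JVMs may report something different.

```java
public class JitModeCheck {

    /** True if the given java.vm.info string indicates interpreter-only execution. */
    static boolean looksInterpretedOnly(String vmInfo) {
        return vmInfo != null && vmInfo.contains("interpreted mode");
    }

    public static void main(String[] args) {
        // HotSpot-specific property; assumed format, e.g. "mixed mode, sharing".
        String vmInfo = System.getProperty("java.vm.info");
        System.out.println("java.vm.info  = " + vmInfo);
        // The property set by the vmarg line; null if it was not passed.
        System.out.println("java.compiler = " + System.getProperty("java.compiler"));
        System.out.println("JIT disabled? " + looksInterpretedOnly(vmInfo));
    }
}
```

Running this class with and without -Djava.compiler=NONE on the same JVM should show whether the flag is actually honored by that Java version.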

Best Regards, Krasimir

tpoppins commented 5 years ago

Krasimir, I can confirm that disabling the JIT compiler the way you said reduced the CPU usage spikes from up to 4x single-core to 1.5-2x. And yes, the NPS dropped by a factor of about 28. Over to you now.

BTW, expect to see v1.6c in this weekend's update. :)

bagaturchess commented 5 years ago

Hi Tirsa, Thanks for your tests! At least now we know why it consumes more CPU at startup. Unfortunately, I am not able to provide a fix, as this would decrease the Elo strength significantly. The issue is caused by Java specifics (the JIT compiler) and the fact that Bagatur has too many Java classes to be compiled. So I will mark the issue with the "wontfix" flag. I am very sorry that after our cooperation we were not able to find a good solution ... I assume you could close the issue?

Best Regards, Krasimir

tpoppins commented 5 years ago

Sure, no problem, Krasimir; after all it's not critical. Perhaps one day you'll have a flash of inspiration and see a way of minimizing this issue in a way you can't think of today. For now it's usable, if not perfect. And kudos for going your own way!