adamkarvonen / chess_gpt_eval

A repo to evaluate various LLMs' chess-playing abilities.

Add stockfish benchmark #5

Closed ch3cksout closed 1 year ago

ch3cksout commented 1 year ago

This is very interesting work! To make the results quantitative, we need to know the actual evaluation rate of the stockfish client players used in this test. Unfortunately, "M1 Mac" does not fully specify the actual CPU, nor its speed.

@adamkarvonen, can you run this simple test? In the example given, 16 is the assumed number of threads for an 8-core M1; change this if 'lscpu' shows a different count.

stockfish bench 1024 16 26 default depth nnue 1>/dev/null 2>stockfish_M1Mac.bench

The very last line of the stderr output gives Nodes/second, useful for comparison. (Note that redirecting stdout is important to avoid extra time spent on printing unnecessary output.)
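As a rough illustration of pulling that figure out of the saved file, here is a minimal Python sketch. It assumes the summary uses the `Label : value` layout quoted later in this thread (`Total time (ms)`, `Nodes searched`, `Nodes/second`). The helper name `parse_bench_summary` is hypothetical and not part of Stockfish or this repo.

```python
import re

# Hypothetical helper: extracts the three summary fields that the bench
# run writes to stderr. The labels are taken from the output quoted in
# this thread; other Stockfish versions may format them differently.
SUMMARY_FIELD = re.compile(
    r"(Total time \(ms\)|Nodes searched|Nodes/second)\s*:\s*(\d+)"
)

def parse_bench_summary(text):
    """Return the integer summary fields keyed by their label."""
    return {label: int(value) for label, value in SUMMARY_FIELD.findall(text)}

# Sample text copied from the results posted in this thread.
sample = """\
Total time (ms) : 127745
Nodes searched  : 867724059
Nodes/second    : 6792626
"""

summary = parse_bench_summary(sample)
print(summary["Nodes/second"])
```

To use it on a real run, read the redirected file first, for example `parse_bench_summary(open("stockfish_M1Mac.bench").read())`.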

For reference, some older machines running Stockfish 14.1 yielded the following results:

Nodes/second | CPU / Memory | Cores/Threads | Extension | Member | OS
12.556.848 | Apple M1 Max @3.23Ghz ddr5 6400 | 10cores | pop-neon | Simon Demel | 13
12.313.063 | Apple M1 Max @3.23Ghz ddr5 6400 | 10cores | pop-neon | Simon Demel | 12.4
12.177.052 | Apple M1 Max @3.2Ghz ddr5 6400 | 10cores | pop-neon | oz | 12.4

See https://ipmanchess.yolasite.com/amd--intel-chess-bench-stockfish.php

adamkarvonen commented 1 year ago

I ran this command:

% stockfish bench 1024 10 26 default depth nnue 1>/dev/null 2>stockfish_M1Mac.bench

Got these results:

Total time (ms) : 127745
Nodes searched : 867724059
Nodes/second : 6792626

I also ran your original command:

% stockfish bench 1024 16 26 default depth nnue 1>/dev/null 2>stockfish_M1Mac.bench

And got these results:

Total time (ms) : 173436
Nodes searched : 1150961645
Nodes/second : 6636232
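As a quick sanity check on the two runs, the sketch below recomputes nodes/second from the reported totals and compares the 16-thread run against the 10-thread one. All numbers are copied from the results in this comment. Floor division happens to reproduce both reported figures, but that Stockfish rounds exactly this way is an assumption. The function name `nodes_per_second` is illustrative only.

```python
# Reported figures from this thread: threads -> (total ms, nodes, nodes/second).
runs = {
    10: (127745, 867724059, 6792626),
    16: (173436, 1150961645, 6636232),
}

def nodes_per_second(nodes, total_ms):
    """Nodes searched per second, using floor division (an assumption)."""
    return nodes * 1000 // total_ms

for threads, (total_ms, nodes, reported) in runs.items():
    recomputed = nodes_per_second(nodes, total_ms)
    print(f"{threads} threads: recomputed {recomputed}, reported {reported}")

# Relative throughput of the 16-thread run versus the 10-thread run.
ratio = runs[16][2] / runs[10][2]
print(f"16-thread run reaches {ratio:.1%} of the 10-thread rate")
```

On these numbers the 16-thread run comes out roughly 2% slower, which fits a machine that exposes only 10 logical CPUs.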

Here is some hardware info about my machine:

% sysctl -n hw.logicalcpu
10

% sysctl -n hw.physicalcpu
10

ch3cksout commented 1 year ago

Cool thx!
Good to know, as I had thought most recent M1 Macs had 8 cores (16 logical CPUs). This is quite decent benchmark performance, regardless. FYI, I am working on an Intel-based set of benchmark runs; I'll let you know what turns up.

adamkarvonen commented 1 year ago

I've added the stockfish_M1Mac.bench file to logs/ and mentioned it in the README. That should be sufficient to close this issue.