LeelaChessZero / lc0

The rewritten engine, originally for tensorflow. Now all other backends have been ported here.
GNU General Public License v3.0
2.47k stars 534 forks

How to compile LC0 on Windows 11 ARM, when using a MacBook Pro M1 MAX and Parallels Desktop? #1800

Open Chess321 opened 2 years ago

Chess321 commented 2 years ago

How can I compile LC0 on Windows 11 ARM when using a MacBook Pro M1 Max and Parallels Desktop?

When will we have the latest dev version for Windows ARM?

It looks like I need to build an LC0.exe that can use the MacBook's ARM cores, so that I can run LC0 inside ChessBase 17 on Windows 11 ARM.

Can someone try to compile it on Windows 11 ARM or with Apple's Terminal?

Maybe this could help you to get an idea: https://github.com/official-stockfish/Stockfish/issues/4241

gsobala commented 2 years ago

Compile a native macOS build and just link to it from the VM using ssh / putty / inbetween.exe. It's quicker.
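To illustrate the relay idea above, here is a minimal sketch of a wrapper that a GUI inside the VM could launch in place of the engine. It assumes an OpenSSH client is available in the Windows VM, that key-based login to the Mac is already set up, and that lc0 has been built natively on the macOS side. The host name, user name and engine path are hypothetical placeholders, and this is not necessarily how inbetween.exe works.

```python
import subprocess


def build_ssh_command(host, remote_engine, user=None):
    """Build the ssh command line that starts the engine on the remote host.

    host, remote_engine and user are placeholders supplied by the caller.
    """
    target = f"{user}@{host}" if user else host
    # -T disables pseudo-terminal allocation, so the UCI text stream
    # passes through without terminal processing.
    return ["ssh", "-T", target, remote_engine]


def run_relay(host, remote_engine, user=None):
    """Run ssh with inherited stdin/stdout and return its exit code.

    Because the child process inherits this process's standard streams,
    whatever the GUI writes is forwarded to the remote engine, and the
    engine's replies come back the same way.
    """
    return subprocess.call(build_ssh_command(host, remote_engine, user))


# Hypothetical usage (not executed here):
#   run_relay("mac-host.local", "/path/to/lc0", user="me")
```

A chess GUI usually expects an .exe, so a script like this would still have to be packaged or replaced by an equivalent small native program.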

borg323 commented 2 years ago

As far as I know, nobody has tested lc0 on Windows ARM. It is likely there are some assumptions that Windows builds are on x64 (or x86), so code changes may be necessary. If you are still interested in trying, please ask in the #help channel of our Discord chat http://lc0.org/chat - I'm certainly interested in getting this done.

Chess321 commented 1 year ago

@borg323 I tried your lc0.exe, which I saw on Discord. Can you please post it here too for other people?

It's extremely slow:

v0.30.0-dev+git.dirty built Nov 29 2022
Detected 8 core(s) and 8 thread(s) in 1 group(s).
Group 0 has 8 core(s) and 8 thread(s).
go nodes 100
Found pb network file: \Mac\Home\Desktop\LC0/d0ed346c32fbcc9eb2f0bc7e957d188c8ae428ee3ef7291fd5aa045fc6ef4ded
Creating backend [eigen]...
Using Eigen version 3.3.7
Eigen max batch size is 256.
info depth 1 seldepth 2 time 40669 nodes 3 score cp 13 nps 0 tbhits 0 pv d2d4 g8f6
info depth 1 seldepth 2 time 42339 nodes 4 score cp 14 nps 0 tbhits 0 pv e2e4 c7c6
info depth 1 seldepth 2 time 47458 nodes 4 score cp 14 nps 0 tbhits 0 pv e2e4 c7c6
info depth 1 seldepth 2 time 52541 nodes 4 score cp 14 nps 0 tbhits 0 pv e2e4 c7c6
info depth 1 seldepth 2 time 57562 nodes 4 score cp 14 nps 0 tbhits 0 pv e2e4 c7c6
info depth 2 seldepth 3 time 57894 nodes 7 score cp 14 nps 0 tbhits 0 pv e2e4 c7c6 g1f3
info depth 2 seldepth 3 time 62949 nodes 8 score cp 14 nps 0 tbhits 0 pv e2e4 c7c6 g1f3
info depth 2 seldepth 3 time 68015 nodes 8 score cp 14 nps 0 tbhits 0 pv e2e4 c7c6 g1f3
info depth 2 seldepth 3 time 73049 nodes 8 score cp 14 nps 0 tbhits 0 pv e2e4 c7c6 g1f3
info depth 2 seldepth 4 time 76831 nodes 11 score cp 14 nps 0 tbhits 0 pv e2e4 c7c6 g1f3 d7d5
info depth 2 seldepth 4 time 81897 nodes 17 score cp 14 nps 0 tbhits 0 pv e2e4 c7c6 g1f3 d7d5
info depth 2 seldepth 4 time 86908 nodes 17 score cp 14 nps 0 tbhits 0 pv e2e4 c7c6 g1f3 d7d5
info depth 2 seldepth 4 time 91924 nodes 17 score cp 14 nps 0 tbhits 0 pv e2e4 c7c6 g1f3 d7d5
info depth 2 seldepth 4 time 96949 nodes 17 score cp 14 nps 0 tbhits 0 pv e2e4 c7c6 g1f3 d7d5
info depth 2 seldepth 4 time 102065 nodes 17 score cp 14 nps 0 tbhits 0 pv e2e4 c7c6 g1f3 d7d5
info depth 2 seldepth 4 time 107213 nodes 17 score cp 14 nps 0 tbhits 0 pv e2e4 c7c6 g1f3 d7d5
info depth 3 seldepth 5 time 107463 nodes 22 score cp 14 nps 0 tbhits 0 pv e2e4 c7c6 g1f3 d7d5 d2d3
info depth 3 seldepth 5 time 112530 nodes 22 score cp 14 nps 0 tbhits 0 pv e2e4 c7c6 g1f3 d7d5 d2d3
info depth 3 seldepth 5 time 117654 nodes 26 score cp 14 nps 0 tbhits 0 pv e2e4 c7c6 g1f3 d7d5 d2d3
info depth 3 seldepth 5 time 122657 nodes 31 score cp 14 nps 0 tbhits 0 pv e2e4 c7c6 g1f3 d7d5 d2d3
info depth 3 seldepth 5 time 127658 nodes 31 score cp 14 nps 0 tbhits 0 pv e2e4 c7c6 g1f3 d7d5 d2d3
info depth 3 seldepth 5 time 132671 nodes 39 score cp 15 nps 0 tbhits 0 pv e2e4 c7c6 g1f3 d7d5 d2d3
info depth 3 seldepth 5 time 137704 nodes 39 score cp 15 nps 0 tbhits 0 pv e2e4 c7c6 g1f3 d7d5 d2d3
info depth 3 seldepth 5 time 142766 nodes 39 score cp 15 nps 0 tbhits 0 pv e2e4 c7c6 g1f3 d7d5 d2d3
info depth 3 seldepth 5 time 147782 nodes 39 score cp 15 nps 0 tbhits 0 pv e2e4 c7c6 g1f3 d7d5 d2d3
info depth 3 seldepth 6 time 152077 nodes 47 score cp 15 nps 0 tbhits 0 pv e2e4 c7c6 g1f3 d7d5 d2d3
info depth 3 seldepth 6 time 157171 nodes 56 score cp 15 nps 0 tbhits 0 pv e2e4 c7c6 g1f3 d7d5 d2d3
info depth 3 seldepth 6 time 162267 nodes 64 score cp 14 nps 0 tbhits 0 pv e2e4 c7c6 g1f3 d7d5 d2d3
info depth 3 seldepth 6 time 167281 nodes 64 score cp 14 nps 0 tbhits 0 pv e2e4 c7c6 g1f3 d7d5 d2d3
info depth 3 seldepth 6 time 172358 nodes 64 score cp 14 nps 0 tbhits 0 pv e2e4 c7c6 g1f3 d7d5 d2d3
info depth 3 seldepth 6 time 177373 nodes 64 score cp 14 nps 0 tbhits 0 pv e2e4 c7c6 g1f3 d7d5 d2d3
info depth 3 seldepth 6 time 182393 nodes 64 score cp 14 nps 0 tbhits 0 pv e2e4 c7c6 g1f3 d7d5 d2d3
info depth 3 seldepth 6 time 187502 nodes 64 score cp 14 nps 0 tbhits 0 pv e2e4 c7c6 g1f3 d7d5 d2d3
info depth 3 seldepth 6 time 192577 nodes 64 score cp 14 nps 0 tbhits 0 pv e2e4 c7c6 g1f3 d7d5 d2d3
info depth 3 seldepth 7 time 195249 nodes 75 score cp 15 nps 0 tbhits 0 pv e2e4 c7c6 g1f3 d7d5 d2d3
info depth 3 seldepth 7 time 200270 nodes 75 score cp 15 nps 0 tbhits 0 pv e2e4 c7c6 g1f3 d7d5 d2d3
info depth 4 seldepth 7 time 204965 nodes 100 score cp 15 nps 0 tbhits 0 pv e2e4 c7c6 g1f3 d7d5 d2d3 e7e6
bestmove e2e4 ponder c7c6
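The log reports `nps 0` on every line, presumably because the reported value is a whole number, so the actual rate has to be worked out from the `time` (milliseconds) and `nodes` fields. A minimal sketch of that arithmetic, using the last `info` line from the output above (the function name and regular expression are illustrative, not part of lc0):

```python
import re

# Matches the depth, time (ms) and nodes fields of a UCI "info" line.
INFO_RE = re.compile(r"\binfo depth (\d+) .*?\btime (\d+) nodes (\d+)\b")


def effective_nps(line):
    """Return nodes per second computed from a UCI info line, or None."""
    match = INFO_RE.search(line)
    if match is None:
        return None
    _depth, time_ms, nodes = (int(g) for g in match.groups())
    if time_ms == 0:
        return None
    return nodes * 1000.0 / time_ms


last_info = (
    "info depth 4 seldepth 7 time 204965 nodes 100 score cp 15 "
    "nps 0 tbhits 0 pv e2e4 c7c6 g1f3 d7d5 d2d3 e7e6"
)
rate = effective_nps(last_info)
print(f"{rate:.2f} nodes/s")  # prints "0.49 nodes/s"
```

In other words, the 100-node search took about 205 seconds, roughly one node every two seconds.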

Is eigen using only the CPU? Is it possible to use only 1-2 CPU cores and the 32 GPU cores, like the native macOS version does in BanksiaGUI?

How do I run the benchmark?

It runs fine with a net (782344) in ChessBase 17, but it's extremely slow. It doesn't matter whether I select 1 or up to 8 CPU cores; the speed is the same. It also doesn't matter whether the Buddy engine is on or off.

When I open a new board, in most cases it takes 30 to 40 seconds before the first depth and evaluation are available. Sometimes I get depth 3 after those 30 seconds, and sometimes I get depth 2 after 60 seconds.

But note that when I run the Buddy engine as well, Buddy very quickly reaches a depth between 9 and 29, depending on what it is searching and showing.

borg323 commented 1 year ago

There is no point in posting the binary; it will always be very slow, as it only uses the CPU. You can make it a bit faster with the correct settings, but this was only meant as a proof of concept, and now we know it works. It may be possible to use the GPU with OpenCL, but I can't take it further than this without access to hardware. Moreover, the OpenCL backend does not support the latest nets, so the solution to your issue is really the one outlined in https://github.com/LeelaChessZero/lc0/issues/1800#issuecomment-1330028992 and detailed on Discord.