LeelaChessZero / lc0

The rewritten engine, originally for tensorflow. Now all other backends have been ported here.
GNU General Public License v3.0

Lc0 gets slower when weight file gets bigger #969

Closed. scaljeri closed this issue 5 years ago.

scaljeri commented 5 years ago

I have two weight files, one 6 MB in size and the other 45 MB (the files are listed here). The problem is that computing a move takes too much time, especially with the larger weight file. Below, I start lc0 with the larger weight file:

$> ./build/release/lc0 --weights=/weights/weight-4305
       _
|   _ | |
|_ |_ |_| v0.23.0-dev+git.9bac230 built Oct 17 2019

and then compute a move:

position fen rnb1kb1r/pp1p1ppp/4pn2/q7/3NP3/2N5/PPP2PPP/R1BQKB1R w KQkq - 3 6
go wtime 59000 btime 59000

I get:

Loading weights file from: /weights/weight-4305
Creating backend [blas]...
Using Eigen version 3.3.5
BLAS max batch size is 256.
info depth 1 seldepth 1 time 193 nodes 1 score cp 44 hashfull 0 nps 5 tbhits 0 pv d4b3
bestmove d4b3

This takes approximately 3 seconds. When I run the same computation again, it seems to take even longer:

position fen rnb1kb1r/pp1p1ppp/4pn2/q7/3NP3/2N5/PPP2PPP/R1BQKB1R w KQkq - 3 6
go wtime 59000 btime 59000

info depth 2 seldepth 2 time 4957 nodes 3 score cp 44 hashfull 0 nps 0 tbhits 0 pv c1d2 a7a6
bestmove c1d2 ponder a7a6

If I use the smaller weight file, it takes about 1 second.
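The UCI `info` lines above already contain the timing data. The reported `nps` values are consistent with nodes per second rounded down: 1 node in 193 ms gives `nps 5`, and 3 nodes in 4957 ms gives `nps 0`. The first run's `time 193` also suggests that most of its roughly 3 seconds was presumably spent loading the weights and creating the backend, which the log shows happening during that first `go`. The Python sketch below is a hypothetical helper (not part of lc0) that extracts those fields from a single `info` line; it only handles the fields that appear in this log and expects the `info` line on its own, without a trailing `bestmove`.

```python
def parse_info(line):
    """Parse a UCI 'info' line into a dict (sketch, not a full UCI parser).

    Handles only the fields seen in this log: numeric fields, 'score cp N'
    and a trailing 'pv ...'. Other UCI fields (e.g. 'string') are skipped.
    """
    tokens = line.split()
    if not tokens or tokens[0] != "info":
        raise ValueError("not a UCI info line")
    numeric = {"depth", "seldepth", "time", "nodes", "hashfull", "nps", "tbhits"}
    result = {}
    i = 1
    while i < len(tokens):
        tok = tokens[i]
        if tok in numeric and i + 1 < len(tokens):
            result[tok] = int(tokens[i + 1])
            i += 2
        elif tok == "score" and i + 2 < len(tokens):
            # "score cp 44" is stored as result["score_cp"] = 44
            result["score_" + tokens[i + 1]] = int(tokens[i + 2])
            i += 3
        elif tok == "pv":
            # The principal variation runs to the end of the line.
            result["pv"] = tokens[i + 1:]
            break
        else:
            i += 1
    return result


second = parse_info("info depth 2 seldepth 2 time 4957 nodes 3 score cp 44 "
                    "hashfull 0 nps 0 tbhits 0 pv c1d2 a7a6")
ms_per_node = second["time"] / second["nodes"]  # roughly 1652 ms per node
```

Dividing `time` by `nodes` gives a per-node cost that is easier to compare between the two weight files than the truncated `nps` value.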

I would like to use lc0 to play 1-minute games, so these delays are far too long. By comparison, Stockfish, which I also use, doesn't have these issues.

With computation times this slow, 1-minute games with lc0 are not possible. Is there a way to speed things up?
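A back-of-the-envelope estimate shows why a 1-minute game is tight on this setup. The sketch below assumes, hypothetically, that the remaining clock is split evenly over 40 moves, and it uses the roughly 1.65 s per node implied by the second run (4957 ms for 3 nodes). lc0's real time manager does not allocate time this way, so the sketch only illustrates the order of magnitude.

```python
def nodes_per_move(remaining_ms, moves_to_go, ms_per_node):
    """Estimate how many search nodes fit into one move's time budget.

    Assumes the remaining clock is split evenly across `moves_to_go` moves;
    this is a simplification, not how lc0 actually allocates time.
    """
    budget_ms = remaining_ms / moves_to_go
    return budget_ms / ms_per_node


# 59000 ms on the clock (from the "go" command above), 40 moves assumed,
# 4957 ms for 3 nodes as measured in the second run.
estimate = nodes_per_move(59000, 40, 4957 / 3)
print(f"about {estimate:.2f} nodes per move")  # prints "about 0.89 nodes per move"
```

Even with the first run's 193 ms per node, the same 1475 ms budget would allow only about 7 to 8 nodes per move.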

If you want to see it for yourself, I've published lc0 on Docker Hub, so you can try it out:

Using the smaller weight file:

$> docker run --rm -it jeanluca/leela-chess-zero:1.0.2

To run the image with the bigger weight file:

$> docker run --rm -it jeanluca/leela-chess-zero:latest

After entering one of the above commands, you can simply paste the positions into the terminal and see the result.
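Pasting commands by hand makes timing imprecise. A script can instead drive the engine over UCI. The sketch below is a hypothetical helper, `time_one_move`, that uses only the Python standard library. It sends `uci` and waits for `uciok`, then sends `isready` and waits for `readyok` before starting the clock, and finally times `go` until `bestmove` arrives. The intent is to separate start-up cost from search time. Whether lc0 finishes loading its weights before it replies `readyok` is an assumption worth checking against its output.

```python
import subprocess
import time


def time_one_move(engine_cmd, fen, go_args="wtime 59000 btime 59000"):
    """Time one 'go' on a UCI engine, excluding the uci/isready handshake.

    Returns (seconds_from_go_to_bestmove, bestmove_line).
    """
    proc = subprocess.Popen(
        engine_cmd,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
        bufsize=1,  # line-buffered so each command reaches the engine promptly
    )

    def send(cmd):
        proc.stdin.write(cmd + "\n")
        proc.stdin.flush()

    def wait_for(prefix):
        # Skip 'info', 'id', 'option', etc. until the expected reply arrives.
        for line in proc.stdout:
            if line.startswith(prefix):
                return line.strip()
        raise RuntimeError(f"engine exited before sending {prefix!r}")

    try:
        send("uci")
        wait_for("uciok")
        send("isready")  # assumption: setup such as weight loading happens before 'readyok'
        wait_for("readyok")
        send(f"position fen {fen}")
        start = time.perf_counter()
        send(f"go {go_args}")
        best = wait_for("bestmove")
        return time.perf_counter() - start, best
    finally:
        try:
            send("quit")
            proc.stdin.close()
        except (BrokenPipeError, OSError):
            pass  # the engine has already exited
        proc.wait(timeout=10)
        proc.stdout.close()
```

With the published image, `engine_cmd` could be `["docker", "run", "--rm", "-i", "jeanluca/leela-chess-zero:latest"]`. The `-t` flag is left out because stdin is a pipe rather than a terminal.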

Also, to give an idea of how I've built lc0, here is my Dockerfile:

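FROM jeanluca/base:latest

# Install the build dependencies, clone lc0 (default branch) and build it
# with the CPU BLAS backend, using Eigen as the BLAS implementation
RUN apt update && apt install clang-6.0 ninja-build pkg-config protobuf-compiler libprotobuf-dev meson -y &&\
git clone https://github.com/LeelaChessZero/lc0.git &&\
cd lc0 && ./build.sh -Dblas=true -Deigen=true
# Network weight files used by the engine
COPY ./weights /weights
CMD ["/lc0/build/release/lc0", "--weights=/weights/weight-4305", "--threads=4" ]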

cn4750 commented 5 years ago

This is expected behavior. Weights with more blocks and filters take more time to compute. Running large weight files on a CPU is not advised, because CPUs are much slower at neural-network inference than GPUs (particularly Nvidia GPUs). If you want to run larger weight files, you need better hardware, or you need to accept longer thinking times.
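For intuition on "more blocks and filters", consider the body of an lc0-style network. It is a residual tower in which each block applies two 3×3 convolutions with F filters over the 8×8 board, so the work per position grows roughly as blocks × filters². The sketch below counts only those tower convolutions. It ignores the input layer, the policy and value heads, biases, normalization and any squeeze-excitation layers, so real figures are somewhat higher. The 10×128 and 20×256 sizes in the example are hypothetical, because the issue does not state the architectures of the two files.

```python
def tower_cost(blocks, filters):
    """Approximate (weights, multiply-accumulates per position) of a residual tower.

    Counts only the two 3x3 convolutions per block on an 8x8 board.
    """
    weights_per_conv = 3 * 3 * filters * filters
    weights = blocks * 2 * weights_per_conv
    macs = weights * 64  # each conv weight is applied once per board square
    return weights, macs


small_w, small_macs = tower_cost(10, 128)  # hypothetical smaller net
big_w, big_macs = tower_cost(20, 256)      # hypothetical larger net
ratio = big_macs / small_macs              # 2x blocks * 4x filters^2 = 8x
```

Doubling both blocks and filters multiplies the per-position cost by about 8, and CPU time per node grows accordingly. The 7.5× size ratio between the 45 MB and 6 MB files is in the same ballpark, although file size also depends on how the weights are encoded and compressed.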

scaljeri commented 5 years ago

Thanks for the quick reply. That's too bad for me; my system doesn't have GPU support.