glinscott / leela-chess

**MOVED TO https://github.com/LeelaChessZero/leela-chess** A chess adaptation of GCP's Leela Zero
http://lczero.org
GNU General Public License v3.0

Multi-GPU training? #208

Closed davidssmith closed 6 years ago

davidssmith commented 6 years ago

Is there a way to use multiple GPUs in one system?

Should I start a second instance of the client, or will that lead to race conditions with some temp files?

Uriopass commented 6 years ago

Nope, you can run multiple instances of the client; it won't lead to race conditions! I suggest you join the Discord or use the forum for quick questions like this.

jjoshua2 commented 6 years ago

For example, with an i7-6700 and a fast discrete card, you should run 4 clients to max everything out, with the args `--gpu 0`, `--gpu 0`, `--gpu 1`, and `--gpu 2` respectively: two on device 0 to fully utilize the discrete card, one for the iGPU, and one for Intel's OpenCL implementation. For me, a single client only uses 50% of my discrete card and doesn't even get it to full clock speed.
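The four-client layout described above can be sketched as a small launcher loop. This is a dry run only: the `echo` prints each command instead of executing it, and the binary name `./client` and the exact flag spelling are assumptions for illustration, not taken from the project docs.

```shell
# Dry run of the layout above: two clients on the discrete card (device 0),
# one on the iGPU (device 1), one on Intel's OpenCL implementation (device 2).
# `echo` only prints each command; drop it to actually launch.
# The binary name ./client and flag spelling are assumptions.
for gpu in 0 0 1 2; do
  echo ./client --gpu "$gpu" "&"
done
```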

davidssmith commented 6 years ago

So the device numbers are not the typical 0 and 1 for the two discrete cards?

I have two discrete cards, would that be --gpu 0 --gpu 0 --gpu 1 --gpu 1 --gpu 2 --gpu 3?

jjoshua2 commented 6 years ago

If you run `lczero.exe -w networks/WEIGHT_HERE`, it will print out all the OpenCL devices it found and their device numbers. Probably 0 and 1 will be the discrete ones; if you have Intel it will see more devices, and if you have AMD it won't.

davidssmith commented 6 years ago

GNU parallel appears to work for running the multiple instances, e.g.: `parallel ./client_linux -user myuser -password mypass -gpu {} ::: 0 0 1 1`
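If GNU parallel isn't installed, the same fan-out can be done with a plain shell loop using background jobs and `wait`. A hedged sketch: `client_linux`, `myuser`, and `mypass` are the placeholders from the comment above, and a stub function stands in for the real binary so the sketch runs anywhere.

```shell
# Stub standing in for the real ./client_linux binary (remove in real use).
client_linux() { echo "client started with args: $*"; }

# Background one client per GPU id (two per discrete card, as above),
# then wait until every client exits.
for gpu in 0 0 1 1; do
  client_linux -user myuser -password mypass -gpu "$gpu" &
done
wait
```

The trailing `wait` keeps the script alive until all backgrounded clients finish, which matters if the loop runs inside a wrapper script or a service unit.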

Error323 commented 6 years ago

GNU parallel is awesome.