Open lex312 opened 5 years ago
What's the load using one GPU?
There are indeed scaling problems, which my branch https://github.com/alreadydone/lz/tree/tensor-accum-0.17 addresses. I suggest you try it with the parameters --batchsize <same as before> --disable-frac-backup
(try adding the parameter --worker 3 if no improvement is observed).
Using only one GPU, the load is between 74% and 83%.
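For anyone who wants to reproduce load figures like these, here is a minimal sketch that reads per-GPU utilization. It assumes NVIDIA cards with `nvidia-smi` on the PATH, which this thread does not confirm; other vendors need a different tool. The helper names `parse_utilization` and `read_gpu_loads` are made up for illustration.

```python
import subprocess

# Ask nvidia-smi for one "index, utilization" line per GPU, without header or units.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,utilization.gpu",
    "--format=csv,noheader,nounits",
]


def parse_utilization(csv_text):
    """Parse lines such as '0, 78' into {gpu_index: utilization_percent}."""
    loads = {}
    for line in csv_text.strip().splitlines():
        index, util = (field.strip() for field in line.split(","))
        loads[int(index)] = int(util)
    return loads


def read_gpu_loads():
    """Run nvidia-smi once and return the current load per GPU."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_utilization(result.stdout)


# Usage (needs an NVIDIA driver installed):
#   loads = read_gpu_loads()
#   print(loads, "combined:", sum(loads.values()))
```

Sampling this a few times while the engine is analysing gives the same kind of numbers as quoted in this thread, first with one `--gpu` flag and then with two.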
Can you show me what the configuration needs to look like?
With two GPUs, I currently have:
{
  "leelaz": {
    "max-analyze-time-minutes": 60,
    "analyze-update-interval-centisec": 10,
    "network-file": "network.gz",
    "max-game-thinking-time-seconds": 2,
    "engine-start-location": ".",
    "engine-command": "./leela-zero/leelaz --gtp --lagbuffer 0 --weights %network-file --gpu 1 --gpu 2",
    "print-comms": false
  },
  "ui": {
    "comment-font-size": 0,
    "board-color": [217, 152, 77],
    "shadow-size": 100,
    "show-winrate": true,
    "autosave-interval-seconds": -1,
    "append-winrate-to-comment": true,
    "fancy-board": true,
    "show-captured": true,
    "weighted-blunder-bar-height": false,
    "--gpu 0 --gpu 1 --gpu 2 --gpu 3": true,
    "win-rate-always-black": false,
    "show-move-number": true,
    "winrate-stroke-width": 3,
    "show-next-moves": true,
    "show-comment": true,
    "show-leelaz-variation": true,
    "theme": "default",
    "min-playout-ratio-for-stats": 0,
    "fancy-stones": true,
    "resume-previous-game": false,
    "window-size": [3840, 2160],
    "new-move-number-in-branch": true,
    "shadows-enabled": true,
    "show-variation-graph": true,
    "show-dynamic-komi": true,
    "minimum-blunder-bar-width": 3,
    "large-winrate": false,
    "show-blunder-bar": true,
    "only-last-move-number": 1,
    "confirm-exit": false,
    "show-status": true,
    "handicap-instead-of-winrate": false,
    "large-subboard": false,
    "dynamic-winrate-graph-width": true,
    "show-subboard": true,
    "window-maximized": true,
    "show-best-moves": true,
    "board-size": 19
  }
}
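A quick way to sanity-check a config like the one above is to list which `--gpu` IDs actually reach the engine and to flag any settings key that looks like a command-line flag. The sketch below is only an illustration: `check_lizzie_config` is a hypothetical helper, the sample is a shortened copy of the config above, and the idea that a key starting with `--` is a misplaced flag (and is ignored by the UI) is an assumption.

```python
import json
import shlex


def check_lizzie_config(config_text):
    """Return (gpu_ids, stray_keys) for a config shaped like the one above."""
    config = json.loads(config_text)
    args = shlex.split(config["leelaz"]["engine-command"])
    # Collect the value that follows each --gpu flag on the engine command line.
    gpu_ids = [args[i + 1] for i, arg in enumerate(args[:-1]) if arg == "--gpu"]
    # Assumption: a settings key starting with "--" is a misplaced engine flag.
    stray_keys = [
        key
        for section in config.values()
        if isinstance(section, dict)
        for key in section
        if key.startswith("--")
    ]
    return gpu_ids, stray_keys


# Shortened copy of the config above, keeping only the relevant entries.
sample = """
{
  "leelaz": {
    "engine-command": "./leela-zero/leelaz --gtp --lagbuffer 0 --weights %network-file --gpu 1 --gpu 2"
  },
  "ui": {
    "--gpu 0 --gpu 1 --gpu 2 --gpu 3": true,
    "board-size": 19
  }
}
"""

gpu_ids, stray_keys = check_lizzie_config(sample)
print("GPU IDs passed to the engine:", gpu_ids)
print("Keys that look like misplaced flags:", stray_keys)
```

Here only the IDs in "engine-command" (1 and 2) are passed to leelaz; the `--gpu 0 --gpu 1 --gpu 2 --gpu 3` entry sits under "ui" as a setting name, so it is worth checking whether it was meant to be part of the command instead.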
Each GPU shows only 75% to 80% load. That gives a combined load of only 150% to 160% instead of 200%, so 40 to 50 percentage points (20% to 25% of the total capacity) go unused, and that's very bad :(
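The arithmetic behind those figures, as a small sketch. `scaling_summary` is a hypothetical helper, and it treats GPU load as a stand-in for performance, which is only a rough approximation.

```python
def scaling_summary(per_gpu_load_percent, num_gpus):
    """Return (combined load, idle percentage points, idle share of total in %)."""
    combined = per_gpu_load_percent * num_gpus
    ideal = 100 * num_gpus
    idle_points = ideal - combined
    idle_share = idle_points / ideal * 100
    return combined, idle_points, idle_share


for load in (75, 80):
    combined, idle_points, idle_share = scaling_summary(load, 2)
    print(f"{load}% per GPU: combined {combined}%, "
          f"idle {idle_points} points, {idle_share:.0f}% of total capacity")
```

With two GPUs at 75% to 80% each, the combined load is 150% to 160%, which leaves 40 to 50 percentage points idle out of 200, or 20% to 25% of the total.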
There is no heat problem, power-supply problem, or anything like that. I think we have compatibility problems with the newest GPUs, scaling problems, and maybe a bug that causes this.