Zeta36 / chess-alpha-zero

Chess reinforcement learning by AlphaGo Zero methods.
MIT License

Running Alpha Zero #55

Closed philipstephens closed 6 years ago

philipstephens commented 6 years ago

Whenever I run AlphaZero chess for the second time after reinstalling Python, I get this error:

File "h5py\h5f.pyx", line 78, in h5py.h5f.open
OSError: Unable to open file (File signature not found)

How do I avoid getting that error? When I run the command src\chess_zero\run.py self --distributed and want to stop execution, I type Ctrl-C. How else can I stop the command without getting the above error? Thanks.

Philip

philipstephens commented 6 years ago

Here's the error I get when I run it from Arena:

2018-02-13 17:38:02.150 Arena 3.5
2018-02-13 17:38:02.157
2018-02-13 17:38:02.373----------New game---2018-02-13 17:38:02,373 Tue -------------
2018-02-13 17:38:02.415screen: 1920x1080
2018-02-13 17:38:02.415Monitors: 1
2018-02-13 17:38:02.415Monitor0: 1920x1080
2018-02-13 17:38:02.415FormMonitor: 0
2018-02-13 17:38:02.471Loading 1
2018-02-13 17:39:36.7881--------------------------Starting engine 1 C0uci---------------------------
2018-02-13 17:39:36.7891Configured Engine 1 Type: Auto
2018-02-13 17:39:36.7891Engine 1 dir: E:\programming\pyproj\chess\chess-alpha-zero
2018-02-13 17:39:36.7891Engine 1 commandline: E:\programming\pyproj\chess\chess-alpha-zero\C0uci.bat
2018-02-13 17:39:36.9001Child Process Prio Adj: PID 10000 conhost.exe
2018-02-13 17:39:36.9011Child Process Prio Adj: PID 8604 python.exe
2018-02-13 17:39:36.9011Engine 1 ProcessID: 3620
2018-02-13 17:39:36.9011Engine 1 Prio:32 ThreadPrio:0
2018-02-13 17:39:36.901-->1:xboard
2018-02-13 17:39:36.922<--1:E:\programming\pyproj\chess\chess-alpha-zero>python src/chess_zero/run.py uci
2018-02-13 17:39:36.922-->1:uci
2018-02-13 17:39:37.136<--1:id name ChessZero
2018-02-13 17:39:37.136<--1:id author ChessZero
2018-02-13 17:39:37.136<--1:uciok
2018-02-13 17:39:37.1421Child Process Prio Adj: PID 10000 conhost.exe
2018-02-13 17:39:37.1421Child Process Prio Adj: PID 8604 python.exe
2018-02-13 17:39:37.142-->1:isready
2018-02-13 17:39:42.753<--1:Using TensorFlow backend.
2018-02-13 17:39:42.756<--1:Traceback (most recent call last):
2018-02-13 17:39:42.756<--1: File "src/chess_zero/run.py", line 20, in <module>
2018-02-13 17:39:42.756<--1: manager.start()
2018-02-13 17:39:42.756<--1: File "src\chess_zero\manager.py", line 76, in start
2018-02-13 17:39:42.756<--1: return uci.start(config)
2018-02-13 17:39:42.756<--1: File "src\chess_zero\play_game\uci.py", line 31, in start
2018-02-13 17:39:42.756<--1: me_player = get_player(config)
2018-02-13 17:39:42.756<--1: File "src\chess_zero\play_game\uci.py", line 67, in get_player
2018-02-13 17:39:42.756<--1: if not load_best_model_weight(model):
2018-02-13 17:39:42.756<--1: File "src\chess_zero\lib\model_helper.py", line 15, in load_best_model_weight
2018-02-13 17:39:42.756<--1: return model.load(model.config.resource.model_best_config_path, model.config.resource.model_best_weight_path)
2018-02-13 17:39:42.756<--1: File "src\chess_zero\agent\model_chess.py", line 146, in load
2018-02-13 17:39:42.756<--1: self.model.load_weights(weight_path)
2018-02-13 17:39:42.756<--1: File "C:\Users\User\Anaconda3\lib\site-packages\keras\engine\topology.py", line 2638, in load_weights
2018-02-13 17:39:42.756<--1: f = h5py.File(filepath, mode='r')
2018-02-13 17:39:42.756<--1: File "C:\Users\User\Anaconda3\lib\site-packages\h5py\_hl\files.py", line 271, in __init__
2018-02-13 17:39:42.756<--1: fid = make_fid(name, mode, userblock_size, fapl, swmr=swmr)
2018-02-13 17:39:42.756<--1: File "C:\Users\User\Anaconda3\lib\site-packages\h5py\_hl\files.py", line 101, in make_fid
2018-02-13 17:39:42.756<--1: fid = h5f.open(name, flags, fapl=fapl)
2018-02-13 17:39:42.756<--1: File "h5py\_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
2018-02-13 17:39:42.756<--1: File "h5py\_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
2018-02-13 17:39:42.756<--1: File "h5py\h5f.pyx", line 78, in h5py.h5f.open
2018-02-13 17:39:42.756<--1:OSError: Unable to open file (File signature not found)
2018-02-13 17:39:43.204<--1:2018-02-13 17:39:39.489314: I C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\platform\cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
2018-02-13 17:39:43.204<--1:2018-02-13 17:39:39.999243: I C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:1105] Found device 0 with properties:
2018-02-13 17:39:43.204<--1:name: GeForce GTX 1050 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.392
2018-02-13 17:39:43.204<--1:pciBusID: 0000:22:00.0
2018-02-13 17:39:43.204<--1:totalMemory: 4.00GiB freeMemory: 3.30GiB
2018-02-13 17:39:43.204<--1:2018-02-13 17:39:39.999281: I C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 1050 Ti, pci bus id: 0000:22:00.0, compute capability: 6.1)
2018-02-13 17:39:43.6552--------------------------Starting engine 2 C0uci---------------------------
2018-02-13 17:39:43.6562Configured Engine 2 Type: Auto
2018-02-13 17:39:43.6562Engine 2 dir: E:\programming\pyproj\chess\chess-alpha-zero
2018-02-13 17:39:43.6562Engine 2 commandline: E:\programming\pyproj\chess\chess-alpha-zero\C0uci.bat
2018-02-13 17:39:43.7672Child Process Prio Adj: PID 5160 conhost.exe
2018-02-13 17:39:43.7692Child Process Prio Adj: PID 4860 python.exe
2018-02-13 17:39:43.7692Engine 2 ProcessID: 1864
2018-02-13 17:39:43.7692Engine 2 Prio:32 ThreadPrio:0
2018-02-13 17:39:43.769-->2:xboard
2018-02-13 17:39:43.790<--2:E:\programming\pyproj\chess\chess-alpha-zero>python src/chess_zero/run.py uci
2018-02-13 17:39:43.790-->2:uci
2018-02-13 17:39:43.999<--2:id name ChessZero
2018-02-13 17:39:43.999<--2:id author ChessZero
2018-02-13 17:39:43.999<--2:uciok
2018-02-13 17:39:44.0042Child Process Prio Adj: PID 5160 conhost.exe
2018-02-13 17:39:44.0052Child Process Prio Adj: PID 4860 python.exe
2018-02-13 17:39:44.005-->2:isready
2018-02-13 17:39:49.611<--2:Using TensorFlow backend.
2018-02-13 17:39:49.614<--2:Traceback (most recent call last):
2018-02-13 17:39:49.614<--2: File "src/chess_zero/run.py", line 20, in <module>
2018-02-13 17:39:49.614<--2: manager.start()
2018-02-13 17:39:49.614<--2: File "src\chess_zero\manager.py", line 76, in start
2018-02-13 17:39:49.614<--2: return uci.start(config)
2018-02-13 17:39:49.614<--2: File "src\chess_zero\play_game\uci.py", line 31, in start
2018-02-13 17:39:49.614<--2: me_player = get_player(config)
2018-02-13 17:39:49.614<--2: File "src\chess_zero\play_game\uci.py", line 67, in get_player
2018-02-13 17:39:49.614<--2: if not load_best_model_weight(model):
2018-02-13 17:39:49.614<--2: File "src\chess_zero\lib\model_helper.py", line 15, in load_best_model_weight
2018-02-13 17:39:49.614<--2: return model.load(model.config.resource.model_best_config_path, model.config.resource.model_best_weight_path)
2018-02-13 17:39:49.614<--2: File "src\chess_zero\agent\model_chess.py", line 146, in load
2018-02-13 17:39:49.614<--2: self.model.load_weights(weight_path)
2018-02-13 17:39:49.614<--2: File "C:\Users\User\Anaconda3\lib\site-packages\keras\engine\topology.py", line 2638, in load_weights
2018-02-13 17:39:49.614<--2: f = h5py.File(filepath, mode='r')
2018-02-13 17:39:49.614<--2: File "C:\Users\User\Anaconda3\lib\site-packages\h5py\_hl\files.py", line 271, in __init__
2018-02-13 17:39:49.614<--2: fid = make_fid(name, mode, userblock_size, fapl, swmr=swmr)
2018-02-13 17:39:49.614<--2: File "C:\Users\User\Anaconda3\lib\site-packages\h5py\_hl\files.py", line 101, in make_fid
2018-02-13 17:39:49.614<--2: fid = h5f.open(name, flags, fapl=fapl)
2018-02-13 17:39:49.614<--2: File "h5py\_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
2018-02-13 17:39:49.615<--2: File "h5py\_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
2018-02-13 17:39:49.615<--2: File "h5py\h5f.pyx", line 78, in h5py.h5f.open
2018-02-13 17:39:49.615<--2:OSError: Unable to open file (File signature not found)
2018-02-13 17:39:50.063<--2:2018-02-13 17:39:46.346492: I C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\platform\cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
2018-02-13 17:39:50.063<--2:2018-02-13 17:39:46.853704: I C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:1105] Found device 0 with properties:
2018-02-13 17:39:50.063<--2:name: GeForce GTX 1050 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.392
2018-02-13 17:39:50.063<--2:pciBusID: 0000:22:00.0
2018-02-13 17:39:50.063<--2:totalMemory: 4.00GiB freeMemory: 3.30GiB
2018-02-13 17:39:50.063<--2:2018-02-13 17:39:46.853744: I C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 1050 Ti, pci bus id: 0000:22:00.0, compute capability: 6.1)
2018-02-13 17:39:50.5431Incorrect function
2018-02-13 17:39:50.5431Engine crashed, restarting...
2018-02-13 17:39:50.543-->1:quit
2018-02-13 17:39:50.8981--------------------------Starting engine 1 C0uci---------------------------
2018-02-13 17:39:50.8981Configured Engine 1 Type: Auto
2018-02-13 17:39:50.8981Engine 1 dir: E:\programming\pyproj\chess\chess-alpha-zero
2018-02-13 17:39:50.8981Engine 1 commandline: E:\programming\pyproj\chess\chess-alpha-zero\C0uci.bat
2018-02-13 17:39:51.0091Child Process Prio Adj: PID 11204 conhost.exe
2018-02-13 17:39:51.0101Child Process Prio Adj: PID 5280 python.exe
2018-02-13 17:39:51.0101Engine 1 ProcessID: 10492
2018-02-13 17:39:51.0101Engine 1 Prio:32 ThreadPrio:0
2018-02-13 17:39:51.010-->1:xboard
2018-02-13 17:39:51.031<--1:E:\programming\pyproj\chess\chess-alpha-zero>python src/chess_zero/run.py uci
2018-02-13 17:39:51.031-->1:uci
2018-02-13 17:39:51.239<--1:id name ChessZero
2018-02-13 17:39:51.239<--1:id author ChessZero
2018-02-13 17:39:51.239<--1:uciok
2018-02-13 17:39:51.2431Child Process Prio Adj: PID 11204 conhost.exe
2018-02-13 17:39:51.2441Child Process Prio Adj: PID 5280 python.exe
2018-02-13 17:39:51.244-->1:isready
2018-02-13 17:39:56.839<--1:Using TensorFlow backend.
2018-02-13 17:39:56.841<--1:Traceback (most recent call last):
2018-02-13 17:39:56.841<--1: File "src/chess_zero/run.py", line 20, in <module>
2018-02-13 17:39:56.841<--1: manager.start()
2018-02-13 17:39:56.841<--1: File "src\chess_zero\manager.py", line 76, in start
2018-02-13 17:39:56.841<--1: return uci.start(config)
2018-02-13 17:39:56.841<--1: File "src\chess_zero\play_game\uci.py", line 31, in start
2018-02-13 17:39:56.841<--1: me_player = get_player(config)
2018-02-13 17:39:56.842<--1: File "src\chess_zero\play_game\uci.py", line 67, in get_player
2018-02-13 17:39:56.842<--1: if not load_best_model_weight(model):
2018-02-13 17:39:56.842<--1: File "src\chess_zero\lib\model_helper.py", line 15, in load_best_model_weight
2018-02-13 17:39:56.842<--1: return model.load(model.config.resource.model_best_config_path, model.config.resource.model_best_weight_path)
2018-02-13 17:39:56.842<--1: File "src\chess_zero\agent\model_chess.py", line 146, in load
2018-02-13 17:39:56.842<--1: self.model.load_weights(weight_path)
2018-02-13 17:39:56.842<--1: File "C:\Users\User\Anaconda3\lib\site-packages\keras\engine\topology.py", line 2638, in load_weights
2018-02-13 17:39:56.842<--1: f = h5py.File(filepath, mode='r')
2018-02-13 17:39:56.842<--1: File "C:\Users\User\Anaconda3\lib\site-packages\h5py\_hl\files.py", line 271, in __init__
2018-02-13 17:39:56.842<--1: fid = make_fid(name, mode, userblock_size, fapl, swmr=swmr)
2018-02-13 17:39:56.842<--1: File "C:\Users\User\Anaconda3\lib\site-packages\h5py\_hl\files.py", line 101, in make_fid
2018-02-13 17:39:56.842<--1: fid = h5f.open(name, flags, fapl=fapl)
2018-02-13 17:39:56.842<--1: File "h5py\_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
2018-02-13 17:39:56.842<--1: File "h5py\_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
2018-02-13 17:39:56.842<--1: File "h5py\h5f.pyx", line 78, in h5py.h5f.open
2018-02-13 17:39:56.842<--1:OSError: Unable to open file (File signature not found)
2018-02-13 17:39:57.291<--1:2018-02-13 17:39:53.569328: I C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\platform\cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
2018-02-13 17:39:57.291<--1:2018-02-13 17:39:54.090595: I C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:1105] Found device 0 with properties:
2018-02-13 17:39:57.291<--1:name: GeForce GTX 1050 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.392
2018-02-13 17:39:57.291<--1:pciBusID: 0000:22:00.0
2018-02-13 17:39:57.291<--1:totalMemory: 4.00GiB freeMemory: 3.30GiB
2018-02-13 17:39:57.291<--1:2018-02-13 17:39:54.090630: I C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 1050 Ti, pci bus id: 0000:22:00.0, compute capability: 6.1)
2018-02-13 17:39:58.5881Start calc, move no: 0
2018-02-13 17:39:58.588-->1:ucinewgame
2018-02-13 17:39:58.588-->1:isready
2018-02-13 17:39:58.698-->1:position startpos
2018-02-13 17:39:58.698-->1:go wtime 300000 btime 300000 winc 0 binc 0

Philip

brianprichardson commented 6 years ago

It looks like it is trying to load the weights file. What happens when you just run C0uci.bat from a terminal or cmd (no Arena, no --distributed)?

philipstephens commented 6 years ago

When I run C0uci.bat, it just sits there and does nothing until I type Ctrl-C. It looks as if it is waiting for keyboard input, but I don't know what parameters it is looking for.

Traceback (most recent call last):
  File "src/chess_zero/run.py", line 20, in <module>
    manager.start()
  File "src\chess_zero\manager.py", line 76, in start
    return uci.start(config)
  File "src\chess_zero\play_game\uci.py", line 23, in start
    line = input()
KeyboardInterrupt
Terminate batch job (Y/N)? N

E:\programming\pyproj\chess\chess-alpha-zero>

brianprichardson commented 6 years ago

You may not be familiar with the chess UCI protocol and commands. First, you should type 'uci' and the engine will respond with its name and author. Then, 'isready' and wait for the 'readyok' response. Next, 'go depth 1' and wait. A move like d2d4 should be output.
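The same uci / isready / go depth 1 exchange can also be scripted rather than typed. Below is a minimal sketch, not part of this project: `ask_engine` and `FAKE_ENGINE` are hypothetical names, and the fake engine is only a stand-in so the example runs without the real weights. It assumes the engine prints its replies on stdout, one per line.

```python
import subprocess
import sys

# Stand-in engine (an assumption for illustration only): it answers the
# three commands described above so the sketch can run without the real engine.
FAKE_ENGINE = r'''
import sys
for line in sys.stdin:
    cmd = line.strip()
    if cmd == "uci":
        print("id name FakeEngine")
        print("uciok", flush=True)
    elif cmd == "isready":
        print("readyok", flush=True)
    elif cmd.startswith("go"):
        print("bestmove e2e4", flush=True)
    elif cmd == "quit":
        break
'''


def ask_engine(command, depth=1):
    """Start a UCI engine, do the uci/isready/go exchange, return the best move."""
    with subprocess.Popen(
        command,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
        bufsize=1,  # line-buffered, so each command is sent immediately
    ) as proc:

        def send(line):
            proc.stdin.write(line + "\n")
            proc.stdin.flush()

        def wait_for(prefix):
            # Skip any other output until a line starts with the expected token.
            for line in proc.stdout:
                if line.strip().startswith(prefix):
                    return line.strip()
            raise RuntimeError(f"engine exited before sending {prefix!r}")

        send("uci")
        wait_for("uciok")
        send("isready")
        wait_for("readyok")  # the real engine loads its weights here
        send(f"go depth {depth}")
        reply = wait_for("bestmove")  # e.g. "bestmove e2e4"
        send("quit")
        return reply.split()[1]
```

To try it against the real engine, the command would presumably be `[sys.executable, "src/chess_zero/run.py", "uci"]`, run from the repository root. If the weights file cannot be loaded, the engine exits during isready and the sketch raises RuntimeError instead of hanging.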

philipstephens commented 6 years ago

E:\programming\pyproj\chess\chess-alpha-zero>python src/chess_zero/run.py uci
isready
Using TensorFlow backend.
2018-02-14 14:29:39.396510: I C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\platform\cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
2018-02-14 14:29:39.867269: I C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:1105] Found device 0 with properties:
name: GeForce GTX 1050 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.392
pciBusID: 0000:22:00.0
totalMemory: 4.00GiB freeMemory: 3.30GiB
2018-02-14 14:29:39.873405: I C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 1050 Ti, pci bus id: 0000:22:00.0, compute capability: 6.1)
readyok
go depth 1
bestmove e2e4
engine.name
go depth 1
bestmove e2e4
bestmove e7e5
go
bestmove e2e4
quit

philipstephens commented 6 years ago

Oops, it was working again, but then I ran python src/chess_zero/run.py self --type distributed, which failed with errors, and now I get:

E:\programming\pyproj\chess\chess-alpha-zero>python src\chess_zero\run.py uci
isready
Using TensorFlow backend.
2018-02-14 15:16:33.145562: I C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\platform\cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
2018-02-14 15:16:33.623236: I C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:1105] Found device 0 with properties:
name: GeForce GTX 1050 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.392
pciBusID: 0000:22:00.0
totalMemory: 4.00GiB freeMemory: 3.30GiB
2018-02-14 15:16:33.631631: I C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 1050 Ti, pci bus id: 0000:22:00.0, compute capability: 6.1)
Traceback (most recent call last):
  File "src\chess_zero\run.py", line 20, in <module>
    manager.start()
  File "src\chess_zero\manager.py", line 76, in start
    return uci.start(config)
  File "src\chess_zero\play_game\uci.py", line 31, in start
    me_player = get_player(config)
  File "src\chess_zero\play_game\uci.py", line 67, in get_player
    if not load_best_model_weight(model):
  File "src\chess_zero\lib\model_helper.py", line 15, in load_best_model_weight
    return model.load(model.config.resource.model_best_config_path, model.config.resource.model_best_weight_path)
  File "src\chess_zero\agent\model_chess.py", line 146, in load
    self.model.load_weights(weight_path)
  File "C:\Users\User\Anaconda3\lib\site-packages\keras\engine\topology.py", line 2638, in load_weights
    f = h5py.File(filepath, mode='r')
  File "C:\Users\User\Anaconda3\lib\site-packages\h5py\_hl\files.py", line 271, in __init__
    fid = make_fid(name, mode, userblock_size, fapl, swmr=swmr)
  File "C:\Users\User\Anaconda3\lib\site-packages\h5py\_hl\files.py", line 101, in make_fid
    fid = h5f.open(name, flags, fapl=fapl)
  File "h5py\_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py\_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py\h5f.pyx", line 78, in h5py.h5f.open
OSError: Unable to open file (File signature not found)

E:\programming\pyproj\chess\chess-alpha-zero>

philipstephens commented 6 years ago

It's working with self play now (without the distributed option). I am waiting for it to finish playing a series of games before I try the distributed version. I copied the model_best_weight.h5 and the model_best and the model_best_config.json from a copy of chess zero and then ran src\chess_zero\run.py self.
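For anyone repeating this, the copy step can be scripted. This is only a rough sketch under assumptions: `restore_model_files` is a hypothetical helper, both directory arguments are whatever paths you use locally, and only the two file names that show up in the model directory in this thread are copied.

```python
import shutil
from pathlib import Path

# File names taken from this thread; adjust if your model directory differs.
MODEL_FILES = ("model_best_config.json", "model_best_weight.h5")


def restore_model_files(backup_dir, model_dir, names=MODEL_FILES):
    """Copy saved model files from backup_dir over the ones in model_dir."""
    backup_dir, model_dir = Path(backup_dir), Path(model_dir)

    # Refuse to start if the backup is incomplete, so a good file is never
    # paired with a stale or missing one.
    missing = [name for name in names if not (backup_dir / name).is_file()]
    if missing:
        raise FileNotFoundError(f"backup is missing: {', '.join(missing)}")

    model_dir.mkdir(parents=True, exist_ok=True)
    restored = []
    for name in names:
        # copy2 overwrites the target and keeps the backup's timestamps.
        shutil.copy2(backup_dir / name, model_dir / name)
        restored.append(model_dir / name)
    return restored
```

Keeping the backup outside the project tree means a failed run cannot overwrite it.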

philipstephens commented 6 years ago

I've been able to reproduce the problem. Whenever I run

E:\programming\pyproj\chess\chess-alpha-zero>python src\chess_zero\run.py self --type distributed

I get

2018-02-19 09:55:53,511@chess_zero.manager INFO # config type: distributed
...
File "h5py\h5f.pyx", line 78, in h5py.h5f.open
OSError: Unable to open file (File signature not found)

and I then get the same error when I run E:\programming\pyproj\chess\chess-alpha-zero>python src\chess_zero\run.py self

When I look at the model directory I get:

2018-02-19 09:55 AM ..
2018-02-19 09:55 AM 29,719 model_best_config.json
2018-02-19 09:55 AM 0 model_best_weight.h5

but when I replace model_best_config.json and model_best_weight.h5 with a previously saved copy,

E:\programming\pyproj\chess\chess-alpha-zero>python src\chess_zero\run.py self

works again, but

E:\programming\pyproj\chess\chess-alpha-zero>python src\chess_zero\run.py self --type distributed

does not and never seems to work. Do you need any other info to debug this problem?

Thanks, Philip
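A 0-byte model_best_weight.h5 like the one in the listing would explain the "File signature not found" message: an HDF5 file starts with a fixed 8-byte signature, and an empty file has none. A small sketch for checking a weights file before starting the engine follows. `looks_like_hdf5` is a hypothetical helper, and it assumes the signature sits at offset 0 (HDF5 also allows a user block in front of it, which this check would reject).

```python
from pathlib import Path

# The 8-byte signature that opens an HDF5 file without a user block.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(path):
    """Return True if the file at path begins with the HDF5 signature."""
    path = Path(path)
    # A missing or truncated file (such as a 0-byte one) cannot be valid.
    if not path.is_file() or path.stat().st_size < len(HDF5_SIGNATURE):
        return False
    with path.open("rb") as f:
        return f.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE
```

If the check fails for model_best_weight.h5, putting back a saved copy before the next run avoids the traceback.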

brianprichardson commented 6 years ago

This might not be an active project, so the distributed option may not apply. I suggest just running without it.

philipstephens commented 6 years ago

Thanks.