Open fredzfm opened 6 years ago
It has been a while, but for a "self" play run try without any weight file and it should create one to start with. The best weights are for "uci" play.
Thanks brianprichardson.
You mean it never runs with the GPU? I have tried "self" without any JSON model or weight file, and still got this error:
tensorflow.python.framework.errors_impl.InvalidArgumentError: Shape must be rank 1 but is rank 0 for 'input_batchnorm/cond/Reshape_4' (op: 'Reshape') with input shapes: [1,256,1,1], [].
Depending on the situation, the weights (.h5) and model (.json) files must match the net architecture in the config file (typically mini.py). The stronger ones that I uploaded do not match the current config files.
IIRC, when running "self", if there are no .h5 and .json files they will be created first. You can add self.model.summary() at the end of def build(self) in class ChessModel (model_chess.py in the agent dir) to see if it is creating a new model from the specs in the mini.py file.
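As a rough way to check which architecture a saved .json file describes without loading TensorFlow at all, something like the sketch below may help. It assumes the file holds a Keras-style config dict with a top-level "layers" list (the traceback in this thread shows it being read with Model.from_config(json.load(f))). The function names are made up for illustration, and which filter count your mini.py expects is something you would have to look up yourself.

```python
import json
from collections import Counter


def summarize_model_config(config):
    """Return (layer-class counts, sorted Conv2D filter widths) for a Keras-style config dict."""
    layers = config.get("layers", [])
    # Assumption: some Keras versions nest the layer list one level down under "config".
    if not layers and isinstance(config.get("config"), dict):
        layers = config["config"].get("layers", [])
    class_counts = Counter(layer.get("class_name", "?") for layer in layers)
    conv_filters = sorted({
        layer["config"]["filters"]
        for layer in layers
        if layer.get("class_name") == "Conv2D" and "filters" in layer.get("config", {})
    })
    return class_counts, conv_filters


def summarize_model_file(path):
    """Load a model .json file from disk and summarize it."""
    with open(path) as f:
        return summarize_model_config(json.load(f))


# Tiny inline example so the sketch runs without any model file on disk.
sample = {
    "layers": [
        {"class_name": "Conv2D", "config": {"filters": 256}},
        {"class_name": "BatchNormalization", "config": {}},
        {"class_name": "Conv2D", "config": {"filters": 256}},
    ]
}
counts, filters = summarize_model_config(sample)
print(dict(counts), filters)
```

Running summarize_model_file on data/model/model_best_config.json and comparing the reported filter widths and layer counts with the values in the config file should show quickly whether the two disagree.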
For running "uci" it just tries to read the best files. Other params in the config file can still be set, but most are ignored for uci, like playouts is 1,200 (sort of like fixed number of nodes).
The first output you posted shows it is trying to run with the GPU. As slow as it is, it will be far too slow to run without a GPU, and your 1080 Ti is a very good one.
I would try a clean download and just try to run with "uci" and enter the "uci" and "isready" (remember to wait for the readyok), and then "go". You should get a bestmove after some time. If that works, then your packages and gpu are all working ok and we can work from there.
What are you trying to do, in general? Self-play training is extremely slow and takes a lot of disk space for the intermediate input plane files. That's why I have a tweaked version that takes pgn input and trains directly from that.
This issue might be related to #75 and #76.
> I would try a clean download and just try to run with "uci" and enter the "uci" and "isready" (remember to wait for the readyok), and then "go". You should get a bestmove after some time. If that works, then your packages and gpu are all working ok and we can work from there.
What is the command to run this? python src/chess_zero/run.py uci --isready does not work.
First only do: python src/chess_zero/run.py uci
Then, after it loads, enter the following one at a time, waiting for each reply before sending the next:
uci [wait for uciok]
isready [wait for readyok]
go [should see some bestmove output, but it may take some time with the CPU and GPU busy]
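If typing the commands by hand gets tedious, the same handshake can be scripted. This is only a sketch: wait_for, uci_smoke_test and run_engine are names invented here, the launch command is the one quoted above, and TensorFlow's log output is assumed to go to stderr so it does not interfere with reading replies from stdout.

```python
import subprocess
import sys


def wait_for(read_line, prefix):
    """Read lines until one starts with `prefix`; return that line, stripped."""
    while True:
        line = read_line()
        if line == "":  # readline() returns "" at EOF: the engine exited early
            raise RuntimeError("engine closed its output before sending %r" % prefix)
        line = line.strip()
        if line.startswith(prefix):
            return line


def uci_smoke_test(read_line, write_line):
    """Send uci / isready / go, waiting for each reply; return the bestmove line."""
    write_line("uci")
    wait_for(read_line, "uciok")
    write_line("isready")
    wait_for(read_line, "readyok")
    write_line("go")
    return wait_for(read_line, "bestmove")


def run_engine(cmd):
    """Start the engine as a subprocess and run the smoke test against it."""
    proc = subprocess.Popen(
        cmd,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        universal_newlines=True,  # text mode, also works on Python 3.6
        bufsize=1,                # line-buffered
    )

    def write_line(text):
        proc.stdin.write(text + "\n")
        proc.stdin.flush()

    try:
        return uci_smoke_test(proc.stdout.readline, write_line)
    finally:
        if proc.poll() is None:  # still running: stop it
            proc.terminate()
        proc.wait()


# Example call (not executed here):
# print(run_engine([sys.executable, "src/chess_zero/run.py", "uci"]))
```

If the model fails to load, as in the traceback posted in this thread, the process exits before readyok arrives and the script raises RuntimeError instead of hanging.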
I get the following error logs when I issue isready
Using TensorFlow backend.
2018-11-11 18:52:28.546655: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-11-11 18:52:28.667415: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-11-11 18:52:28.667979: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1411] Found device 0 with properties:
name: GeForce GTX 1070 with Max-Q Design major: 6 minor: 1 memoryClockRate(GHz): 1.2655
pciBusID: 0000:01:00.0
totalMemory: 7.93GiB freeMemory: 7.51GiB
2018-11-11 18:52:28.667993: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1490] Adding visible gpu devices: 0
2018-11-11 18:52:29.920908: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-11-11 18:52:29.920943: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] 0
2018-11-11 18:52:29.920952: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0: N
2018-11-11 18:52:29.921135: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1103] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7243 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1070 with Max-Q Design, pci bus id: 0000:01:00.0, compute capability: 6.1)
Traceback (most recent call last):
File "/home/pratyush/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1626, in _create_c_op
c_op = c_api.TF_FinishOperation(op_desc)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Shape must be rank 1 but is rank 0 for 'input_batchnorm/cond/Reshape_4' (op: 'Reshape') with input shapes: [1,256,1,1], [].
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "src/chess_zero/run.py", line 20, in <module>
manager.start()
File "src/chess_zero/manager.py", line 76, in start
return uci.start(config)
File "src/chess_zero/play_game/uci.py", line 31, in start
me_player = get_player(config)
File "src/chess_zero/play_game/uci.py", line 67, in get_player
if not load_best_model_weight(model):
File "src/chess_zero/lib/model_helper.py", line 15, in load_best_model_weight
return model.load(model.config.resource.model_best_config_path, model.config.resource.model_best_weight_path)
File "src/chess_zero/agent/model_chess.py", line 145, in load
self.model = Model.from_config(json.load(f))
File "/home/pratyush/anaconda3/lib/python3.6/site-packages/keras/engine/network.py", line 1032, in from_config
process_node(layer, node_data)
File "/home/pratyush/anaconda3/lib/python3.6/site-packages/keras/engine/network.py", line 991, in process_node
layer(unpack_singleton(input_tensors), **kwargs)
File "/home/pratyush/anaconda3/lib/python3.6/site-packages/keras/engine/base_layer.py", line 457, in __call__
output = self.call(inputs, **kwargs)
File "/home/pratyush/anaconda3/lib/python3.6/site-packages/keras/layers/normalization.py", line 206, in call
training=training)
File "/home/pratyush/anaconda3/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 3123, in in_train_phase
x = switch(training, x, alt)
File "/home/pratyush/anaconda3/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 3058, in switch
else_expression_fn)
File "/home/pratyush/anaconda3/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
return func(*args, **kwargs)
File "/home/pratyush/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2087, in cond
orig_res_f, res_f = context_f.BuildCondBranch(false_fn)
File "/home/pratyush/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/control_flow_ops.py", line 1920, in BuildCondBranch
original_result = fn()
File "/home/pratyush/anaconda3/lib/python3.6/site-packages/keras/layers/normalization.py", line 167, in normalize_inference
epsilon=self.epsilon)
File "/home/pratyush/anaconda3/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 1908, in batch_normalization
mean = tf.reshape(mean, (-1))
File "/home/pratyush/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/gen_array_ops.py", line 6296, in reshape
"Reshape", tensor=tensor, shape=shape, name=name)
File "/home/pratyush/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/pratyush/anaconda3/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
return func(*args, **kwargs)
File "/home/pratyush/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3272, in create_op
op_def=op_def)
File "/home/pratyush/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1790, in __init__
control_input_ops)
File "/home/pratyush/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1629, in _create_c_op
raise ValueError(str(e))
ValueError: Shape must be rank 1 but is rank 0 for 'input_batchnorm/cond/Reshape_4' (op: 'Reshape') with input shapes: [1,256,1,1], [].
I got the same errors: ValueError: Shape must be rank 1 but is rank 0 for 'input_batchnorm/cond/Reshape_4' (op: 'Reshape') with input shapes: [1,256,1,1], [].
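One possible reading of the traceback, offered as a guess rather than a confirmed diagnosis: the failing Keras line is mean = tf.reshape(mean, (-1)). In Python, (-1) is just the integer -1, not a one-element tuple, so the shape argument is a scalar. That would fit both "Shape must be rank 1 but is rank 0" and the empty second input shape []. The helper below is purely illustrative (shape_rank is not a TensorFlow or Keras function) and only demonstrates the Python side of this.

```python
def shape_rank(shape_arg):
    """Rank of the shape argument itself: 0 for a bare int, 1 for a tuple or list."""
    return 1 if isinstance(shape_arg, (tuple, list)) else 0


# Parentheses alone do not make a tuple; the trailing comma does.
print(type((-1)).__name__, shape_rank((-1)))    # int 0
print(type((-1,)).__name__, shape_rank((-1,)))  # tuple 1
```

Since that line lives inside the installed keras package rather than in this repo, a different Keras/TensorFlow version pairing may behave differently; the fork linked from #75 is the reported working setup.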
See #75; there is a link to a fork with a working version.
I tried to run it with the GPU and got the following error. Can anyone help me with this?
(Python36) D:\chess\chess-alpha-zero>python src/chess_zero/run.py self
2018-10-10 11:45:55,139@chess_zero.manager INFO # config type: mini
Using TensorFlow backend.
2018-10-10 11:45:59,436@chess_zero.agent.model_chess DEBUG # loading model from D:\chess\chess-alpha-zero\data\model\model_best_config.json
2018-10-10 11:45:59.478648: I C:\users\nwani_bazel_nwani\mmtm6wb6\execroot\org_tensorflow\tensorflow\core\platform\cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
2018-10-10 11:45:59.695745: I C:\users\nwani_bazel_nwani\mmtm6wb6\execroot\org_tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1356] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:01:00.0
totalMemory: 11.00GiB freeMemory: 9.10GiB
2018-10-10 11:45:59.790370: I C:\users\nwani_bazel_nwani\mmtm6wb6\execroot\org_tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1356] Found device 1 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:02:00.0
totalMemory: 11.00GiB freeMemory: 9.10GiB
2018-10-10 11:45:59.795932: I C:\users\nwani_bazel_nwani\mmtm6wb6\execroot\org_tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1435] Adding visible gpu devices: 0, 1
2018-10-10 11:48:20.448740: I C:\users\nwani_bazel_nwani\mmtm6wb6\execroot\org_tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-10-10 11:48:20.451530: I C:\users\nwani_bazel_nwani\mmtm6wb6\execroot\org_tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:929] 0 1
2018-10-10 11:48:20.453788: I C:\users\nwani_bazel_nwani\mmtm6wb6\execroot\org_tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:942] 0: N N
2018-10-10 11:48:20.455816: I C:\users\nwani_bazel_nwani\mmtm6wb6\execroot\org_tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:942] 1: N N
2018-10-10 11:48:20.458363: I C:\users\nwani_bazel_nwani\mmtm6wb6\execroot\org_tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8795 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
2018-10-10 11:48:20.834375: I C:\users\nwani_bazel_nwani\mmtm6wb6\execroot\org_tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 8795 MB memory) -> physical GPU (device: 1, name: GeForce GTX 1080 Ti, pci bus id: 0000:02:00.0, compute capability: 6.1)
Traceback (most recent call last):
File "C:\Users\isszfm\AppData\Local\Continuum\anaconda3\envs\Python36\lib\site-packages\tensorflow\python\framework\ops.py", line 1567, in _create_c_op
c_op = c_api.TF_FinishOperation(op_desc)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Shape must be rank 1 but is rank 0 for 'input_batchnorm/cond/Reshape_4' (op: 'Reshape') with input shapes: [1,256,1,1], [].
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "src/chess_zero/run.py", line 20, in <module>
manager.start()
File "src\chess_zero\manager.py", line 64, in start
return self_play.start(config)
File "src\chess_zero\worker\self_play.py", line 25, in start
return SelfPlayWorker(config).start()
File "src\chess_zero\worker\self_play.py", line 45, in __init__
self.current_model = self.load_model()
File "src\chess_zero\worker\self_play.py", line 85, in load_model
if self.config.opts.new or not load_best_model_weight(model):
File "src\chess_zero\lib\model_helper.py", line 15, in load_best_model_weight
return model.load(model.config.resource.model_best_config_path, model.config.resource.model_best_weight_path)
File "src\chess_zero\agent\model_chess.py", line 145, in load
self.model = Model.from_config(json.load(f))
File "C:\Users\isszfm\AppData\Local\Continuum\anaconda3\envs\Python36\lib\site-packages\keras\engine\network.py", line 1032, in from_config
process_node(layer, node_data)
File "C:\Users\isszfm\AppData\Local\Continuum\anaconda3\envs\Python36\lib\site-packages\keras\engine\network.py", line 991, in process_node
layer(unpack_singleton(input_tensors), **kwargs)
File "C:\Users\isszfm\AppData\Local\Continuum\anaconda3\envs\Python36\lib\site-packages\keras\engine\base_layer.py", line 457, in __call__
output = self.call(inputs, **kwargs)
File "C:\Users\isszfm\AppData\Local\Continuum\anaconda3\envs\Python36\lib\site-packages\keras\layers\normalization.py", line 206, in call
training=training)
File "C:\Users\isszfm\AppData\Local\Continuum\anaconda3\envs\Python36\lib\site-packages\keras\backend\tensorflow_backend.py", line 3123, in in_train_phase
x = switch(training, x, alt)
File "C:\Users\isszfm\AppData\Local\Continuum\anaconda3\envs\Python36\lib\site-packages\keras\backend\tensorflow_backend.py", line 3058, in switch
else_expression_fn)
File "C:\Users\isszfm\AppData\Local\Continuum\anaconda3\envs\Python36\lib\site-packages\tensorflow\python\util\deprecation.py", line 432, in new_func
return func(*args, **kwargs)
File "C:\Users\isszfm\AppData\Local\Continuum\anaconda3\envs\Python36\lib\site-packages\tensorflow\python\ops\control_flow_ops.py", line 2072, in cond
orig_res_f, res_f = context_f.BuildCondBranch(false_fn)
File "C:\Users\isszfm\AppData\Local\Continuum\anaconda3\envs\Python36\lib\site-packages\tensorflow\python\ops\control_flow_ops.py", line 1913, in BuildCondBranch
original_result = fn()
File "C:\Users\isszfm\AppData\Local\Continuum\anaconda3\envs\Python36\lib\site-packages\keras\layers\normalization.py", line 167, in normalize_inference
epsilon=self.epsilon)
File "C:\Users\isszfm\AppData\Local\Continuum\anaconda3\envs\Python36\lib\site-packages\keras\backend\tensorflow_backend.py", line 1908, in batch_normalization
mean = tf.reshape(mean, (-1))
File "C:\Users\isszfm\AppData\Local\Continuum\anaconda3\envs\Python36\lib\site-packages\tensorflow\python\ops\gen_array_ops.py", line 6112, in reshape
"Reshape", tensor=tensor, shape=shape, name=name)
File "C:\Users\isszfm\AppData\Local\Continuum\anaconda3\envs\Python36\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "C:\Users\isszfm\AppData\Local\Continuum\anaconda3\envs\Python36\lib\site-packages\tensorflow\python\framework\ops.py", line 3392, in create_op
op_def=op_def)
File "C:\Users\isszfm\AppData\Local\Continuum\anaconda3\envs\Python36\lib\site-packages\tensorflow\python\framework\ops.py", line 1734, in __init__
control_input_ops)
File "C:\Users\isszfm\AppData\Local\Continuum\anaconda3\envs\Python36\lib\site-packages\tensorflow\python\framework\ops.py", line 1570, in _create_c_op
raise ValueError(str(e))
ValueError: Shape must be rank 1 but is rank 0 for 'input_batchnorm/cond/Reshape_4' (op: 'Reshape') with input shapes: [1,256,1,1], [].