Open stef277 opened 6 years ago
@Akababa, could you please review this? Could it be caused by running on macOS?
I've set max_process to 1 in configs/mini.py, and it's now running. It takes about 280-620 s per game (depending on the number of moves). I guess it's going to run for 50 games. Time to go to bed. Maybe you are right and it's because of the way my Mac is configured. (Also, I'm using virtualenv, but I don't think it matters.)
Does it help to change the start method to spawn?
It is already set to 'spawn' in main in run.py. I've tried 'fork', same result. I will look at it eventually; it seems to crash when a worker finishes processing a game and sends the result to the main process. Not a big deal for now, I guess having only one process is sufficient.
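For readers unfamiliar with start methods, here is a minimal sketch of how one can be selected explicitly through a context object rather than a global setting. `pick_context` is a hypothetical helper written for illustration; it is not taken from run.py, which may configure the start method differently.

```python
import multiprocessing as mp


def pick_context(preferred="spawn"):
    """Return a multiprocessing context for `preferred` if this platform
    supports it, otherwise the platform's default context."""
    if preferred in mp.get_all_start_methods():
        return mp.get_context(preferred)
    # Fall back to whatever the platform uses by default.
    return mp.get_context()


if __name__ == "__main__":
    ctx = pick_context("spawn")
    print("available:", mp.get_all_start_methods())
    print("using:", ctx.get_start_method())
```

A context exposes the same factories as the `multiprocessing` module (`ctx.Process`, `ctx.Pipe`, `ctx.Manager`), so trying 'spawn' versus 'fork' becomes a one-argument change. 'fork' is not available on Windows, which is why the helper checks first.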
Sorry I couldn't test it, I only have a Windows laptop. Please let us know if you figure it out!
On another note: without tensorflow-gpu, how much CPU % does the one process use?
I have 8 cores. 4 cores are at 100%, the other 4 are around 80%. And my laptop's fan is screaming! How long does it take you to complete 1 game?
I have a modest GPU (GT 750M) so a total of about one game per minute on 3 processes. Interestingly my CPU usage is only around 30-40%.
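For a rough sense of scale, the timings quoted in this thread can be put side by side. The sketch below only does the arithmetic. It assumes the run really is 50 games (only a guess in the thread) and treats "one game per minute" as 60 s per game overall; `total_hours` is a hypothetical helper.

```python
def total_hours(games, seconds_per_game):
    """Wall-clock hours for `games` games at a fixed average time per game."""
    return games * seconds_per_game / 3600


GAMES = 50  # assumed run length; the thread only guesses at this number

for label, secs in [
    ("CPU, 1 process, fastest game", 280),
    ("CPU, 1 process, slowest game", 620),
    ("GT 750M, 3 processes", 60),
]:
    print(f"{label}: {total_hours(GAMES, secs):.1f} h")
```

Under those assumptions, 50 games take roughly 3.9 to 8.6 hours on the CPU-only setup and a little under an hour on the GPU setup.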
Hi,
I cloned the repository a few hours ago and ran into a crash while trying to do self-play with the best model that ships with the source code. It crashes after a few minutes; it looks like the different processes or threads have problems communicating with each other over pipes. I will look at it tomorrow, once I start exploring the code a bit more. You will find my config and the stack trace below.
My config:
(venv) 2015sys0736:chess-alpha-zero stephane$ python src/chess_zero/run.py self
2017-12-24 23:55:56,014@chess_zero.manager INFO # config type: {config_type}
Using TensorFlow backend.
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: compiletime version 3.5 of module 'tensorflow.python.framework.fast_tensor_util' does not match runtime version 3.6
return f(*args, **kwds)
2017-12-24 23:55:57,195@chess_zero.agent.model_chess DEBUG # loading model from /Users/stephane/Documents/Dev/chess/chess-alpha-zero/data/model/model_best_config.json
2017-12-24 23:55:57.237053: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2017-12-24 23:56:00,410@chess_zero.agent.model_chess DEBUG # loaded model digest = 0c379712fcb4204eccea535e5ff099cde78f87037e9805c85d4738bc350adb12
Using TensorFlow backend.
Using TensorFlow backend.
Using TensorFlow backend.
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: compiletime version 3.5 of module 'tensorflow.python.framework.fast_tensor_util' does not match runtime version 3.6
return f(*args, **kwds)
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: compiletime version 3.5 of module 'tensorflow.python.framework.fast_tensor_util' does not match runtime version 3.6
return f(*args, **kwds)
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: compiletime version 3.5 of module 'tensorflow.python.framework.fast_tensor_util' does not match runtime version 3.6
return f(*args, **kwds)
Exception in thread prediction_worker:
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "src/chess_zero/agent/api_chess.py", line 33, in predict_batch_worker
data.append(pipe.recv())
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/connection.py", line 250, in recv
buf = self._recv_bytes()
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
buf = self._recv(4)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/connection.py", line 383, in _recv
raise EOFError
EOFError
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/concurrent/futures/process.py", line 175, in _process_worker
r = call_item.fn(*call_item.args, **call_item.kwargs)
File "src/chess_zero/worker/self_play.py", line 87, in self_play_buffer
pipes = cur.pop() # borrow
File "<string>", line 2, in pop
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/managers.py", line 757, in _callmethod
kind, result = conn.recv()
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/connection.py", line 251, in recv
return _ForkingPickler.loads(buf.getbuffer())
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/connection.py", line 951, in rebuild_connection
fd = df.detach()
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/resource_sharer.py", line 57, in detach
with _resource_sharer.get_connection(self._id) as conn:
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/resource_sharer.py", line 87, in get_connection
c = Client(address, authkey=process.current_process().authkey)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/connection.py", line 487, in Client
c = SocketClient(address)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/connection.py", line 614, in SocketClient
s.connect(address)
ConnectionRefusedError: [Errno 61] Connection refused
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "src/chess_zero/run.py", line 16, in <module>
manager.start()
File "src/chess_zero/manager.py", line 46, in start
return self_play.start(config)
File "src/chess_zero/worker/self_play.py", line 22, in start
return SelfPlayWorker(config).start()
File "src/chess_zero/worker/self_play.py", line 47, in start
env, data = futures.popleft().result()
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/concurrent/futures/_base.py", line 432, in result
return self.get_result()
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/concurrent/futures/_base.py", line 384, in get_result
raise self._exception
ConnectionRefusedError: [Errno 61] Connection refused
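The first traceback above shows `pipe.recv()` raising `EOFError` inside `predict_batch_worker`, which is what `multiprocessing` does when the other end of a pipe has been closed. As a minimal sketch (not the project's actual code, and not a fix for the `ConnectionRefusedError` itself), a receive loop can drop closed pipes instead of letting the thread die. `drain_ready` and its `timeout` parameter are hypothetical names introduced for illustration.

```python
from multiprocessing import Pipe
from multiprocessing.connection import wait


def drain_ready(pipes, timeout=0.1):
    """Receive one message from every pipe that is ready to read.

    Pipes whose peer has closed are removed from `pipes` (in place)
    rather than raising EOFError to the caller.
    """
    data = []
    for pipe in wait(pipes, timeout=timeout):
        try:
            data.append(pipe.recv())
        except EOFError:
            # The sending side is gone: stop polling this pipe.
            pipes.remove(pipe)
            pipe.close()
    return data


if __name__ == "__main__":
    live_recv, live_send = Pipe(duplex=False)
    dead_recv, dead_send = Pipe(duplex=False)
    live_send.send("position A")
    dead_send.close()  # simulate a worker that exited

    pipes = [live_recv, dead_recv]
    print(drain_ready(pipes))  # only the live pipe yields data
    print(len(pipes))          # the closed pipe has been dropped
```

This only keeps the prediction thread alive; why the self-play worker's end closed in the first place (the refused connection while unpickling the pipe) would still need separate investigation.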