google-deepmind / open_spiel

OpenSpiel is a collection of environments and algorithms for research in general reinforcement learning and search/planning in games.
Apache License 2.0

alpha_zero and python #1269

Closed PawBlo closed 1 month ago

PawBlo commented 2 months ago

When I try to run python tic_tac_toe_alpha_zero.py I get this error:

Exception caught in actor-0: Failed call to cuDeviceGet: CUDA_ERROR_NOT_INITIALIZED: initialization error
actor-0 exiting
Process Process-1:
Traceback (most recent call last):
  File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/open_spiel/open_spiel/python/algorithms/alpha_zero/alpha_zero.py", line 171, in _watcher
    return fn(config=config, logger=logger, **kwargs)
  File "/home/open_spiel/open_spiel/python/algorithms/alpha_zero/alpha_zero.py", line 268, in actor
    model = _init_model_from_config(config)
  File "/home/open_spiel/open_spiel/python/algorithms/alpha_zero/alpha_zero.py", line 148, in _init_model_from_config
    return model_lib.Model.build_model(
  File "/home/open_spiel/open_spiel/python/algorithms/alpha_zero/model.py", line 174, in build_model
    session = tf.Session(graph=g)
  File "/home/open_spiel/venv/lib/python3.10/site-packages/tensorflow/python/client/session.py", line 1627, in __init__
    super(Session, self).__init__(target, graph, config=config)
  File "/home/open_spiel/venv/lib/python3.10/site-packages/tensorflow/python/client/session.py", line 715, in __init__
    self._session = tf_session.TF_NewSessionRef(c_graph, opts)
tensorflow.python.framework.errors_impl.InternalError: Failed call to cuDeviceGet: CUDA_ERROR_NOT_INITIALIZED: initialization error
2024-08-20 12:36:05.329555: E tensorflow/core/common_runtime/session.cc:93] Failed to create session: INTERNAL: Failed call to cuDeviceGet: CUDA_ERROR_NOT_INITIALIZED: initialization error
2024-08-20 12:36:05.329646: E tensorflow/c/c_api.cc:2241] INTERNAL: Failed call to cuDeviceGet: CUDA_ERROR_NOT_INITIALIZED: initialization error
Exception caught in actor-1: Failed call to cuDeviceGet: CUDA_ERROR_NOT_INITIALIZED: initialization error

My env:
open_spiel - master
Ubuntu 22.04.4 LTS
GNU ld (GNU Binutils for Ubuntu) 2.38
Python 3.10.12
Ubuntu clang version 14.0.0-1ubuntu1.1

lanctot commented 2 months ago

Hmm, it seems to be trying to use a GPU on a machine that doesn't have one (or where the GPU isn't set up properly). I see no GPU-specific code in the Python AlphaZero, and AFAIK it should work on CPU.

Did you install a GPU version of TensorFlow?
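
One way to answer the question above is to ask the installed TensorFlow directly. The snippet below is a minimal sketch, not part of OpenSpiel: the helper name describe_tf_gpu_support is made up for illustration, and it assumes a TensorFlow 2.x install (2.1 or later) where tf.test.is_built_with_cuda and tf.config.list_physical_devices are available.

```python
def describe_tf_gpu_support():
    """Report on the installed TensorFlow, or return None if it is not installed.

    Hypothetical helper for illustration; it is not an OpenSpiel function.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return {
        # Version string of the installed package.
        "version": tf.__version__,
        # True if this TensorFlow binary was compiled with CUDA support.
        "built_with_cuda": tf.test.is_built_with_cuda(),
        # Names of the GPUs TensorFlow can see in this process (may be empty).
        "visible_gpus": [d.name for d in tf.config.list_physical_devices("GPU")],
    }


report = describe_tf_gpu_support()
print(report)
```

If built_with_cuda is True while visible_gpus is empty, or if the call itself fails with a CUDA error, that would point to a GPU build running on a machine without a usable GPU.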

Are you able to create and run very simple graphs in your TF installation without that error? For example, does the following work for you?

import tensorflow.compat.v1 as tf

with tf.Session() as sess:
  my_tensor = tf.constant([[1.0, 2.0], [3.0, 4.0]])
  my_variable = tf.Variable(my_tensor)
  bool_variable = tf.Variable([False, False, False, True])
  complex_variable = tf.Variable([5 + 4j, 6 + 1j])
  sess.run(tf.global_variables_initializer())

lanctot commented 2 months ago

If that simple one works, does the one listed here work?

lanctot commented 2 months ago

Also, did you add any CUDA/GPU-specific code to your local files (e.g. with tf.device("/GPU:0"):)?
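
Since the Python AlphaZero should work on CPU, one possible way to rule the GPU out is to hide all CUDA devices before TensorFlow is imported. This is only a sketch under assumptions: it relies on the CUDA_VISIBLE_DEVICES environment variable (the value "-1" is commonly used to hide every device), the helper name visible_gpu_count is made up for illustration, and it has not been confirmed to resolve this particular issue.

```python
import os

# Assumption: this runs before TensorFlow is imported anywhere in the process.
# Hiding the devices afterwards has no effect on an already-initialized runtime.
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"


def visible_gpu_count():
    """Return how many GPUs TensorFlow sees, or None if TensorFlow is missing.

    Hypothetical helper for illustration; it is not an OpenSpiel function.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return len(tf.config.list_physical_devices("GPU"))


gpu_count = visible_gpu_count()
print("GPUs visible to TensorFlow:", gpu_count)
```

The same variable can also be set from the shell for a single run, e.g. CUDA_VISIBLE_DEVICES=-1 python tic_tac_toe_alpha_zero.py, so that the actor processes started by the script inherit it.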

lanctot commented 1 month ago

Closing due to inactivity; please re-open if you want to follow up.