alsora / deep-briscola

Tensorflow Deep Reinforcement Learning agents playing Briscola card game

Full of Errors #20

Open 1337micro opened 1 year ago

1337micro commented 1 year ago

Hi there, interesting project. Thanks for the effort you put into this.

I'm not sure if you're still interested in maintaining it, but I am struggling quite a bit to use it.

I followed the instructions in the README using the pip installation (not Docker). When I try to run the train command:

python3 train.py --network dqn --saved_model saved_model_dir

I run into numerous errors:

  1. There is no such option as --saved_model; I believe you meant --model_dir.
  2. I think you used TensorFlow version 1, but that version is no longer available on pip and I don't know how else to get it on Ubuntu.

When I run it on the latest version of TensorFlow, I get the error "AttributeError: module 'tensorflow' has no attribute 'app'", which I was able to solve in my fork by changing the TensorFlow import to: import tensorflow.compat.v1 as tf

My fork: https://github.com/1337micro/deep-briscola
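
For reference, a minimal sketch of that import change. The helper name `load_tf_v1` is hypothetical and not part of the repository; the fallback branch assumes an old TensorFlow 1.x install that lacks the `compat.v1` package.

```python
def load_tf_v1():
    """Return a module exposing the TensorFlow 1.x API (hypothetical helper)."""
    try:
        # TensorFlow 2.x ships the 1.x API, including tf.app, under compat.v1.
        import tensorflow.compat.v1 as tf
    except ImportError:
        # Assumed fallback: an old 1.x install where compat.v1 does not exist.
        import tensorflow as tf
    return tf
```

In train.py this would stand in for the top-level import, for example `tf = load_tf_v1()`, after which `tf.app.run()` resolves again.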

I then was able to run the command to start the training: python3 train.py --network dqn --model_dir saved_model_dir

The training ran, but the output is full of warnings and errors, and the saving step at the end failed. Here's the output:

bluezone@DESKTOP-7T8A4VL:~/deep-briscola$ python3 train.py --network dqn --model_dir mod
2023-01-31 23:13:29.049476: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2023-01-31 23:13:29.050890: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2023-01-31 23:13:30.155620: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory
2023-01-31 23:13:30.157471: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory
2023-01-31 23:13:30.157936: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
2023-01-31 23:13:31.971426: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2023-01-31 23:13:31.973196: W tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:265] failed call to cuInit: UNKNOWN ERROR (303)
2023-01-31 23:13:31.973899: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (DESKTOP-7T8A4VL): /proc/driver/nvidia/version does not exist
/home/bluezone/deep-briscola/networks/dqn.py:97: UserWarning: tf.layers.dense is deprecated and will be removed in a future version. Please use tf.keras.layers.Dense instead.
  last_tensor = tf.layers.dense(last_tensor, layer_size, tf.nn.relu, kernel_initializer=w_initializer,
/home/bluezone/deep-briscola/networks/dqn.py:100: UserWarning: tf.layers.dense is deprecated and will be removed in a future version. Please use tf.keras.layers.Dense instead.
  self.q = tf.layers.dense(last_tensor, self.n_actions, kernel_initializer=w_initializer,
/home/bluezone/deep-briscola/networks/dqn.py:107: UserWarning: tf.layers.dense is deprecated and will be removed in a future version. Please use tf.keras.layers.Dense instead.
  last_tensor = tf.layers.dense(last_tensor, layer_size, tf.nn.relu, kernel_initializer=w_initializer,
/home/bluezone/deep-briscola/networks/dqn.py:110: UserWarning: tf.layers.dense is deprecated and will be removed in a future version. Please use tf.keras.layers.Dense instead.
  self.q_next = tf.layers.dense(last_tensor, self.n_actions, kernel_initializer=w_initializer,
2023-01-31 23:13:32.357391: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:357] MLIR V1 optimization pass is not enabled
Epoch: 1000
Total wins: [299, 201]
QAgent 0 won 59.80% with average points 65.35
RandomAgent 1 won 40.20% with average points 54.65
Traceback (most recent call last):
  File "/home/bluezone/deep-briscola/train.py", line 99, in <module>
    tf.app.run()
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/platform/app.py", line 36, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/usr/local/lib/python3.10/dist-packages/absl/app.py", line 308, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.10/dist-packages/absl/app.py", line 254, in _run_main
    sys.exit(main(argv))
  File "/home/bluezone/deep-briscola/train.py", line 60, in main
    train(game, agents, FLAGS.num_epochs, FLAGS.evaluate_every, FLAGS.num_evaluations, FLAGS.model_dir)
  File "/home/bluezone/deep-briscola/train.py", line 32, in train
    agents[0].save_model(model_dir)
  File "/home/bluezone/deep-briscola/agents/q_agent.py", line 135, in save_model
    self.q_learning.save_model(output_dir)
  File "/home/bluezone/deep-briscola/networks/base_network.py", line 36, in save_model
    self.saver.save(self.session, './' + output_dir + '/')
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/training/saver.py", line 1280, in save
    self._build_eager(
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/training/saver.py", line 946, in _build_eager
    self._build(
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/training/saver.py", line 971, in _build
    self.saver_def = self._builder._build_internal(  # pylint: disable=protected-access
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/training/saver.py", line 514, in _build_internal
    saveables = saveable_object_util.validate_and_slice_inputs(
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/training/saving/saveable_object_util.py", line 371, in validate_and_slice_inputs
    for converted_saveable_object in saveable_objects_for_op(op, name):
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/training/saving/saveable_object_util.py", line 222, in saveable_objects_for_op
    raise ValueError("Can only save/restore ResourceVariables when "
ValueError: Can only save/restore ResourceVariables when executing eagerly, got type: <class 'tensorflow.python.framework.ops.Tensor'>.

Let me know if you're interested in helping me out. I can try to find a way to get tensorflow v1 in the meantime.
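
A possible direction for the save failure: the `_build_eager` frame in the traceback suggests that eager execution was active when `Saver.save` was called, which is the TensorFlow 2 default even when the code imports `tensorflow.compat.v1`. The sketch below is an assumption rather than a verified fix for this repository. It turns eager execution off with `tf.disable_eager_execution()` and then saves a toy one-variable graph through a session. The function name `save_demo_checkpoint`, the variable `w`, and the checkpoint name `model` are all hypothetical.

```python
import os
import tempfile


def save_demo_checkpoint(output_dir=None):
    """Save a one-variable checkpoint in graph mode and return its path prefix."""
    # Imported lazily so this sketch can be loaded without TensorFlow installed.
    import tensorflow.compat.v1 as tf

    # Assumption: with eager execution off, Saver.save runs through the session
    # instead of taking the eager branch seen in the traceback.
    tf.disable_eager_execution()

    output_dir = output_dir or tempfile.mkdtemp()

    # Build a toy graph with a single variable and a Saver that tracks it.
    graph = tf.Graph()
    with graph.as_default():
        tf.get_variable("w", shape=[2], initializer=tf.zeros_initializer())
        init_op = tf.global_variables_initializer()
        saver = tf.train.Saver()

    # Save outside the graph context, through a session bound to that graph.
    session = tf.Session(graph=graph)
    try:
        session.run(init_op)
        return saver.save(session, os.path.join(output_dir, "model"))
    finally:
        session.close()
```

In the project itself the equivalent change would presumably be a single `tf.disable_eager_execution()` (or `tf.disable_v2_behavior()`) call right after the import in train.py, before any network is built. Whether that is enough depends on how the repository creates its graph and session.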

alsora commented 1 year ago

This project was developed 4 years ago using an older version of TensorFlow, so you will have to fix several of these errors yourself. You may have better results using the provided Docker image.

Feel free to open a PR with the fixes that you develop.