aiwc / test_world

AI World Cup simulation environment
MIT License

player-deep-learning-train.py doesn't work. #60

Closed deepkyu closed 1 year ago

deepkyu commented 5 years ago

Hello. We are trying to run the example codes in this repo for the robots. Most of the rule-based examples work: we have confirmed that player_rulebased-(tc).py, player_random-walk.py, and player_skeleton.py run fine. However, only player-deep-learning-train.py does not work :(

We checked that tensorflow-gpu and cudatoolkit are installed in our Python environment, and there is no error message about the CUDA library or its path. All three members of our team ran into the same problem. Here is the console output for this issue:

INFO: supervisor: Starting controller: "/home/aiwc2019/aiwc/test_world/controllers/supervisor/supervisor"
INFO: soccer_robot: Starting controller: "/home/aiwc2019/aiwc/test_world/controllers/soccer_robot/soccer_robot" 0.05
INFO: soccer_robot: Starting controller: "/home/aiwc2019/aiwc/test_world/controllers/soccer_robot/soccer_robot" 0.05
INFO: soccer_robot: Starting controller: "/home/aiwc2019/aiwc/test_world/controllers/soccer_robot/soccer_robot" 0.05
INFO: soccer_robot: Starting controller: "/home/aiwc2019/aiwc/test_world/controllers/soccer_robot/soccer_robot" 0.05
INFO: soccer_robot: Starting controller: "/home/aiwc2019/aiwc/test_world/controllers/soccer_robot/soccer_robot" 0.05
INFO: soccer_robot: Starting controller: "/home/aiwc2019/aiwc/test_world/controllers/soccer_robot/soccer_robot" 0.05
INFO: soccer_robot: Starting controller: "/home/aiwc2019/aiwc/test_world/controllers/soccer_robot/soccer_robot" 0.05
INFO: soccer_robot: Starting controller: "/home/aiwc2019/aiwc/test_world/controllers/soccer_robot/soccer_robot" 0.05
INFO: soccer_robot: Starting controller: "/home/aiwc2019/aiwc/test_world/controllers/soccer_robot/soccer_robot" 0.05
INFO: soccer_robot: Starting controller: "/home/aiwc2019/aiwc/test_world/controllers/soccer_robot/soccer_robot" 0.05
[supervisor] creating uds connection
[supervisor] destroying uds connection
[supervisor] Closed the uds socket
[supervisor] server running
[supervisor] Rules:
[supervisor]      game duration - 300 seconds
[supervisor]           deadlock - on
[supervisor] Team A: 
[supervisor]   team name - teamA
[supervisor]  executable - examples/player_deep-learning-train_py/player_deep-learning-train.py
[supervisor]   data path - examples/team_a_data
[supervisor] 
[supervisor] Team B: 
[supervisor]   team name - teamB
[supervisor]  executable - examples/player_rulebased-B_py/player_rulebased-B.py
[supervisor]   data path - examples/team_b_data
[supervisor] 
[supervisor] Commentator: 
[supervisor]   team name - commentator
[supervisor]  executable - examples/commentator_skeleton_py/commentator_skeleton.py
[supervisor]   data path - examples/commentator_data
[supervisor] 
[supervisor] Reporter: 
[supervisor]   team name - reporter
[supervisor]  executable - examples/reporter_skeleton_py/reporter_skeleton.py
[supervisor]   data path - examples/reporter_data
[supervisor] 
[supervisor] 2019-09-10T13:29:22+0900 I am the commentator for this game!
[supervisor] 2019-09-10T13:29:22+0900 I am the reporter for this game!
[supervisor] 2019-09-10T13:29:22+0900 I am ready for the game!
INFO: 'supervisor' controller exited successfully.

So, could you please check whether this example still works? Thanks.

lfelipesv commented 5 years ago

Hello @kyumaze, you are right, it is not working. I guess something changed in TensorFlow after the example was developed.

I was able to make it work with the following changes:

  1. player_deep-learning-train.py

     add:

     import tensorflow as tf

     add (around line 170):

            self.sess = tf.Session()
            self.Q = NeuralNetwork(self.sess, "Q", None, False, False)   # RestoreFromFile=False trains from scratch; set it True to load a CHECKPOINT
            self.Q_ = NeuralNetwork(self.sess, "Q_", self.Q, False, True)
  2. dqn_nn.py (add two parameters, sess and name, and create a variable scope for the architecture):
class NeuralNetwork:
    def __init__(self, sess, name, NetworkToCopy=None, RestoreFromFile=False, ReplayNetwork=False):
        self.frame_res = 5      # Resolution of the input
        self.nframes = 1        # Number of frames/channels of the input
        NumberOfActions = 11    # Number of possible actions
        learning_rate = 1e-4    # Learning rate

        # Shared session and network name (the two new parameters)
        self.sess = sess
        self.name = name

        # Placeholders for the input variables
        self.phy = tf.placeholder(tf.float32, shape=[None, self.frame_res * self.nframes])  # Flattened last nframes frames of the game
        self.y = tf.placeholder(tf.float32, shape=[None, 1])
        self.action = tf.placeholder(tf.float32, shape=[None, NumberOfActions])

        with tf.variable_scope(self.name):
            # Fully-connected layer 1
            input_f1 = self.phy
            fc1_num_neurons = 256
            self.W_fc1 = tf.Variable(tf.random.truncated_normal([self.frame_res * self.nframes, fc1_num_neurons], stddev=0.01), name="W_fc1")
            self.b_fc1 = tf.Variable(tf.constant(0.01, shape=[fc1_num_neurons]), name="b_fc1")
            self.input_fc1_flat = tf.reshape(input_f1, [-1, self.frame_res * self.nframes])
            self.out_fc1 = tf.nn.relu(tf.matmul(self.input_fc1_flat, self.W_fc1) + self.b_fc1)

            # Fully-connected layer 2
            fc2_num_neurons = 256
            self.W_fc2 = tf.Variable(tf.random.truncated_normal([fc1_num_neurons, fc2_num_neurons], stddev=0.01), name="W_fc2")
            self.b_fc2 = tf.Variable(tf.constant(0.01, shape=[fc2_num_neurons]), name="b_fc2")
            self.out_fc2 = tf.nn.relu(tf.matmul(self.out_fc1, self.W_fc2) + self.b_fc2)

            # Output layer
            out_layer1_input = self.out_fc2
            out_layer1_n_outputs = NumberOfActions
            self.W_ol1 = tf.Variable(tf.random.truncated_normal([fc2_num_neurons, out_layer1_n_outputs], stddev=0.01), name="W_ol1")
            self.b_ol1 = tf.Variable(tf.constant(0.01, shape=[out_layer1_n_outputs]), name="b_ol1")
            self.Q_theta = tf.matmul(out_layer1_input, self.W_ol1) + self.b_ol1
            self.Q_action = tf.reduce_sum(tf.multiply(self.Q_theta, self.action), axis=1, keepdims=True)

Please check if you are able to run the example with these changes.
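For reference, the graph above computes a plain two-hidden-layer Q-network: Q(s) = relu(relu(s·W1 + b1)·W2 + b2)·W3 + b3, with one output per action. Below is a minimal NumPy sketch of the same forward pass (hypothetical helper name `q_forward`, not part of the repo), which can be used to sanity-check the tensor shapes independently of TensorFlow:

```python
import numpy as np

def q_forward(state, params):
    """Forward pass mirroring the NeuralNetwork graph above:
    two ReLU fully-connected layers followed by a linear output layer."""
    W1, b1, W2, b2, W3, b3 = params
    h1 = np.maximum(state @ W1 + b1, 0.0)   # FC layer 1 + ReLU
    h2 = np.maximum(h1 @ W2 + b2, 0.0)      # FC layer 2 + ReLU
    return h2 @ W3 + b3                     # linear Q-values, one per action

# Shapes matching the example: 5-dim input, 256/256 hidden units, 11 actions
rng = np.random.default_rng(0)
frame_res, nframes, n_actions = 5, 1, 11
params = (
    rng.normal(0, 0.01, (frame_res * nframes, 256)), np.full(256, 0.01),
    rng.normal(0, 0.01, (256, 256)), np.full(256, 0.01),
    rng.normal(0, 0.01, (256, n_actions)), np.full(n_actions, 0.01),
)
q = q_forward(rng.normal(size=(4, frame_res * nframes)), params)
print(q.shape)  # (4, 11): one Q-value per action for each of the 4 states
```

The batch dimension stays symbolic (None in the placeholders), so any number of stacked states produces a matching number of Q-value rows.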

Regards.