GoogleCloudPlatform / tensorflow-without-a-phd

A crash course in six episodes for software developers who want to become machine learning practitioners.

Problem with shareable variables with tf.variable_scope('model', reuse=tf.AUTO_REUSE) #20

Open yanchvlad opened 6 years ago

yanchvlad commented 6 years ago

In my network, the rollout for the next epoch doesn't use the weights trained by the previous train operation. In TensorBoard I can see that the rollout and train graphs have separate 'model' scopes with differently named layers (e.g. dense_0, dense_1, dense_2, dense_3).

Where is the problem? I slightly changed the code:

```python
import tensorflow as tf

def build_graph(observations):
    with tf.variable_scope('model', reuse=tf.AUTO_REUSE):
        lstm = tf.keras.layers.LSTM(100, return_sequences=True, stateful=False, use_bias=True)(observations)
        lstm2 = tf.keras.layers.LSTM(64, return_sequences=True, stateful=False, use_bias=True, dropout=0.2)(lstm)
        lstm3 = tf.keras.layers.LSTM(64, return_sequences=True, stateful=False, use_bias=True)(lstm2)
        lstm7 = tf.keras.layers.LSTM(32, stateful=False, use_bias=True, dropout=0.2)(lstm3)
        # hidden = tf.keras.layers.Dense(50, use_bias=True, activation='relu')(lstm2)
        logits = tf.keras.layers.Dense(len(ACTIONS),
                                       # bias_initializer=tf.constant_initializer(value=[7., 0.1, 0.1]),
                                       use_bias=True)(lstm7)
    return logits

def main(args):
    args_dict = vars(args)
    print('args: {}'.format(args_dict))

    with tf.Graph().as_default() as g:
        # rollout subgraph

        with tf.device('/cpu:0'):
            with tf.name_scope('rollout'):

                observations = tf.placeholder(shape=(args.batch_size, args.sequence_size, OBSERVATION_DIM), dtype=tf.float32)

                logits = build_graph(observations)

                logits_for_sampling = tf.reshape(logits, shape=(args.batch_size, len(ACTIONS)))

                # Sample the action to be played during rollout.

                sample_action = tf.squeeze(tf.multinomial(logits=logits_for_sampling, num_samples=1))

            optimizer = tf.train.RMSPropOptimizer(
                learning_rate=args.learning_rate,
                decay=args.decay
            )

        # dataset subgraph for experience replay
        with tf.name_scope('dataset'):
            # the dataset reads from MEMORY

            ds = tf.data.Dataset.from_generator(gen, output_types=(tf.float32, tf.int64, tf.float32))
            iterator = ds.make_one_shot_iterator()

        # training subgraph
        with tf.name_scope('train'):
            # the train_op includes getting a batch of data from the dataset, so we do not need to use a feed_dict when running the train_op.
            next_batch = iterator.get_next()

            global episode
            train_observations, labels, processed_rewards = next_batch
            episode=next_batch

            # This reuses the same weights in the rollout phase.
            train_observations.set_shape((args.batch_size, args.sequence_size, OBSERVATION_DIM))
            train_logits = build_graph(train_observations)

            cross_entropies = tf.nn.sparse_softmax_cross_entropy_with_logits(
                logits=train_logits,
                labels=labels
            )

            loss = tf.reduce_sum(processed_rewards * cross_entropies)

            global_step = tf.train.get_or_create_global_step()

            train_op = optimizer.minimize(loss, global_step=global_step)

        init = tf.global_variables_initializer()
        saver = tf.train.Saver(max_to_keep=args.max_to_keep)

```

martin-gorner commented 6 years ago

I guess you are talking about the reinforcement learning sample. What exactly did you modify in the code? Or are you saying that the code as it is on GitHub does not work?

yanchvlad commented 6 years ago

@martin-gorner Sorry about the misunderstanding. Yes, I'm talking about the reinforcement learning sample. I found out that the unchanged sample doesn't work either: separate models are created under name_scope('train') and name_scope('rollout'), so the weights trained in the train operation are never picked up by the next rollout operation, and every rollout computes actions from an untrained network. As far as I can tell, this happens because build_graph(observations) is called from two different name scopes. When I merged the two name scopes into one, everything worked as expected (the weights were shared and reused). A minimal sketch of an equivalent fix is below.
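
For reference, here is a minimal sketch (not the sample's actual code; the constants are hypothetical stand-ins) of an equivalent way to guarantee sharing: create the tf.keras layer objects once and call the same instances from both name scopes, so the variables are shared no matter which scope the call happens in.

```python
import tensorflow as tf

ACTIONS = [0, 1, 2]     # hypothetical stand-in for the sample's ACTIONS
OBSERVATION_DIM = 8     # hypothetical stand-in for the sample's OBSERVATION_DIM
SEQUENCE_SIZE = 10      # hypothetical stand-in for args.sequence_size

# The layer objects are created exactly once, so every call below
# reuses the same set of weights, regardless of the enclosing name scope.
shared_lstm = tf.keras.layers.LSTM(32)
shared_dense = tf.keras.layers.Dense(len(ACTIONS))

def build_graph(observations):
    # Calling pre-built layer instances shares variables across scopes.
    return shared_dense(shared_lstm(observations))

with tf.name_scope('rollout'):
    rollout_obs = tf.placeholder(tf.float32, (None, SEQUENCE_SIZE, OBSERVATION_DIM))
    rollout_logits = build_graph(rollout_obs)

with tf.name_scope('train'):
    train_obs = tf.placeholder(tf.float32, (None, SEQUENCE_SIZE, OBSERVATION_DIM))
    train_logits = build_graph(train_obs)

# Only one copy of each layer's variables exists in the graph.
print([v.name for v in tf.global_variables()])
```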

Python 3.6, TensorFlow 1.10.0 (GPU build)

dizcology commented 6 years ago

@yanchvlad Thanks for reporting the issue. This is a known issue that appeared between TensorFlow versions 1.8 and 1.9, where the variable reuse behavior changed for tf.keras models.

For now my suggestion would be either of the following:

a. use TensorFlow version 1.8

or

b. rewrite the build_graph function to not use tf.keras.layers (a sketch follows below).
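
For illustration, here is a rough sketch of option (b), simplified to a single LSTM layer and not tested against the sample: the functional tf.layers and tf.nn APIs create their variables through tf.get_variable, so a variable_scope with reuse=tf.AUTO_REUSE shares them across calls from different name scopes. ACTIONS is assumed from the sample.

```python
# Rough sketch of option (b): build_graph without tf.keras.layers.
# Simplified to one LSTM layer; ACTIONS is assumed from the sample.
def build_graph(observations):
    with tf.variable_scope('model', reuse=tf.AUTO_REUSE):
        # tf.nn / tf.layers create variables via tf.get_variable, so
        # AUTO_REUSE shares them across calls from any name scope.
        cell = tf.nn.rnn_cell.LSTMCell(num_units=100)
        outputs, _ = tf.nn.dynamic_rnn(cell, observations, dtype=tf.float32)
        last = outputs[:, -1, :]  # keep only the final timestep
        logits = tf.layers.dense(last, len(ACTIONS), use_bias=True)
    return logits
```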

martin-gorner commented 6 years ago

@yanchvlad if you make the changes before we do, please send a pull request!