google-deepmind / learning-to-learn

Learning to Learn in TensorFlow
https://arxiv.org/abs/1606.04474
Apache License 2.0

Resetting each epoch? #18

Open gabrielleyr opened 7 years ago

gabrielleyr commented 7 years ago

Using the Adam optimizer (not L2L) on the CIFAR problem: if I print the cost after each epoch, it does not decrease over time (learning rate 0.001, num_steps 100, num_epochs 100). However, printing the cost at each num_step shows it decreasing within an epoch. Why does it seem like the weights are being reset every epoch?

I've also added code to check the training and validation accuracy after each epoch. These are also not improving from epoch to epoch.
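A minimal sketch of that per-epoch check, assuming hypothetical accuracy tensors (accuracy_train_op, accuracy_valid_op) built from the CIFAR problem's logits and labels; these names are not part of the original evaluate.py:

# Hypothetical per-epoch metric check; accuracy_train_op / accuracy_valid_op
# are assumed tensors, not part of the original script.
for epoch in xrange(num_epochs):
  _, cost = run_epoch(sess, cost_op, [update], reset, num_unrolls)
  train_acc, valid_acc = sess.run([accuracy_train_op, accuracy_valid_op])
  print("epoch %d  cost %f  train acc %f  valid acc %f"
        % (epoch, cost, train_acc, valid_acc))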

gabrielleyr commented 7 years ago

If I comment out line 4 of the original code below, sess.run(reset), I get the expected result of a decreasing error every epoch. Is it incorrect to skip running problem_reset and optimizer_reset each epoch?

def run_epoch(sess, cost_op, ops, reset, num_unrolls):
  """Runs one optimization epoch."""
  start = timer()
  sess.run(reset)  # from evaluate.py: reset = [problem_reset, optimizer_reset]
  for _ in xrange(num_unrolls):
    cost = sess.run([cost_op] + ops)[0]
    print(cost)
  return timer() - start, cost
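A minimal sketch of the variant described above, with the reset hoisted out of run_epoch and run once before the epoch loop (the names update and num_epochs are assumptions taken from the surrounding script, not the repo's exact code):

# Variant with the reset hoisted out of run_epoch, so the problem's
# variables and Adam's slot variables persist across epochs.
def run_epoch_no_reset(sess, cost_op, ops, num_unrolls):
  """Runs one optimization epoch without resetting."""
  start = timer()
  for _ in xrange(num_unrolls):
    cost = sess.run([cost_op] + ops)[0]
  return timer() - start, cost

sess.run(reset)  # reset once, before any training
for epoch in xrange(num_epochs):
  elapsed, cost = run_epoch_no_reset(sess, cost_op, [update], num_unrolls)
  print("epoch %d  cost %f" % (epoch, cost))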

muneebshahid commented 7 years ago

OK, so in the paper they only train the LSTM to optimize for a fixed number of time steps, i.e. 100 iterations. For them, one epoch is running the optimizer for those 100 steps; after the 100 steps (one epoch) they reset and start over. The paper's claim is that, over the horizon the LSTM was trained to optimize for, it gets better performance than Adam.

Now to your query: yes, they reset Adam, but always after 100 time steps, i.e. the number of optimization steps per epoch. After 100 epochs (each with 100 iterations) they average the losses, and that average is compared against the LSTM's. I hope this makes things clear.
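In other words, the evaluation protocol looks roughly like this (a sketch under the assumptions above, not the repo's exact code):

# Sketch of the protocol: reset at the start of every epoch, run a fixed
# number of steps, and average the final costs over all epochs.
total_cost = 0.0
for epoch in xrange(num_epochs):     # e.g. 100 epochs
  sess.run(reset)                    # fresh problem weights + fresh Adam state
  for _ in xrange(num_unrolls):      # e.g. 100 optimization steps
    cost = sess.run([cost_op, update])[0]
  total_cost += cost                 # final cost reached in this epoch
print("mean final cost: %f" % (total_cost / num_epochs))

The same averaged number is computed with the LSTM optimizer in place of Adam, and the two averages are compared.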

gabrielleyr commented 7 years ago

@muneebshahid In the above code, sess.run(reset) is inside the run_epoch function, so the reset happens every epoch, not every 100 epochs. Do you agree?

muneebshahid commented 7 years ago

Yes, they reset after every epoch. But within each epoch there are multiple optimization steps, i.e. num_unrolls.