dyelax / Adversarial_Video_Generation

A TensorFlow Implementation of "Deep Multi-Scale Video Prediction Beyond Mean Square Error" by Mathieu, Couprie & LeCun.
MIT License

Question about discriminator and backprop #2

Open 3rd3 opened 8 years ago

3rd3 commented 8 years ago

First of all, thanks for sharing this fantastic and clean code! I'm having trouble understanding this part of your code: here you run the discriminator on the predicted frames to get the real/fake predictions per frame, then pass these back to the generator via the d_scale_preds placeholders, and finally train the generator to bring d_scale_preds closer to a tensor of ones. What I'm wondering is how the gradients are backpropagated from the discriminator back to the generator. Can gradients pass through sess.run() statements?

dyelax commented 7 years ago

Hey @3rd3 – good question. Gradients can't pass through sess.run() statements, but that's fine since we aren't trying to train the discriminator there. We just need its forward pass predictions to use in the loss calculation of the generator. There might be a more efficient way to train both the discriminator and generator in the same pass, but the original paper specified that they used a different batch for the discriminator and generator training steps.

3rd3 commented 7 years ago

Thanks for your answer. Sorry if I'm missing something obvious, but the adversarial loss for the generator is log(1 - D(G(z))), so wouldn't the gradient include the derivative of D by the chain rule? How can TensorFlow do the automatic differentiation if the evaluated D(G(z)) is fed into the loss as a constant?
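To make the concern concrete, here is a toy 1-D sketch (all functions and constants below are made-up illustrations, not code from this repo): if D(G(z)) is evaluated once and fed back as a constant, the adversarial term's gradient with respect to the generator parameters is exactly zero, whereas differentiating through D gives a nonzero gradient.

```python
import math

# Toy "generator" g(theta, z) = theta * z and "discriminator" d(x) = sigmoid(w * x).
# Adversarial generator loss: L(theta) = log(1 - d(g(theta, z))).
# w, z, theta are arbitrary illustration constants.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def adv_loss(theta, z=0.5, w=2.0):
    return math.log(1.0 - sigmoid(w * theta * z))

def grad_fd(f, x, eps=1e-6):
    # central finite difference, stands in for autodiff
    return (f(x + eps) - f(x - eps)) / (2 * eps)

theta = 1.0

# Correct gradient: differentiate through the discriminator.
true_grad = grad_fd(adv_loss, theta)

# Buggy version: D(G(z)) is evaluated once and fed back as a constant,
# so the loss no longer depends on theta at all.
d_const = sigmoid(2.0 * theta * 0.5)
buggy_loss = lambda th: math.log(1.0 - d_const)  # th is unused
buggy_grad = grad_fd(buggy_loss, theta)

print(true_grad)   # ≈ -0.731 (equals -sigmoid(1) analytically)
print(buggy_grad)  # 0.0
```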

dyelax commented 7 years ago

I think you might be correct – I misunderstood you originally. I'm going to reopen this and create a new branch to explore that. Feel free to contribute if you have some time!

3rd3 commented 7 years ago

Thanks for your reply. Would that imply that the contribution of the adversarial loss to the gradient is currently always zero? If so, I'm wondering why the GIFs with adversarial learning are still visually superior, i.e. free of 'rainbow' artifacts.

dyelax commented 7 years ago

I'm wondering the same thing. Digging through the code right now to try to figure that out.

3rd3 commented 7 years ago

Did you make progress? I'm really curious about whether & how this will improve the predictions! Unfortunately, I don't have enough time to help out.

dyelax commented 7 years ago

I'm working on it in the gradient-bug branch. I'll let you know when it's fixed!

dyelax commented 7 years ago

Hey @3rd3 – I think I have it fixed in the gradient-bug branch. I'm testing right now and tweaking some hyperparameters, but feel free to check it out and let me know if you see anything that's still broken.

3rd3 commented 7 years ago

Looks good so far. I'm too busy right now to read the code more carefully, but judging from what I've seen, I'm not sure whether you are instantiating the discriminator model twice. I think this is necessary to prevent the optimizer from also training the discriminator via the combined loss while training the generator. A few ways to achieve this:

- Add a trainable flag to the define_graph function and pass it as False to the variable declarations in the w or b functions for the generator's instantiation. During the second construction of the graph, you need variable scopes with the reuse flag set to True so that the variables are shared between the two instantiations.
- Alternatively (and perhaps easier/more streamlined), create a variable collection for the generator variables and then update via opt.minimize(loss, var_list=<list of variables>). You can perhaps also query the variables from a name scope via tf.get_collection(tf.GraphKeys.VARIABLES, scope='my_scope').
- There may be more ways of disabling gradient updates for certain variables or subgraphs that I'm not aware of (e.g. something like tf.stop_gradient(input)).

The problem I see with the latter approaches is that TF might not allow reusing a graph at all without instantiating it multiple times and sharing the variables.
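The var_list idea above can be sketched in plain Python without TensorFlow (the loss, parameter names, and constants are all illustrative): one combined loss depends on both "generator" and "discriminator" parameters, but the generator step takes gradients with respect to, and updates, only the generator's variables, leaving the discriminator's untouched.

```python
# Minimal sketch of restricting an update to one variable group,
# the analogue of opt.minimize(loss, var_list=generator_vars).

def combined_loss(g_vars, d_vars):
    # toy loss that couples both parameter groups
    return (g_vars["w"] * d_vars["w"] - 1.0) ** 2

def grad_wrt(f, params, key, eps=1e-6):
    # central finite difference w.r.t. a single named parameter
    hi = dict(params); hi[key] += eps
    lo = dict(params); lo[key] -= eps
    return (f(hi) - f(lo)) / (2 * eps)

g_vars = {"w": 1.0}
d_vars = {"w": 2.0}
lr = 0.1

# Generator training step: gradient is taken only w.r.t. g_vars;
# d_vars appear in the loss but receive no update.
g_grad = grad_wrt(lambda g: combined_loss(g, d_vars), g_vars, "w")
g_vars["w"] -= lr * g_grad

print(g_vars["w"])  # 0.6 (1.0 - 0.1 * gradient of 4.0)
print(d_vars["w"])  # 2.0, unchanged by the generator step
```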

3rd3 commented 7 years ago

I've made some changes to my previous message because I hit 'comment' too early. I am not sure whether these changes made it into the email notification.

dyelax commented 7 years ago

@3rd3 – I believe I have that covered. In both the generator and discriminator models, I'm passing minimize() a list of variables to train just the model in question. I'm still having trouble getting this new implementation to perform as well as the previous (incorrect) one, though.

3rd3 commented 7 years ago

If it's not a bug, this could be the training instability that adversarial training is known for. Perhaps adding noise helps: http://www.inference.vc/instance-noise-a-trick-for-stabilising-gan-training/
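The instance-noise trick from that post can be sketched roughly as follows (the linear sigma schedule and all constants are illustrative assumptions, not taken from the post or this repo): add Gaussian noise to both real and generated discriminator inputs, annealing it toward zero over training.

```python
import random

# Sketch of instance noise: perturb discriminator inputs with Gaussian
# noise whose scale sigma decays linearly over the course of training.

def instance_noise(batch, step, total_steps, sigma0=0.1):
    sigma = sigma0 * max(0.0, 1.0 - step / total_steps)  # anneal to 0
    return [x + random.gauss(0.0, sigma) for x in batch]

random.seed(0)
real = [1.0, 2.0, 3.0]

noisy = instance_noise(real, step=0, total_steps=100)    # full noise early on
clean = instance_noise(real, step=100, total_steps=100)  # sigma == 0 at the end

print(clean)  # [1.0, 2.0, 3.0]
```

The same function would be applied to the generator's outputs before they are fed to the discriminator, so both real and fake batches see the same noise level.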