dyelax / Adversarial_Video_Generation

A TensorFlow Implementation of "Deep Multi-Scale Video Prediction Beyond Mean Square Error" by Mathieu, Couprie & LeCun.

how to decide the SCALE_CONV_FMS and SCALE_KERNEL_SIZES #14

Closed: bitxsw93 closed this issue 7 years ago

bitxsw93 commented 7 years ago

Hello: If I want to change the number of input frames and output frames (for example, given 10 input frames, predict the next 3 frames), how should I decide the values of SCALE_CONV_FMS, SCALE_KERNEL_SIZES and SCALE_FC_LAYER_SIZES?

Thank you!

dyelax commented 7 years ago

I have never tested this with outputting more than one frame at a time; the original paper added some extra tricks to make that work well. But the easiest way to tweak this to do what you want is, in constants.py, to set HIST_LEN = 10 and, in SCALE_FMS_G, to set the last element of each array (currently all 3s) to 9 (3 channels * 3 output frames). You will have to write something to parse the output, since it will be all 3 images stacked on top of one another.
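For concreteness, here is a minimal, untested sketch of those constants.py edits; the layer widths are the repo defaults, and split_stacked_frames is a hypothetical helper for unpacking the stacked output, not code from the repo:

```python
# constants.py tweaks for 10 input frames -> 3 predicted frames (untested sketch).
HIST_LEN = 10  # number of history (input) frames fed to the generator

# Last element of each scale's list: 3 channels * 3 output frames = 9.
# The "+ 1" terms assume each finer scale still receives one upsampled
# prediction from the coarser scale; with 3-frame output this may need
# to become "+ 3" -- an assumption, not something verified against the repo.
SCALE_FMS_G = [[3 * HIST_LEN,       128, 256, 128, 9],
               [3 * (HIST_LEN + 1), 128, 256, 128, 9],
               [3 * (HIST_LEN + 1), 128, 256, 512, 256, 128, 9],
               [3 * (HIST_LEN + 1), 128, 256, 512, 256, 128, 9]]

# Hypothetical helper to unpack a stacked (H, W, 9) output array into
# three (H, W, 3) frames; you would write something like this yourself.
def split_stacked_frames(stacked, num_frames=3, channels=3):
    return [stacked[..., i * channels:(i + 1) * channels]
            for i in range(num_frames)]
```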

This may also break some of the training loop that has to do with visualizing the images, since those functions are expecting 3-channel inputs.

Let me know how this works for you! And be sure to use TensorFlow 0.12 (it doesn't work on the latest versions yet).

bitxsw93 commented 7 years ago

Thank you for your kind reply.

The original paper gives a model with 8 input frames and 8 output frames. Can I just modify SCALE_CONV_FMS, SCALE_KERNEL_SIZES and SCALE_FC_LAYER_SIZES, and set HIST_LEN = 8, to get 8 predicted frames?

Best wishes!

dyelax commented 7 years ago

Yes, after re-reading that section of the paper, if you tweak those hyperparameters it should work.

Please let me know how it turns out!

bitxsw93 commented 7 years ago

I don't understand why you add a column to SCALE_CONV_FMS_D, SCALE_FMS_G and SCALE_FC_LAYER_SIZES_D instead of just using the 4-input / 1-output models given in the original paper. When I tweak those hyperparameters for 8 inputs and 8 outputs, do I need to add it too?

dyelax commented 7 years ago

I don't understand your question. Could you point to the piece of code you are confused about, and give an example of what you think it should be?

bitxsw93 commented 7 years ago

In your code, SCALE_FMS_G = [[3 * HIST_LEN, 128, 256, 128, 3], [3 * (HIST_LEN + 1), 128, 256, 128, 3], [3 * (HIST_LEN + 1), 128, 256, 512, 256, 128, 3], [3 * (HIST_LEN + 1), 128, 256, 512, 256, 128, 3]]. The first and last columns are not in the original paper. Why do you add them? Can you explain? Thank you.

dyelax commented 7 years ago

The first column is the depth of the input (3 channels * the number of input frames), and the last column is the depth of the output (3 channels * 1 output frame). I set it up this way so it would be easy to change the number of input or output frames.
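As a hedged sketch, this is how those bookend columns would be derived for the 8-input / 8-output setup discussed above. The middle layer widths are the repo defaults, PRED_LEN is a hypothetical name (the repo hard-codes 1 output frame), and the way the later scales' input depth grows is an assumption:

```python
HIST_LEN = 8   # input frames
PRED_LEN = 8   # output frames (hypothetical name, not in the repo)

input_depth = 3 * HIST_LEN    # first column: 3 channels * 8 input frames = 24
output_depth = 3 * PRED_LEN   # last column:  3 channels * 8 output frames = 24

# Scales 1-3 also receive the coarser scale's upsampled prediction, so their
# input depth grows by output_depth. This generalizes the repo's
# "3 * (HIST_LEN + 1)" and is an assumption for multi-frame output.
SCALE_FMS_G = [
    [input_depth,                128, 256, 128,           output_depth],
    [input_depth + output_depth, 128, 256, 128,           output_depth],
    [input_depth + output_depth, 128, 256, 512, 256, 128, output_depth],
    [input_depth + output_depth, 128, 256, 512, 256, 128, output_depth],
]
```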

bitxsw93 commented 7 years ago

I have some data which is 8x8x3 instead of your 32x32x3 training data. I just revised TRAIN_HEIGHT and TRAIN_WIDTH to 8, but when execution reaches preds = tf.nn.conv2d(last_input, conv_ws[i], [1, 1, 1, 1], padding=c.PADDING_D), there is an error: ValueError: Negative dimension size caused by subtracting 3 from 1 for 'discriminator/scale_net_0/calculation/convolutions/Conv2D' (op: 'Conv2D') with input shapes: [?,1,1,3], [3,3,3,64]. What else should I revise? Thank you.

dyelax commented 7 years ago

I'm guessing this is because there are four scale networks that each downsample the image by 2x, so if your original images are 8 pixels wide, the image input to the smallest scale network is only 1 pixel wide, which is too small to convolve over with 3x3 or 5x5 kernels.
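For intuition, a tiny standalone check (the names TRAIN_WIDTH and NUM_SCALES mirror the repo's constants) of the per-scale input widths for 8-pixel frames:

```python
# Why 8x8 inputs break: the four scale networks see the frame at roughly
# 1/8, 1/4, 1/2, and full resolution, so the smallest scale gets a 1x1
# image. With VALID padding (which the error message implies), a 3x3
# kernel cannot fit inside a 1x1 input, hence the negative dimension.
TRAIN_WIDTH = 8
NUM_SCALES = 4

for i in range(NUM_SCALES):
    scale_factor = 2 ** (NUM_SCALES - 1 - i)  # 8, 4, 2, 1
    width = TRAIN_WIDTH // scale_factor
    print(f'scale {i}: input is {width}x{width}')
# scale 0: input is 1x1  <- too small for a 3x3 VALID convolution
# scale 1: input is 2x2
# scale 2: input is 4x4
# scale 3: input is 8x8
```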