I have never tested this with outputting more than one frame at a time. The original paper added some extra tricks to make that work well. But the easiest way to tweak this to do what you want is in constants.py: set HIST_LEN = 10, and in SCALE_FMS_G set the last element of each array (currently all 3s) to 9 (3 channels * 3 output frames). You will have to write something to parse the output, since it will be all 3 images stacked on top of one another.
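For concreteness, here is a rough sketch of that tweak (untested; the SCALE_FMS_G values follow the ones quoted later in this thread, and split_frames is a hypothetical helper, not something in the repo):

```python
import numpy as np

# constants.py -- sketch of the 10-input / 3-output tweak described above.
HIST_LEN = 10  # number of history (input) frames

# Last element of each scale: 3 channels * 3 output frames = 9.
# (Assumption: if the larger scales concatenate the previous scale's
# prediction with the inputs, their first elements may also need to grow
# from 3 * (HIST_LEN + 1) to 3 * HIST_LEN + 9 to match the 9-channel
# prediction; I haven't verified this.)
SCALE_FMS_G = [[3 * HIST_LEN, 128, 256, 128, 9],
               [3 * (HIST_LEN + 1), 128, 256, 128, 9],
               [3 * (HIST_LEN + 1), 128, 256, 512, 256, 128, 9],
               [3 * (HIST_LEN + 1), 128, 256, 512, 256, 128, 9]]

def split_frames(prediction):
    """Split a stacked (H, W, 9) prediction into three (H, W, 3) frames.

    Hypothetical helper for parsing the stacked output; not part of the repo.
    """
    return np.split(prediction, 3, axis=-1)
```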
This may also break some of the training loop that has to do with visualizing the images, since those functions are expecting 3-channel inputs.
Let me know how this works for you! And be sure to use TensorFlow 0.12 (it doesn't work on the latest versions yet).
Thank you for your kind reply.
The original paper gives models with 8 input frames and 8 output frames. Can I just modify SCALE_CONV_FMS, SCALE_KERNEL_SIZES, and SCALE_FC_LAYER_SIZES, and set HIST_LEN = 8, to get 8 predicted frames?
Best wishes!
Yes, after re-reading that section of the paper, I believe it should work if you tweak those hyperparameters.
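For example (untested, just following the same pattern as the existing config), the 8-input / 8-output generator depths might look like:

```python
# Hypothetical 8-input / 8-output tweak to constants.py.
HIST_LEN = 8
# Last element of each scale: 3 channels * 8 output frames = 24.
SCALE_FMS_G = [[3 * HIST_LEN, 128, 256, 128, 24],
               [3 * (HIST_LEN + 1), 128, 256, 128, 24],
               [3 * (HIST_LEN + 1), 128, 256, 512, 256, 128, 24],
               [3 * (HIST_LEN + 1), 128, 256, 512, 256, 128, 24]]
```

(The discriminator-side lists would presumably need their frame-depth entries updated the same way; I haven't verified that.)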
Please let me know how it turns out!
I don't understand why you add a column to SCALE_CONV_FMS_D, SCALE_FMS_G, and SCALE_FC_LAYER_SIZES_D instead of just using the 4-input / 1-output models provided by the original paper. When I tweak those hyperparameters for 8 inputs and 8 outputs, do I need to add it too?
I don't understand your question. Could you point to the piece of code you are confused about, and give an example of what you think it should be?
In your code, SCALE_FMS_G = [[3 * HIST_LEN, 128, 256, 128, 3], [3 * (HIST_LEN + 1), 128, 256, 128, 3], [3 * (HIST_LEN + 1), 128, 256, 512, 256, 128, 3], [3 * (HIST_LEN + 1), 128, 256, 512, 256, 128, 3]]. The first and last columns do not appear in the original paper. Why do you add them? Could you explain? Thank you.
The first column is the depth of the input (3 channels * the number of input frames), and the last column is the depth of the output (3 channels * 1 output frame). I set it up this way so it would be easy to change the number of input or output frames.
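In other words, something like this (NUM_CHANNELS and PRED_LEN are illustrative names for this sketch, not necessarily constants in the repo):

```python
NUM_CHANNELS = 3  # RGB
HIST_LEN = 4      # number of input frames
PRED_LEN = 1      # number of output frames (the code currently assumes 1)

input_depth = NUM_CHANNELS * HIST_LEN   # 12: first element of each scale's list
output_depth = NUM_CHANNELS * PRED_LEN  # 3:  last element of each scale's list
```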
I have some data which is 8x8x3 instead of your 32x32x3 training data. I just revised TRAIN_HEIGHT and TRAIN_WIDTH to 8, but when execution reaches preds = tf.nn.conv2d(last_input, conv_ws[i], [1, 1, 1, 1], padding=c.PADDING_D), there is an error: ValueError: Negative dimension size caused by subtracting 3 from 1 for 'discriminator/scale_net_0/calculation/convolutions/Conv2D' (op: 'Conv2D') with input shapes: [?,1,1,3], [3,3,3,64]. What else should I revise? Thank you!
I'm guessing this is because there are four scale networks that each downsample the image by 2x, so if your original images are 8 pixels wide, the input to the smallest scale network will be 1 pixel wide, which could be too small to convolve over with 3x3 or 5x5 kernels.
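A quick sanity check of that arithmetic, assuming the four scales run at 1/8, 1/4, 1/2, and full resolution:

```python
# Width of the frame seen by each of the four scale networks.
FULL_WIDTH = 8
widths = [FULL_WIDTH // 2 ** (3 - i) for i in range(4)]
print(widths)  # [1, 2, 4, 8]: a 1x1 input can't fit a 3x3 kernel with VALID
               # padding, which matches the "subtracting 3 from 1" error above
```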
Hello: If I want to change the number of input frames and output frames, for example taking 10 input frames and predicting the next 3 frames, how should I decide the values of SCALE_CONV_FMS, SCALE_KERNEL_SIZES, and SCALE_FC_LAYER_SIZES?
Thank you!