coxlab / prednet

Code and models accompanying "Deep Predictive Coding Networks for Video Prediction and Unsupervised Learning"
https://arxiv.org/abs/1605.08104
MIT License

Prediction_results #14

Closed Bozorgtabar closed 7 years ago

Bozorgtabar commented 7 years ago

Hi, I ran your code with only two stacked layers on some sample videos. However, the results suggest that the predictions lag the input by one frame. Does this mean there is a bug in the code, or have I misunderstood the network? I used the parameters below:

Model parameters

nt = 10
n_channels, im_height, im_width = (3, 128, 160)
input_shape = (n_channels, im_height, im_width) if K.image_dim_ordering() == 'th' else (im_height, im_width, n_channels)
stack_sizes = (n_channels, 32)
R_stack_sizes = stack_sizes
A_filt_sizes = (3,)
Ahat_filt_sizes = (3, 3)
R_filt_sizes = (3, 3)
layer_loss_weights = np.array([1., 0.])
layer_loss_weights = np.expand_dims(layer_loss_weights, 1)
time_loss_weights = 1./ (nt - 1) * np.ones((nt,1))
time_loss_weights[0] = 0

Sample prediction results (images): plot_2, plot_68

Any suggestions? Thanks a lot.

bill-lotter commented 7 years ago

Hello, this is likely happening because the model has too little capacity (only two layers), so for the most part it just copies the last frame it has seen. Increasing the number of layers should help (for the bottom sequence at least).
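
One way to check whether a model is mostly copying the last seen frame is to compare its error against a "copy the previous frame" baseline. The sketch below is a minimal NumPy illustration, not code from the repository: the function name `copy_baseline_report` and the array layout `(n_sequences, nt, ...)` are assumptions made for this example. If `mse_model` is no better than `mse_copy_baseline`, and `mse_pred_vs_previous` is close to zero, the predictions are effectively a one-frame-delayed copy of the input.

```python
import numpy as np


def copy_baseline_report(X, X_hat):
    """Compare predictions with a copy-last-frame baseline.

    X     : ground-truth frames, shape (n_sequences, nt, ...)  (assumed layout)
    X_hat : predicted frames, same shape; X_hat[:, t] is the prediction of X[:, t]

    The first time step is skipped because no previous frame exists for it.
    """
    X = np.asarray(X, dtype=np.float64)
    X_hat = np.asarray(X_hat, dtype=np.float64)
    if X.shape != X_hat.shape:
        raise ValueError("X and X_hat must have the same shape")
    if X.ndim < 2 or X.shape[1] < 2:
        raise ValueError("need at least two time steps along axis 1")

    target = X[:, 1:]      # frames the model is asked to predict
    previous = X[:, :-1]   # the frame seen just before each target
    pred = X_hat[:, 1:]    # model output for each target frame

    return {
        # Error of the model against the true next frame.
        "mse_model": float(np.mean((pred - target) ** 2)),
        # Error obtained by simply repeating the previous frame.
        "mse_copy_baseline": float(np.mean((previous - target) ** 2)),
        # Distance between the prediction and the previous frame;
        # near zero means the model is copying its input.
        "mse_pred_vs_previous": float(np.mean((pred - previous) ** 2)),
    }


if __name__ == "__main__":
    # Synthetic stand-in data: random "video" and a model that only copies.
    rng = np.random.default_rng(0)
    X = rng.random((4, 10, 8, 8, 3))
    X_hat = np.zeros_like(X)
    X_hat[:, 1:] = X[:, :-1]  # prediction at t equals the frame at t-1
    for name, value in copy_baseline_report(X, X_hat).items():
        print(name, value)
```

In practice `X` and `X_hat` would be the test frames and the model's predictions for them, on the same pixel scale. A model that has learned something beyond copying should show `mse_model` clearly below `mse_copy_baseline`.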