Closed Bozorgtabar closed 7 years ago
Hello, this is likely happening because you are using a model with too little capacity (only two layers), so for the most part it is just copying the last seen frame. Increasing the number of layers should help (for the bottom sequence at least).
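One quick way to confirm the copy-last-frame behavior is to compare the model's prediction error against the trivial baseline of copying the previous frame: if the two errors are nearly identical, the network is effectively acting as a one-frame-delay copy. A minimal NumPy sketch (array names and shapes here are illustrative assumptions, not taken from the repo, though the repo's evaluation script computes a similar pair of MSEs):

```python
import numpy as np

def delay_check(X_true, X_hat):
    """Compare model error against the previous-frame copy baseline.

    X_true, X_hat: arrays of shape (n_sequences, nt, H, W, C).
    Returns (mse_model, mse_prev); if the two are nearly equal, the
    model is mostly reproducing the last seen frame.
    """
    # Skip t = 0: there is no previous frame to copy there.
    mse_model = np.mean((X_true[:, 1:] - X_hat[:, 1:]) ** 2)
    mse_prev = np.mean((X_true[:, 1:] - X_true[:, :-1]) ** 2)
    return mse_model, mse_prev

# Toy example: predictions that literally copy the previous frame
X_true = np.random.rand(2, 10, 128, 160, 3).astype(np.float32)
X_hat = np.empty_like(X_true)
X_hat[:, 1:] = X_true[:, :-1]   # one-frame-delayed copy
X_hat[:, 0] = X_true[:, 0]

mse_model, mse_prev = delay_check(X_true, X_hat)
print(mse_model, mse_prev)  # identical for a pure copy predictor
```

If `mse_model` is no better than `mse_prev` on your sample videos, the predictions carry no information beyond the last observed frame.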
Hi, I used your code with only two stacked layers on some sample videos. However, from the results it seems that the prediction comes with a one-frame delay. Does this mean there is a bug in the code, or have I misunderstood the network? I used the parameters below:
Model parameters
```python
nt = 10
n_channels, im_height, im_width = (3, 128, 160)
input_shape = (n_channels, im_height, im_width) if K.image_dim_ordering() == 'th' else (im_height, im_width, n_channels)
stack_sizes = (n_channels, 32)
R_stack_sizes = stack_sizes
A_filt_sizes = (3,)
Ahat_filt_sizes = (3, 3)
R_filt_sizes = (3, 3)
layer_loss_weights = np.array([1., 0.])
layer_loss_weights = np.expand_dims(layer_loss_weights, 1)
time_loss_weights = 1. / (nt - 1) * np.ones((nt, 1))
time_loss_weights[0] = 0
```
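For comparison, a deeper four-layer configuration along the lines of the repo's KITTI example would look roughly like this (the channel counts and loss weights below are assumptions based on the published KITTI settings; verify them against `kitti_train.py` before use):

```python
import numpy as np

nt = 10
n_channels, im_height, im_width = (3, 128, 160)

# Four-layer stack (assumed KITTI-style defaults, not verified here)
stack_sizes = (n_channels, 48, 96, 192)
R_stack_sizes = stack_sizes
A_filt_sizes = (3, 3, 3)        # one per layer transition: len = n_layers - 1
Ahat_filt_sizes = (3, 3, 3, 3)  # one per layer
R_filt_sizes = (3, 3, 3, 3)     # one per layer

# Weight only the bottom-layer error, and ignore the first time step
layer_loss_weights = np.expand_dims(np.array([1., 0., 0., 0.]), 1)
time_loss_weights = 1. / (nt - 1) * np.ones((nt, 1))
time_loss_weights[0] = 0
```

Note the consistency constraints: `A_filt_sizes` has one entry fewer than the number of layers, while `Ahat_filt_sizes` and `R_filt_sizes` have one entry per layer, so your two-layer settings above are shaped correctly.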
Sample prediction results:
Any suggestions? Thanks a lot.