Closed Bozorgtabar closed 7 years ago
Hello, this is likely happening because you are using a model with too little capacity (only two layers), so for the most part it is just copying the last seen frame. Increasing the number of layers should help (for the bottom sequence at least).
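One quick way to confirm the copy-last-frame behavior is to compare the model's prediction error against the trivial baseline of copying the previous frame: if the two errors are nearly identical, the network is effectively acting as a one-frame-delay copy. A minimal NumPy sketch (array names and shapes here are illustrative assumptions, not taken from the repo, though the repo's evaluation script computes a similar pair of MSEs):

```python
import numpy as np

def delay_check(X_true, X_hat):
    """Compare model error against the previous-frame copy baseline.

    X_true, X_hat: arrays of shape (n_sequences, nt, H, W, C).
    Returns (mse_model, mse_prev); if the two are nearly equal, the
    model is mostly reproducing the last seen frame.
    """
    # Skip t = 0: there is no previous frame to copy there.
    mse_model = np.mean((X_true[:, 1:] - X_hat[:, 1:]) ** 2)
    mse_prev = np.mean((X_true[:, 1:] - X_true[:, :-1]) ** 2)
    return mse_model, mse_prev

# Toy example: predictions that literally copy the previous frame
X_true = np.random.rand(2, 10, 128, 160, 3).astype(np.float32)
X_hat = np.empty_like(X_true)
X_hat[:, 1:] = X_true[:, :-1]   # one-frame-delayed copy
X_hat[:, 0] = X_true[:, 0]

mse_model, mse_prev = delay_check(X_true, X_hat)
print(mse_model, mse_prev)  # identical for a pure copy predictor
```

If `mse_model` is no better than `mse_prev` on your sample videos, the predictions carry no information beyond the last observed frame.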
Hi, I used your code with only two stacked layers on some sample videos. However, from the results it seems that the prediction comes with a one-frame delay. Does this mean there is a bug in the code, or have I misunderstood the network? I used the parameters below:
Model parameters
```python
nt = 10
n_channels, im_height, im_width = (3, 128, 160)
input_shape = (n_channels, im_height, im_width) if K.image_dim_ordering() == 'th' else (im_height, im_width, n_channels)
stack_sizes = (n_channels, 32)
R_stack_sizes = stack_sizes
A_filt_sizes = (3,)
Ahat_filt_sizes = (3, 3)
R_filt_sizes = (3, 3)
layer_loss_weights = np.array([1., 0.])
layer_loss_weights = np.expand_dims(layer_loss_weights, 1)
time_loss_weights = 1. / (nt - 1) * np.ones((nt, 1))
time_loss_weights[0] = 0
```
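For comparison, a deeper four-layer configuration along the lines of the repo's KITTI example would look roughly like this (the channel counts and loss weights below are assumptions based on the published KITTI settings; verify them against `kitti_train.py` before use):

```python
import numpy as np

nt = 10
n_channels, im_height, im_width = (3, 128, 160)

# Four-layer stack (assumed KITTI-style defaults, not verified here)
stack_sizes = (n_channels, 48, 96, 192)
R_stack_sizes = stack_sizes
A_filt_sizes = (3, 3, 3)        # one per layer transition: len = n_layers - 1
Ahat_filt_sizes = (3, 3, 3, 3)  # one per layer
R_filt_sizes = (3, 3, 3, 3)     # one per layer

# Weight only the bottom-layer error, and ignore the first time step
layer_loss_weights = np.expand_dims(np.array([1., 0., 0., 0.]), 1)
time_loss_weights = 1. / (nt - 1) * np.ones((nt, 1))
time_loss_weights[0] = 0
```

Note the consistency constraints: `A_filt_sizes` has one entry fewer than the number of layers, while `Ahat_filt_sizes` and `R_filt_sizes` have one entry per layer, so your two-layer settings above are shaped correctly.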
Sample prediction results:
Any suggestions? Thanks a lot.