Thanks for your excellent contribution. It seems that the number of training samples in an epoch (53200, as you defined) is far smaller than the number of annotated objects in the whole VID dataset. So in an epoch, does the code only utilize a small amount of the data? Thanks.
Yes, 53200 is the same number used in the original MatConvNet version. In every epoch, we only guarantee that every video will be sampled at least once; details can be found in dataset.py.
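For readers curious what such a scheme might look like, here is a minimal sketch (the function `build_epoch_indices` and its parameters are hypothetical illustrations, not the repo's actual dataset.py code): each video gets one guaranteed slot, and the remaining slots up to the epoch size are filled with randomly chosen videos.

```python
import random

def build_epoch_indices(num_videos, samples_per_epoch=53200):
    """Sketch of a per-epoch sampling scheme: every video index appears
    at least once, and the remaining slots are filled at random."""
    assert samples_per_epoch >= num_videos
    # One guaranteed slot per video ...
    indices = list(range(num_videos))
    # ... then pad with uniformly random video indices up to the epoch size.
    indices += [random.randrange(num_videos)
                for _ in range(samples_per_epoch - num_videos)]
    random.shuffle(indices)
    return indices
```

Under this kind of scheme, the epoch covers every video but only a small fraction of the annotated frames within each video, which matches the behavior you observed.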