liming-ai closed this issue 3 years ago
Hi, thank you for your interest.
We use all 1010 validation videos of 101 classes during training. However, we use only the subset of test videos from 20 classes for evaluation. This training scheme has been adopted by some previous approaches (see W-TALC and 3C-Net). As shown by W-TALC (see Table 1 of the paper), training on the reduced set (200 videos of 20 classes instead of 1010 videos of 101 classes) actually performs a bit better despite having fewer videos. We observed similar results, but as I remember it required a completely different set of hyperparameters. You can try training with the reduced set.
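If it helps, here is a rough sketch of how the reduced training set could be built from multi-hot video-level labels. This is not the code from this repository: the function name `reduce_to_subset`, the toy label matrix, and the class indices in `subset_idx` are all assumptions made up for illustration, so the real indices of the 20 evaluation classes would need to be substituted.

```python
import numpy as np


def reduce_to_subset(labels, subset_idx):
    """Keep only videos that contain at least one class from the subset.

    labels:     (num_videos, num_classes) multi-hot matrix.
    subset_idx: column indices of the classes to keep (hypothetical here).
    Returns a boolean mask over videos and the reduced label matrix.
    """
    subset_idx = np.asarray(subset_idx)
    sub_labels = labels[:, subset_idx]      # drop columns of unused classes
    keep = sub_labels.sum(axis=1) > 0       # videos with at least one kept class
    return keep, sub_labels[keep]


# Toy data: 6 videos, 101 classes (not real THUMOS14 annotations).
labels = np.zeros((6, 101), dtype=np.float32)
labels[0, 3] = 1
labels[1, 50] = 1
labels[2, 7] = 1
labels[3, 99] = 1
labels[4, [3, 50]] = 1
labels[5, 12] = 1

subset_idx = [3, 7, 12]  # placeholder indices, NOT the real 20-class list
keep, reduced = reduce_to_subset(labels, subset_idx)
print(keep)           # which videos survive the filter
print(reduced.shape)  # (num_kept_videos, len(subset_idx))
```

The same `keep` mask would also have to be applied to the feature list so that features and labels stay aligned, and the classifier output size would need to match `len(subset_idx)`.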
Got that, thanks for your contribution
Hi @kylemin, when I use your code to train on THUMOS14 and print the shape of wtcam, I get (batch_size, num_segments, 101). I think the last dimension should be 20 instead of 101, since people use the validation set to train the network. Could you please check this?