ChrisAllenMing / Cross_Category_Video_Highlight

Implementation of Cross-category Video Highlight Detection via Set-based Learning (ICCV 2021).

About the number of videos in the given ActivityNet_vids.txt file #4

Open vatica opened 2 years ago

vatica commented 2 years ago

Your paper says there are 2520 videos for training and 1260 videos for testing, but there are only 3317 videos in ActivityNet_vids.txt in total. What is the reason for this? If I train the model with these videos, will it affect the results?

ChrisAllenMing commented 2 years ago

Hi, thanks for your interest in our work! The listed 3317 ActivityNet videos contain both training and validation videos; the split label (train or val) of each video is given in the annotation file. Note that, since some videos contain highlight moments of more than one video category, we use such a video as a sample for every category whose highlights it contains, and thus the number of train and val samples (2520 + 1260 = 3780) is larger than the 3317 crawled videos.
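For illustration, here is a minimal sketch of how that per-category expansion works; the annotation layout and field names below are hypothetical placeholders, not the repo's actual schema:

```python
from collections import defaultdict

# Hypothetical annotation format: each video maps to its split label (train or val)
# and the categories whose highlight moments it contains.
annotations = {
    "video_001": {"split": "train", "categories": ["surfing"]},
    "video_002": {"split": "train", "categories": ["surfing", "skiing"]},
    "video_003": {"split": "val", "categories": ["skiing"]},
}

# A video whose highlights span k categories becomes k (video, category) samples,
# so the total sample count can exceed the number of unique videos.
samples = defaultdict(list)
for video_id, ann in annotations.items():
    for category in ann["categories"]:
        samples[ann["split"]].append((video_id, category))

print(len(samples["train"]) + len(samples["val"]))  # 4 samples from 3 videos
```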

vatica commented 2 years ago

Thanks for your explanation, but I have run into a new problem. I followed your instructions to train the SL model on the YouTube Highlights dataset without any modification, but the mAP of each category is 5-10% lower than that reported in the paper. What could be the cause of this?

ChrisAllenMing commented 2 years ago

On the YouTube Highlights dataset, we commonly find that a smaller number of epochs (e.g., adding --epochs 10 or 20 to the running command) favors model performance. You can also split off a holdout validation set and perform early stopping. Also, did you add the argument --use_transformer to the running command? This argument has to be added to enable the Transformer module, which significantly improves model performance.
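For reference, a minimal early-stopping sketch driven by mAP on a holdout validation split; `train_one_epoch` and `evaluate_map` are caller-supplied placeholders standing in for the repo's actual training and evaluation routines, and the patience value is an arbitrary choice:

```python
import copy

def train_with_early_stopping(model, train_one_epoch, evaluate_map,
                              max_epochs=20, patience=3):
    """Stop training once validation mAP stops improving for `patience` epochs."""
    best_map, best_state, stale = 0.0, None, 0
    for epoch in range(max_epochs):
        train_one_epoch(model)             # one pass over the training split
        val_map = evaluate_map(model)      # mAP on the holdout validation split
        if val_map > best_map:
            best_map, stale = val_map, 0
            best_state = copy.deepcopy(model.state_dict())  # keep best checkpoint
        else:
            stale += 1
            if stale >= patience:          # no improvement for `patience` epochs
                break
    if best_state is not None:
        model.load_state_dict(best_state)  # restore the best checkpoint
    return model, best_map
```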