dingfengshi / TriDet

[CVPR2023] Code for the paper, TriDet: Temporal Action Detection with Relative Boundary Modeling
MIT License

Suggestion for long videos training #24

Closed lllllllll-3154 closed 10 months ago

lllllllll-3154 commented 10 months ago

Do you have any parameter suggestions for long videos whose action segments also span a long range? Thanks.

dingfengshi commented 10 months ago

For this type of video, I guess you can try extracting snippet features at a low fps with a large window size, or you can rescale the temporal features to a fixed length, as in the ActivityNet setting.
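To see how much a lower fps and a larger window or stride shorten the feature sequence, a rough count can help. The sketch below is only an illustration: `num_snippets` and all of the numbers in it are hypothetical and are not taken from the TriDet configs.

```python
def num_snippets(duration_s: float, fps: float, window: int, stride: int) -> int:
    """Count sliding-window snippets (one feature vector each) for a video.

    duration_s: video length in seconds
    fps:        frame rate used when decoding frames for feature extraction
    window:     frames per snippet
    stride:     frames between the starts of consecutive snippets
    """
    num_frames = int(duration_s * fps)
    if num_frames < window:
        return 0
    return (num_frames - window) // stride + 1


# Hypothetical 30-minute video under two illustrative extraction settings.
dense = num_snippets(duration_s=1800, fps=30, window=16, stride=4)
sparse = num_snippets(duration_s=1800, fps=10, window=16, stride=16)
print(dense, sparse)
```

With these illustrative settings the sparse extraction yields a sequence roughly an order of magnitude shorter, so the same number of feature steps covers a much longer stretch of video.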

lllllllll-3154 commented 10 months ago

Thanks so much for your answer. Does this mean the current configs and framework are better suited to short segments in short videos, and that it is better to preprocess the dataset?

dingfengshi commented 10 months ago

Not really. Because videos are so diverse, different types of video datasets are handled with different techniques. For instance, THUMOS14 contains very long videos (>30 min) with many action instances and is typically processed with a small window. ActivityNet, on the other hand, has many videos containing a single long action instance, and its features are often rescaled to a fixed length to improve performance. The ActivityNet-style setup would probably be a better initial configuration for the scenario you describe.
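As a rough illustration of the rescaling idea, here is a minimal sketch that linearly resamples a `(T, C)` feature sequence to a fixed length with NumPy. The function name `rescale_features` and the target length of 192 are assumptions made for this example; this is not the repository's own preprocessing code.

```python
import numpy as np


def rescale_features(feats: np.ndarray, target_len: int) -> np.ndarray:
    """Linearly resample a (T, C) feature sequence to (target_len, C)."""
    src_len, num_channels = feats.shape
    # Positions on the source timeline at which the new features are sampled.
    src_pos = np.arange(src_len)
    new_pos = np.linspace(0, src_len - 1, target_len)
    # np.interp handles 1-D data, so interpolate each channel separately.
    return np.stack(
        [np.interp(new_pos, src_pos, feats[:, c]) for c in range(num_channels)],
        axis=1,
    )


# Hypothetical input: 1000 snippet features of dimension 4.
feats = np.random.rand(1000, 4)
fixed = rescale_features(feats, target_len=192)
print(fixed.shape)
```

Since every video ends up with the same number of time steps, the ground-truth segment boundaries need to be mapped onto the rescaled timeline as well, for example by expressing them as fractions of the video duration.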