TencentYoutuResearch / ActionDetection-AFSD

Code for CVPR2021 paper "Learning Salient Boundary Feature for Anchor-free Temporal Action Localization"

Data Pre-processing for untrimmed videos on non-standard data #22

Open shubhamagarwal92 opened 3 years ago

shubhamagarwal92 commented 3 years ago

Hi, congratulations on such nice work! Also, thank you for open-sourcing the code! We want to use this framework for temporal action localization on our own raw, untrimmed videos.

We have our own non-standard data: videos about 15 minutes long on average, at 30 fps and a higher resolution (~500×900). Each video also contains multiple actions.

For ActivityNet, I see that the maximum number of frames is set to 768.

Could you please suggest whether we need to split each video into clips, and if so, how long each clip should be? Do we need to sample 256/768 frames uniformly? Or should we split clips based on the actions? Could you please point us to any starter code that we could refer to?

Thanks.

linchuming commented 3 years ago

@shubhamagarwal92, thanks for your attention! If the actions are short and the videos are very long, you can refer to the THUMOS data processing. If the range of action durations is wide and the actions are very long, the ActivityNet data processing is more suitable.
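
For readers landing on this thread, here is a rough sketch of the two styles contrasted above: cutting a long video into overlapping fixed-length clips (THUMOS-like) versus sampling a fixed number of frames uniformly over the whole video (ActivityNet-like). This is not the repository's code. The function names are hypothetical, and the clip length of 256, stride of 128, and target of 768 frames are illustrative assumptions only, borrowed from the numbers mentioned in the question.

```python
def sliding_window_clips(num_frames, clip_length=256, stride=128):
    """Return (start, end) frame ranges that cover a long video with overlapping clips.

    clip_length and stride are assumed placeholder values, not the repo's settings.
    """
    if num_frames <= clip_length:
        # Short video: a single clip covers everything.
        return [(0, num_frames)]
    starts = list(range(0, num_frames - clip_length + 1, stride))
    # Add one last clip flush with the end so trailing frames are not dropped.
    if starts[-1] + clip_length < num_frames:
        starts.append(num_frames - clip_length)
    return [(s, s + clip_length) for s in starts]


def uniform_sample_indices(num_frames, target_frames=768):
    """Return target_frames frame indices spread evenly over the whole video.

    target_frames is an assumed placeholder value.
    """
    if target_frames == 1:
        return [0]
    step = (num_frames - 1) / (target_frames - 1)
    # Clamp to the last valid index to guard against floating-point overshoot.
    return [min(round(i * step), num_frames - 1) for i in range(target_frames)]


if __name__ == "__main__":
    frames = 15 * 60 * 30  # a 15-minute video at 30 fps, as in the question
    clips = sliding_window_clips(frames)
    print(len(clips), clips[0], clips[-1])
    idx = uniform_sample_indices(frames)
    print(len(idx), idx[0], idx[-1])
```

With the first approach, annotations have to be re-expressed relative to each clip and actions cut by a clip boundary need some handling policy. With the second, short actions may span only a handful of sampled frames in a 15-minute video, which is why it fits long actions better.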

shubhamagarwal92 commented 3 years ago

Thanks @linchuming!