shubhamagarwal92 opened this issue 3 years ago.
@shubhamagarwal92, thanks for your interest! If the actions are short and the videos are very long, you can refer to the THUMOS data processing. If action durations vary widely and some actions are very long, the ActivityNet data processing is more suitable.
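The two options in the reply can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the repository's actual preprocessing. The function names are hypothetical, and the 256-frame window with a 128-frame stride is only an example value. THUMOS-style processing cuts a long video into fixed-length, overlapping windows, so each window keeps fine temporal detail for short actions. ActivityNet-style processing resamples the whole video to a fixed length (768 frames here, matching the limit mentioned in the question), which keeps long actions inside a single input at the cost of temporal resolution.

```python
from typing import List, Tuple


def sliding_window_clips(num_frames: int, clip_len: int, stride: int) -> List[Tuple[int, int]]:
    """THUMOS-style (assumed): split a long video into overlapping fixed-length windows.

    Returns (start, end) frame indices with `end` exclusive. If the regular
    stride would leave frames uncovered at the tail, one extra window is
    added that ends exactly at the last frame.
    """
    if clip_len <= 0 or stride <= 0:
        raise ValueError("clip_len and stride must be positive")
    if num_frames <= clip_len:
        return [(0, num_frames)]  # short video: a single window covers it
    starts = list(range(0, num_frames - clip_len + 1, stride))
    if starts[-1] + clip_len < num_frames:
        starts.append(num_frames - clip_len)  # cover the tail
    return [(s, s + clip_len) for s in starts]


def uniform_sample_indices(num_frames: int, target_len: int) -> List[int]:
    """ActivityNet-style (assumed): pick `target_len` evenly spaced frame indices.

    Indices always lie in [0, num_frames - 1]; if the video has fewer frames
    than `target_len`, some indices repeat (i.e. the video is upsampled).
    """
    if target_len <= 0 or num_frames <= 0:
        raise ValueError("num_frames and target_len must be positive")
    return [i * num_frames // target_len for i in range(target_len)]
```

For a 15-minute video at 30 fps (27,000 frames), these example settings produce 210 overlapping windows from `sliding_window_clips(27000, 256, 128)`, whereas `uniform_sample_indices(27000, 768)` yields a single 768-frame input.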
Thanks @linchuming!
Hi, congratulations on such nice work, and thank you for open-sourcing the code! We are trying to apply this framework to our own raw, untrimmed videos for temporal action localization.
We have our own non-standard data: the videos average 15 minutes in length, at 30 fps and a higher resolution (~500×900). Each video also contains multiple actions.
For ActivityNet, I see that the maximum number of frames is set to 768.
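For the videos described above, a back-of-the-envelope calculation shows what a 768-frame limit would mean. It assumes the whole video is resampled uniformly to 768 frames; the helper name is hypothetical, and the repository may sample differently.

```python
def resampling_stats(duration_s: float, fps: float, target_len: int):
    """Return (total frames, original frames per sample, seconds per sample)
    for a video resampled uniformly to `target_len` frames (assumed scheme)."""
    total_frames = round(duration_s * fps)
    frames_per_sample = total_frames / target_len
    seconds_per_sample = frames_per_sample / fps
    return total_frames, frames_per_sample, seconds_per_sample


# A 15-minute video at 30 fps, squeezed into 768 frames.
total, frames_step, sec_step = resampling_stats(15 * 60, 30, 768)
print(total, frames_step, sec_step)  # 27000 35.15625 1.171875
```

Under this assumption each sampled frame stands for about 35 original frames, or roughly 1.2 s, so an action lasting only a second or two would be covered by just one or two samples.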
Could you please advise whether we need to split the videos into clips, and if so, how long each clip should be? Do we need to sample 256/768 frames uniformly, or should we split clips based on the actions? Could you also point us to any starter code we could refer to?
Thanks.
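If the videos are split into fixed windows rather than split around the actions, the video-level annotations still have to be mapped into each window. A common approach, sketched here as an assumption and not as this repository's code, is to clip each ground-truth segment to the window and keep it only if enough of it falls inside. The function name, the `(start_sec, end_sec, label)` annotation format, and the 0.5 threshold are all hypothetical choices.

```python
from typing import List, Tuple

Segment = Tuple[float, float, int]  # hypothetical format: (start_sec, end_sec, label)


def segments_in_window(
    segments: List[Segment],
    fps: float,
    win_start: int,
    win_end: int,
    min_keep: float = 0.5,
) -> List[Tuple[int, int, int]]:
    """Map video-level segments (seconds) to window-local frame indices.

    A segment is kept if at least `min_keep` of its own length lies inside
    [win_start, win_end); its boundaries are clipped to the window.
    """
    out = []
    for start_s, end_s, label in segments:
        s, e = start_s * fps, end_s * fps  # seconds -> frame positions
        length = e - s
        if length <= 0:
            continue  # skip malformed or empty annotations
        inter_s, inter_e = max(s, win_start), min(e, win_end)
        overlap = inter_e - inter_s
        if overlap > 0 and overlap / length >= min_keep:
            # Shift to window-local coordinates.
            out.append((round(inter_s - win_start), round(inter_e - win_start), label))
    return out


# An action from 10 s to 12 s at 30 fps, seen by the window covering frames 256-511.
print(segments_in_window([(10.0, 12.0, 3)], 30, 256, 512))  # [(44, 104, 3)]
```

At inference time the same offset would be added back (`win_start / fps` seconds) to each window's predictions, and detections from overlapping windows would then typically be merged, for example with non-maximum suppression.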