Closed — tujun233 closed this issue 1 year ago
Hi @tujun233, yes, the ActivityNet videos are publicly released. Check this link for the official ActivityNet dataset webpage. ActivityNet Captions (the dataset used in this repo) is built on top of ActivityNet by collecting language descriptions of the actions happening in the videos through Amazon Mechanical Turk workers.
When training VLG-Net, we use pre-extracted C3D features, as it is very challenging (if not nearly impossible) to fine-tune the whole video backbone due to hardware limitations.
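To make the workflow concrete, here is a minimal sketch of loading pre-extracted features for one video. The file layout (one `.npy` array of shape `(num_clips, feat_dim)` per video) and the fixed temporal length are assumptions for illustration, not the repo's actual format:

```python
import numpy as np

def load_c3d_features(path, target_len=256):
    """Load pre-extracted C3D features and resample to a fixed length.

    Assumes `path` points to a .npy array of shape (num_clips, feat_dim);
    the actual storage format used by the repo may differ.
    """
    feats = np.load(path)
    # Uniformly sample clip indices to obtain a fixed temporal length,
    # a common preprocessing step for variable-length videos.
    idx = np.linspace(0, feats.shape[0] - 1, target_len).astype(int)
    return feats[idx]
```

With pre-extracted features like these, the video backbone never needs to be loaded during training, which is what keeps the hardware requirements manageable.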
Please let me know if this clarifies your doubts, and feel free to follow up with more questions if you need additional information.
Hello, thank you for your reply. I changed the torch version to 1.7 and it solved the original problem.
I have a new question: how can I generate the keys (`syntactic_dependencies` and `dependencies_matrices`) in the json file? I want to test on other datasets.
Hi @huxiwen, please check this other thread, where I provide the code for computing the syntactic dependencies in each sentence. Let me know if you have any issues running it.
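For readers who land here without the linked thread, the idea can be sketched as follows: run a dependency parser over each sentence and turn the head/dependent links into an adjacency matrix. The head-index representation below stands in for the output of a real parser (e.g. spaCy's `token.head`), and the matrix convention (symmetric, with self-loops) is an assumption for illustration:

```python
import numpy as np

def dependency_matrix(heads):
    """Build a dependency adjacency matrix for one sentence.

    `heads[i]` is the index of token i's syntactic head (the root
    points to itself). This mimics what a dependency parser produces;
    the exact json key layout in the repo may differ.
    """
    n = len(heads)
    mat = np.eye(n, dtype=np.float32)  # self-loops on the diagonal
    for i, h in enumerate(heads):
        mat[i, h] = 1.0
        mat[h, i] = 1.0  # treat dependency edges as undirected
    return mat

# Example: "dogs chase cats", with "chase" (index 1) as the root.
print(dependency_matrix([1, 1, 1]))
```

Generating the json for a new dataset then amounts to parsing every annotation sentence and storing the relation labels and these matrices under the corresponding keys.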
Best, Mattia
@Soldelli Thanks for explaining in detail, I have figured out the process of generating the json file.
Cheers
I want to know whether the ActivityNet dataset includes the original video data. All methods seem to take C3D features as input.