Soldelli / VLG-Net

VLG-Net: Video-Language Graph Matching Networks for Video Grounding

About the datasets #5

Closed tujun233 closed 1 year ago

tujun233 commented 2 years ago

I want to know whether the ActivityNet dataset provides the original video data. All methods seem to take C3D features as input.

Soldelli commented 2 years ago

Hi @tujun233, yes, the ActivityNet videos are publicly released. Check this link for the official ActivityNet dataset webpage. ActivityNet Captions (the dataset used in this repo) is built upon the ActivityNet dataset by collecting language descriptions of the actions happening in the videos through Amazon Mechanical Turk workers.

When training VLG-Net, we use pre-extracted C3D features, as fine-tuning the whole video backbone is very challenging (if not nearly impossible) due to hardware limitations.
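For readers wondering what this looks like in practice, here is a minimal sketch of loading pre-extracted clip features instead of decoding raw video. The file name, HDF5 key layout, and 4096-dim fc-layer activations are assumptions for illustration, not the repo's exact format:

```python
# Minimal sketch: load pre-extracted C3D features for one video.
# NOTE: the file path, per-video key layout, and feature dimensionality
# are hypothetical; check the repo's data preparation docs for the
# actual format.
import h5py
import torch

def load_c3d_features(h5_path, video_id):
    # Assumed layout: one (num_clips, 4096) array of C3D fc activations
    # per video, keyed by video id.
    with h5py.File(h5_path, "r") as f:
        feats = f[video_id][:]
    return torch.from_numpy(feats).float()

feats = load_c3d_features("activitynet_c3d_features.hdf5", "v_example")
print(feats.shape)  # e.g. torch.Size([num_clips, 4096])
```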

Please let me know if I have clarified all your doubts, and feel free to follow up with more questions if you need additional information.

SivanHu commented 2 years ago

Hello, thank you for your reply. I changed the torch version to 1.7 and it solved the original problem.

I have a new question: how do I generate the keys (`syntactic_dependencies` and `dependencies_matrices`) in the JSON file? I want to test on other datasets.

Soldelli commented 2 years ago

Hi @huxiwen, please check this other thread. I provide the code for computing the syntactic dependencies in each sentence. Let me know if you have any issues running it.
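For anyone landing here before finding that thread, here is a minimal sketch of how such per-sentence dependencies could be computed. The use of spaCy and the exact output layout are assumptions, not necessarily the repo's script; the key names mirror the JSON fields asked about above:

```python
# Minimal sketch: compute syntactic dependencies and a dependency
# adjacency matrix for a sentence with spaCy. The output field names
# mirror the JSON keys discussed in this issue; the actual repo script
# may differ.
import numpy as np
import spacy

nlp = spacy.load("en_core_web_sm")

def parse_sentence(sentence):
    doc = nlp(sentence)
    # One (head index, relation label, dependent index) triple per token.
    deps = [(tok.head.i, tok.dep_, tok.i) for tok in doc]
    # Symmetric token-by-token adjacency matrix with self-loops,
    # connecting each token to its syntactic head.
    n = len(doc)
    adj = np.eye(n, dtype=np.float32)
    for tok in doc:
        adj[tok.i, tok.head.i] = 1.0
        adj[tok.head.i, tok.i] = 1.0
    return {
        "tokens": [tok.text for tok in doc],
        "syntactic_dependencies": deps,
        "dependencies_matrices": adj.tolist(),
    }

print(parse_sentence("A man is playing a guitar on stage."))
```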

Best, Mattia

SivanHu commented 2 years ago

@Soldelli Thanks for the detailed explanation; I have figured out the process of generating the JSON file.

Cheers