Soldelli / VLG-Net

VLG-Net: Video-Language Graph Matching Networks for Video Grounding
MIT License
30 stars, 1 fork

C3D features #3

Closed. fmu2 closed this issue 2 years ago.

fmu2 commented 2 years ago

Thank you for releasing the code! Could you please point me to the pre-trained C3D model you used for visual feature extraction? What are the hyper-parameters (frame rate, number of frames per clip, stride, etc.) for video pre-processing? Looking forward to your reply!

Soldelli commented 2 years ago

Dear @fmu2, thank you for reaching out to me. Let me give you more details about the features.

Please feel free to reach out with any doubts or concerns.

Best, Mattia

h-somehow commented 2 years ago

Dear @Soldelli, thanks for your excellent work!

I noticed that a ['syntactic_dependencies'] field has been added to the annotations (e.g. train.json) compared with the datasets I downloaded before. I guess it was generated by Stanford CoreNLP. Could you please point to or share the script you used to generate ['syntactic_dependencies']?

Thanks once more for your excellent work!!!
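(For readers checking their own copy of the annotations, here is a hypothetical snippet for inspecting the added field. The per-video layout sketched below follows the usual ActivityNet-Captions format and is an assumption, not taken from the VLG-Net repo.)

```python
import json

# Hypothetical check: load the released annotations and look at one entry.
# The per-video keys (e.g. 'timestamps', 'sentences') are assumed from the
# standard ActivityNet-Captions layout; 'syntactic_dependencies' is the
# extra field discussed above.
with open('train.json') as f:
    annotations = json.load(f)

video_id, entry = next(iter(annotations.items()))
print(video_id, sorted(entry.keys()))
print(entry.get('syntactic_dependencies', '<field not present>'))
```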

Soldelli commented 2 years ago

Hi @fmu2, I'll look for it and post it here if I find it. However, I just followed the official documentation, so it should be easy to reproduce.

Soldelli commented 2 years ago

Here we go. Get the Stanford CoreNLP 4.0.0 library from here. You will also need to install the pip package: pip install stanfordcorenlp. Then use the notebook attached to this message. (The file is zipped so it can be shared in this message.) Hopefully it still works properly (I haven't touched it in over 2 years); let me know if it doesn't. Stanford Syntactic Parsers.zip
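(For reference, a minimal sketch of how the stanfordcorenlp wrapper can be used to produce dependency parses with the CoreNLP 4.0.0 distribution. The install path, example sentence, and how the output gets stored in the annotation files are assumptions; the attached notebook remains the authoritative script.)

```python
from stanfordcorenlp import StanfordCoreNLP

# Assumed path to the unzipped Stanford CoreNLP 4.0.0 distribution.
nlp = StanfordCoreNLP(r'./stanford-corenlp-4.0.0')

query = 'A person is putting a book on a shelf.'
tokens = nlp.word_tokenize(query)
# dependency_parse returns (relation, head_index, dependent_index) triples,
# where index 0 denotes the artificial ROOT node.
dependencies = nlp.dependency_parse(query)

print(tokens)
print(dependencies)  # e.g. [('ROOT', 0, 4), ('det', 2, 1), ...]

nlp.close()  # shut down the background CoreNLP server
```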

Cheers

h-somehow commented 2 years ago

@Soldelli Thanks for your detailed explanation. I've figured out the process.

Cheers

Lonicer commented 1 year ago

> Hi, @fmu2 I'll look for it and post it here if I find it. However, I just followed the official documentation. It should be easy to reproduce.

Hello, can I use just your annotation files for Charades and ActivityNet v1.3 to train on the original datasets? If possible, could you please provide the JSON files separately? Also, did you apply any additional processing to these three datasets? Thank you so much~

Soldelli commented 1 year ago

Dear @Lonicer, ActivityNet-Captions is based on ActivityNet 1.3 videos, so it is already supported. Regarding Charades-STA, we purposely did not support it, as we believe it should not be considered a valid benchmark for this task. We articulate some of the reasons and provide a new dataset in our CVPR 2022 work (paper, GitHub). Feel free to reach out if you have more questions.