Dear @fmu2 thank you for reaching out to me. Let me give you more details about the features.
Please, feel free to reach out for any doubt or concern.
Best, Mattia
Dear @Soldelli Thanks for your excellent work!
I noticed that a ['syntactic_dependencies'] field has been added to the annotations (e.g. train.json) compared with the datasets I downloaded before. I guess it was generated with Stanford CoreNLP. Could you please point me to, or share, the script you used to generate ['syntactic_dependencies']?
Thanks for your excellent work once more!!!
Hi, @fmu2 I'll look for it and post it here if I find it. However, I just followed the official documentation. It should be easy to reproduce.
Here we go.
Get the Stanford CoreNLP 4.0.0 library from here.
You will also need to install the Python wrapper: pip install stanfordcorenlp
Then use the notebook attached to this message. (The file is zipped so it can be attached here.)
Hopefully it still works properly (I haven't touched it in over 2 years); let me know if that is not the case.
Stanford Syntactic Parsers.zip
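For reference, here is a minimal sketch of the kind of script the notebook likely contains. It assumes the stanfordcorenlp wrapper pointed at a locally unpacked CoreNLP 4.0.0 distribution, and an annotation schema with a 'sentences' field per video (the paths, file names, and field names are assumptions, not the original code):

```python
# Sketch: attach dependency parses to each query in the annotation file.
import json
from stanfordcorenlp import StanfordCoreNLP

# Path to the unpacked Stanford CoreNLP 4.0.0 distribution (adjust as needed).
nlp = StanfordCoreNLP(r'./stanford-corenlp-4.0.0')

with open('train.json') as f:
    annotations = json.load(f)

for vid, ann in annotations.items():
    # 'sentences' is an assumed field name; the actual schema may differ.
    # dependency_parse() returns (relation, head_index, dependent_index) tuples.
    ann['syntactic_dependencies'] = [
        nlp.dependency_parse(sentence) for sentence in ann['sentences']
    ]

nlp.close()

with open('train_with_deps.json', 'w') as f:
    json.dump(annotations, f)
```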
Cheers
@Soldelli Thanks for your detailed explanation. I've figured out the process.
Cheers
Hello, can I use your annotation files for Charades and ActivityNet v1.3 to train on the original datasets? If possible, could you please provide a separate JSON file? Also, did you apply any additional processing to these three datasets? Thank you so much~
Dear @Lonicer, ActivityNet-Captions is based on ActivityNet 1.3 videos, so it is already supported. Regarding Charades-STA, we purposely did not support it, as we believe it should not be considered a valid benchmark for this task. We articulate some of the reasons and provide a new dataset in our CVPR22 work (paper, GitHub). Feel free to reach out if you have more questions.
Thank you for releasing the code! Could you please point me to the pre-trained C3D model you used for visual feature extraction? What are the hyper-parameters (frame rate, number of frames per clip, stride, etc.) for video pre-processing? Looking forward to your reply!