doc-doc / HQGA

Video as Conditional Graph Hierarchy for Multi-Granular Question Answering (AAAI'22, Oral)
MIT License

How to conduct dense and sparse sampling? #1

Closed Fly2flies closed 2 years ago

Fly2flies commented 2 years ago

Hi, thank you for sharing such great work. I would like to know how to perform dense sampling and sparse sampling after uniformly sampling K clip frames.

After sampling K clip frames c1,...,cK

Which of the above methods corresponds to the paper?

doc-doc commented 2 years ago

Thanks for your interest. We adopt the 1st method and sample 16 frames (8 forward / backward) centered at the key frame.
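For readers who want to see the idea concretely, here is a minimal sketch of that kind of sampling, not the repository's actual code. The function names, the choice of segment centres as key frames, the exact offsets (8 frames before the key frame, then the key frame plus 7 after, 16 in total), and the clamping at the video boundaries are all assumptions made for illustration.

```python
def uniform_key_frames(num_frames, num_clips):
    """Pick num_clips key-frame indices spread evenly over the video.

    Assumption: the key frame is the centre of each equal-length segment.
    """
    return [int((i + 0.5) * num_frames / num_clips) for i in range(num_clips)]


def clip_window(center, num_frames, clip_len=16):
    """Return clip_len consecutive frame indices around `center`.

    Assumption: the window covers center - 8 .. center + 7, and indices
    that fall outside the video are clamped to the first or last frame.
    """
    half = clip_len // 2
    return [
        min(max(center + offset, 0), num_frames - 1)
        for offset in range(-half, half)
    ]


def sample_clips(num_frames, num_clips=8, clip_len=16):
    """Frame indices for every clip: one list of clip_len indices per key frame."""
    return [
        clip_window(c, num_frames, clip_len)
        for c in uniform_key_frames(num_frames, num_clips)
    ]
```

Calling `sample_clips(80)` yields 8 lists of 16 frame indices each; the default of 8 clips is only a placeholder for K.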

Fly2flies commented 2 years ago

Thank you for your reply. I would also like to know how much storage is needed for all the TGIF-QA data, and how to compress and store the extracted features.

doc-doc commented 2 years ago

The raw TGIF_full dataset needs about 124 GB. The features are stored in .h5 files and need about 40 GB for each sub-task.
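As a rough illustration of storing features in an .h5 file, here is a minimal sketch using `h5py` and NumPy. It is not the repository's actual layout: the dataset names `ids` and `feat`, the float32 dtype, the gzip compression level, and the one-video-per-chunk layout are all assumptions chosen for the example.

```python
import h5py
import numpy as np


def save_features(path, video_ids, features):
    """Write video ids and their feature array to an HDF5 file.

    Assumption: `features` has shape (num_videos, ...) and is stored as
    float32 with gzip compression, chunked one video at a time so a
    single video can be read without decompressing the whole file.
    """
    features = np.asarray(features, dtype=np.float32)
    with h5py.File(path, "w") as f:
        f.create_dataset("ids", data=np.asarray(video_ids, dtype=np.int64))
        f.create_dataset(
            "feat",
            data=features,
            chunks=(1,) + features.shape[1:],
            compression="gzip",
            compression_opts=4,
        )


def load_feature(path, index):
    """Read the id and feature array of the video stored at row `index`."""
    with h5py.File(path, "r") as f:
        return int(f["ids"][index]), f["feat"][index]
```

Gzip is lossless, so how much space it saves depends on the features themselves; storing them in a lower precision is a separate trade-off that this sketch does not make.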