doc-doc / HQGA

Video as Conditional Graph Hierarchy for Multi-Granular Question Answering (AAAI'22, Oral)
MIT License
How to extract object features with BUTD? #17

Open dongfengxijian opened 1 year ago

dongfengxijian commented 1 year ago

Thank you for your excellent work! Should I extract frames from the raw videos before extracting features with BUTD? Should I save the bounding-box proposals and the output features in the same file (e.g. *.h5)? And which mode is better, 'caffe' or 'd2'?