ecoxial2007 / LGVA_VideoQA

Language-Guided Visual Aggregation for Video Question Answering

Incomplete extract_embedding.py code? #7

Open niu-mc opened 9 months ago

niu-mc commented 9 months ago

Where is the bbox handling in the CLIP feature extraction?

ecoxial2007 commented 9 months ago

Thank you for your interest in our work. I recommend using our pre-extracted features, because the full pipeline is very time-consuming: detecting bboxes with GLIP, cropping the images to those bboxes, and then extracting features for each crop with CLIP.

If you wish to train on your own dataset, you can implement the two remaining steps yourself: crop each bbox (with either OpenCV or Pillow), then run CLIP feature extraction on the crops.
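The crop-then-encode step described above could be sketched roughly as follows with Pillow. This is only an illustration, not the repository's code: the helper names (`clamp_bbox`, `crop_regions`, `extract_region_features`), the `(x1, y1, x2, y2)` pixel box format, and the `encode` callable are all assumptions. `encode` stands in for whatever CLIP image encoder you use and is expected to map a list of crops to their features.

```python
from typing import Callable, List, Sequence, Tuple

from PIL import Image

# Assumed box format: (x1, y1, x2, y2) in pixel coordinates.
BBox = Tuple[float, float, float, float]


def clamp_bbox(bbox: BBox, width: int, height: int) -> Tuple[int, int, int, int]:
    """Round a box to integer pixels and clip it to the image bounds.

    The result always spans at least one pixel in each direction.
    """
    x1, y1, x2, y2 = bbox
    left = max(0, min(int(round(x1)), width - 1))
    top = max(0, min(int(round(y1)), height - 1))
    right = max(left + 1, min(int(round(x2)), width))
    bottom = max(top + 1, min(int(round(y2)), height))
    return left, top, right, bottom


def crop_regions(image: Image.Image, bboxes: Sequence[BBox]) -> List[Image.Image]:
    """Return one cropped image per bounding box."""
    width, height = image.size
    return [image.crop(clamp_bbox(box, width, height)) for box in bboxes]


def extract_region_features(
    image: Image.Image,
    bboxes: Sequence[BBox],
    encode: Callable[[List[Image.Image]], object],
):
    """Crop every box, then pass the crops to the supplied encoder.

    `encode` is a hypothetical placeholder for a CLIP image encoder
    (preprocessing plus forward pass); it is not defined here.
    """
    return encode(crop_regions(image, bboxes))
```

Passing the encoder in as a callable keeps the cropping logic independent of the CLIP implementation, so the same code works whether the boxes come from GLIP or from another detector. Clamping the boxes first avoids empty or out-of-range crops when a detector returns coordinates slightly outside the frame.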