CeeZh / LLoVi

Official implementation for "A Simple LLM Framework for Long-Range Video Question-Answering"
MIT License

Visual Captioner #2

Closed varunnalluru closed 5 months ago

varunnalluru commented 7 months ago

Does this code include the visual captioner?

CeeZh commented 7 months ago

We did not release the code for extracting captions. However, we provided the extracted captions in the data link. If you want to try another captioner, you just need to format its captions the same way as ours; then you should be able to run this codebase directly.

varunnalluru commented 7 months ago

Actually, you provided the model link for the pretrained visual captioner (the LaViLa model), but can you help me load and run this model? Because

(screenshots of the errors attached)

this is what I am getting.

varunnalluru commented 7 months ago

`'dict' object is not callable`

CeeZh commented 7 months ago

You can refer to the LaViLa codebase (https://github.com/facebookresearch/LaViLa) for how to use it. Our checkpoint follows the same structure as the LaViLa base (TSF-B + GPT-2) model. For inference, you can refer to the script that LaViLa's authors provided: https://colab.research.google.com/drive/1gHWiEWywIotRivYQTR-8NQ6GJC7sJUe4.
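For reference, a `'dict' object is not callable` error at this point usually means the object returned by `torch.load` (a plain checkpoint dict) is being called as if it were the model. A minimal sketch of the pattern, using a stand-in dict in place of a real checkpoint file (the `build_model` call in the comment is a placeholder for whatever constructor the LaViLa codebase provides, not an actual function name):

```python
# Stand-in for torch.load("checkpoint.pth"): checkpoints are typically
# saved as a dict wrapping the weights, e.g. {"epoch": ..., "state_dict": {...}}.
checkpoint = {"epoch": 5, "state_dict": {"layer.weight": [0.1, 0.2]}}

# Wrong: treating the loaded checkpoint dict as the model itself.
try:
    checkpoint("some input")  # raises TypeError: 'dict' object is not callable
except TypeError as err:
    print(err)

# Right: build the model first, then load the weights into it.
# With a real checkpoint this would look roughly like:
#   model = build_model(...)                       # from the LaViLa codebase
#   model.load_state_dict(checkpoint["state_dict"])
#   model.eval()
#   output = model(frames)
state_dict = checkpoint["state_dict"]
print(sorted(state_dict))  # the weight keys to load into the model
```

The Colab script linked above shows the actual model construction; the key point is that inference goes through the instantiated model, never through the checkpoint dict itself.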

varunnalluru commented 7 months ago

What parameter values did you use to get that accuracy? For me, every response is -1 and the accuracy is 0.

CeeZh commented 6 months ago

Can you give me more details about your experiments? I set up this repo again and everything works fine on my device.