varunnalluru closed this issue 5 months ago
We did not release the code for extracting captions; however, we provide the extracted captions at the data link. If you want to try another captioner, you just need to format its captions the same way as ours, and then you should be able to run this codebase directly.
Actually, you have provided the model link for the pretrained visual captioner (the LaViLa model), but can you help me load and run this model? When I try, I get an error like:
`TypeError: 'dict' object is not callable`
You can refer to the LaViLa codebase (https://github.com/facebookresearch/LaViLa) for how to use it. Our checkpoint follows the same structure as the LaViLa base (TSF-B + GPT-2) model. For inference, you can refer to the script that LaViLa's authors provided: https://colab.research.google.com/drive/1gHWiEWywIotRivYQTR-8NQ6GJC7sJUe4.
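As a side note on the error above: a `'dict' object is not callable` error usually means the loaded checkpoint (a plain dict) is being called as if it were the model. The sketch below is a minimal illustration of this failure mode and the usual fix; the `build_model` constructor and the `"state_dict"` key are hypothetical stand-ins, so check the LaViLa Colab for the exact loading code.

```python
# Minimal sketch of the "'dict' object is not callable" failure mode.
# torch.load(...) on a checkpoint typically returns a plain dict
# (often with a "state_dict" key), not a callable model object.
checkpoint = {"state_dict": {"layer.weight": [0.1, 0.2]}}  # stand-in for torch.load(path)

try:
    checkpoint(None)  # calling the dict directly raises TypeError
except TypeError as e:
    print(e)  # 'dict' object is not callable

# The usual fix (hypothetical names; see the LaViLa inference Colab):
# model = build_model(...)                        # construct the architecture first
# model.load_state_dict(checkpoint["state_dict"]) # then load the weights into it
# model.eval()
# outputs = model(frames)                         # call the model, not the dict
```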
What parameter values did you use to get the reported accuracy? For me, every response is -1 and the accuracy is 0.
Can you give me more details about your experiments? I set up this repo again and everything works fine on my device.
Does this code include the visual captioner?