Closed: nikky4D closed this issue 2 years ago
How do I prepare my own dataset for fine-tuning? I would like to run the BLIP baseline on my custom dataset and then fine-tune your models on it. Do I only need a folder of videos with their captions in a JSON file? Could you link me to a sample video/JSON pair for the dataset?
Yes, that would be sufficient. You can check out the dataset example in the dataset section here. For running the BLIP baseline, on video captioning for example, see train_caption_video.py and the example usage in train_caption_video.sh.
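For reference, here is a minimal sketch of how such a captions JSON might be assembled from a folder of videos. It assumes a flat list of `{"video": ..., "caption": ...}` records; the exact field names and layout are assumptions on my part, so check the schema in the repo's dataset section before training:

```python
import json
from pathlib import Path

def build_annotations(video_dir: str, captions: dict, out_path: str) -> None:
    """Write a list of {video, caption} records for every video in video_dir.

    NOTE: hypothetical schema -- the "video" and "caption" keys are assumptions;
    match whatever format the repo's dataset example actually uses.
    captions maps a video filename (e.g. "clip_0001.mp4") to its caption string.
    """
    records = []
    for video in sorted(Path(video_dir).glob("*.mp4")):
        if video.name in captions:
            records.append({
                "video": video.name,               # path relative to video_dir
                "caption": captions[video.name],   # the ground-truth caption
            })
    with open(out_path, "w") as f:
        json.dump(records, f, indent=2)

if __name__ == "__main__":
    # Example: one video/caption pair; paths here are placeholders.
    build_annotations(
        video_dir="my_dataset/videos",
        captions={"clip_0001.mp4": "a dog catches a frisbee in the park"},
        out_path="my_dataset/train.json",
    )
```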