MikeWangWZHL / VidIL

PyTorch code for "Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners"
MIT License
112 stars 1 fork

Custom dataset preparation #5

Closed nikky4D closed 2 years ago

nikky4D commented 2 years ago

How do I prepare this for finetuning on my own dataset? I would like to get the BLIP baseline on my custom dataset then finetune with your models on my dataset. Do I only need a folder of videos, with their captions in a json? Can you link me to a sample video/json for the dataset?

MikeWangWZHL commented 2 years ago

Yes, that would be sufficient. You can check out the dataset examples in the dataset section here. For running the BLIP baseline (for example, on video captioning), see train_caption_video.py and the example usage in train_caption_video.sh.
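As a rough illustration of the "folder of videos plus a captions json" setup the question describes, the snippet below writes a minimal annotation file. The field names (`video`, `caption`) and file layout here are assumptions for the sketch; check the actual dataset examples in the repo's dataset section for the exact schema expected by train_caption_video.py.

```python
import json
import os
import tempfile

# Hypothetical annotation schema: one entry per video, pairing a video
# filename (relative to your video folder) with its caption. The exact
# key names are an assumption -- verify against the repo's dataset examples.
annotations = [
    {"video": "video0001.mp4", "caption": "a person is slicing a tomato"},
    {"video": "video0002.mp4", "caption": "a dog catches a frisbee in a park"},
]

# Write the annotation json next to (not inside) the video folder.
ann_path = os.path.join(tempfile.gettempdir(), "my_dataset_train.json")
with open(ann_path, "w") as f:
    json.dump(annotations, f, indent=2)

# Sanity-check that the file round-trips.
with open(ann_path) as f:
    loaded = json.load(f)
print(len(loaded))
```

A training script would then typically take the video folder and this json path as arguments; adapt the entries to however many train/val splits your setup needs.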