Closed: AnaRhisT94 closed this issue 2 years ago
After testing the code in run_video_CapFilt.py,
I got the following results:
BLIP_cap: 21.46, 0.476, 0.222, 0.289
I'm trying to find the relevant BLIP code
and see how to reproduce the reported results of ~39.5 CIDEr and ~27.7 BLEU-4.
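(For reference, captioning metrics like CIDEr and BLEU-4 are usually computed with the COCO caption evaluation toolkit. Below is a minimal sketch using pycocoevalcap with placeholder ids and captions; the repo's exact evaluation pipeline may differ, e.g. in tokenization.)

```python
# Minimal sketch of CIDEr / BLEU-4 computation with pycocoevalcap.
# The video ids and captions are placeholders, not data from the repo.
from pycocoevalcap.bleu.bleu import Bleu
from pycocoevalcap.cider.cider import Cider

# dicts: video id -> list of reference captions / list with the generated caption
gts = {"video0": ["a man is playing a guitar", "someone plays a guitar"]}
res = {"video0": ["a man plays the guitar"]}

bleu_scores, _ = Bleu(4).compute_score(gts, res)   # [BLEU-1, ..., BLEU-4]
cider_score, _ = Cider().compute_score(gts, res)
print(f"BLEU-4: {bleu_scores[3]:.3f}  CIDEr: {cider_score:.3f}")
```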
The code in run_video_CapFilt.py is for generating frame captions, not for this video captioning baseline. You can find the code for training the few-shot BLIP baseline for video captioning at https://github.com/MikeWangWZHL/VidIL/blob/main/train_caption_video.py. There is also an example train_caption_video.sh
file in scripts/
. It is modified from the original BLIP video eval code here.
As mentioned in our paper, instead of only doing evaluation, we further train BLIP using the concatenated frame features from the few-shot training samples.
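(To illustrate the idea, here is a rough sketch of what "concatenating frame features" typically looks like with BLIP's image encoder and text decoder: each sampled frame is encoded separately, the per-frame token embeddings are concatenated along the sequence dimension, and the caption decoder is trained with the usual language-modeling loss against them. This is not the code from train_caption_video.py; the module names, shapes, and signatures below are assumptions.)

```python
# Sketch of a video-captioning training step with concatenated frame features.
# `visual_encoder`, `text_decoder`, and `tokenizer` stand in for the
# corresponding BLIP modules and are assumptions, not the repo's exact API.
import torch

def video_caption_loss(visual_encoder, text_decoder, tokenizer, video, captions, device):
    # video: (B, N, C, H, W) -- B clips, N sampled frames each
    B, N, C, H, W = video.shape
    frames = video.view(B * N, C, H, W)              # fold frames into the batch dim
    frame_embeds = visual_encoder(frames)            # (B*N, L, D) patch tokens per frame
    L, D = frame_embeds.shape[1], frame_embeds.shape[2]
    video_embeds = frame_embeds.view(B, N * L, D)    # concatenate frame features along the token dim
    video_atts = torch.ones(video_embeds.shape[:2], dtype=torch.long, device=device)

    text = tokenizer(captions, padding="longest", return_tensors="pt").to(device)
    labels = text.input_ids.masked_fill(text.input_ids == tokenizer.pad_token_id, -100)

    # standard captioning (LM) loss: decode the caption while cross-attending
    # to the concatenated frame features
    output = text_decoder(
        input_ids=text.input_ids,
        attention_mask=text.attention_mask,
        encoder_hidden_states=video_embeds,
        encoder_attention_mask=video_atts,
        labels=labels,
        return_dict=True,
    )
    return output.loss
```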
Hi, regarding the reported baselines of BLIP and BLIP_cap:
I'm trying to understand, and also find in the code, how you computed this baseline on the four metrics. According to the paper (Section 4.2), you write that you stitch multiple frames together and compute the loss, but I'm not sure I understand how this is done (or where it is implemented in the code).
Any help is highly appreciated!
Thanks a lot.